DynamoDB Global Tables: Advanced Patterns for Active-Active Architectures
The Peril and Promise of Active-Active Architectures
For senior engineers tasked with building systems that demand five-nines availability and low-latency user experiences across geographic regions, the traditional active-passive failover model is often insufficient. The move to an active-active architecture, where multiple regions serve live traffic simultaneously, is a significant architectural leap. AWS DynamoDB Global Tables are a cornerstone technology for enabling this pattern, promising seamless, fully managed, multi-master replication.
However, this promise comes with a critical caveat: the default conflict resolution strategy, Last Write Wins (LWW), is a deceptively simple mechanism that can silently corrupt data in complex, high-throughput applications. Relying on LWW is often a path to production incidents caused by lost updates and data divergence. True active-active resilience requires shifting the responsibility of concurrency control and data integrity from the database to the application layer.
This article dissects the advanced, production-tested patterns required to build robust systems on top of DynamoDB Global Tables. We will not cover the basics of setting up a Global Table. Instead, we will focus on the hard problems: managing write conflicts, ensuring idempotency across distributed writes, and choosing the right data consistency model for your specific use case.
Understanding the Replication Engine: Beyond the Surface
Before diving into patterns, it's crucial to understand how Global Tables work under the hood. When you write to a table in one region (e.g., us-east-1), the following occurs:
1. The write is committed to the local replica and acknowledged to the client.
2. DynamoDB then asynchronously replicates the write to every other replica region (e.g., eu-west-1).
Each item in a Global Table has hidden, system-managed attributes: a timestamp of the last update and a region identifier for that update. When a conflict occurs—meaning two writes update the same item in different regions at nearly the same time—DynamoDB uses the timestamp to determine the winner. The write with the later timestamp is the one that persists. This is LWW.
The fundamental problem: LWW is non-deterministic from the application's perspective. Network latency fluctuations can alter which write is considered "last." More importantly, LWW is a replacement strategy. If two clients attempt to increment a counter, one increment will be lost. If they attempt to add an item to a list, one list will overwrite the other. This is unacceptable for most non-trivial applications.
Pattern 1: Optimistic Concurrency Control with Conditional Writes
This is the most common and powerful pattern for preventing lost updates. Instead of blindly overwriting data, we use DynamoDB's ConditionExpression parameter to turn a PutItem or UpdateItem operation into a transactional check. The core idea is to version our data items.
Each item in the table must have a version attribute, such as _version. The application logic follows a read-modify-write cycle:
1. Read the item, noting its current _version.
2. Apply the changes locally and increment _version.
3. Issue an UpdateItem call with a ConditionExpression that checks whether the _version in the database still matches the one we originally read.
If the versions match, the update succeeds. If another process (in any region) updated the item in the meantime, the condition will fail, and DynamoDB will throw a ConditionalCheckFailedException. Your application must then catch this exception and decide how to proceed: re-fetch the item, re-apply the logic, and retry the write.
Production Example: Managing User Profile Updates
Consider a user profile service active in us-east-1 and eu-west-1. A user might update their email from a session in the US while a support agent updates their phone number from a session in Europe.
Data Model:
- PK: USER#<user_id>
- SK: PROFILE
- email: String
- phoneNumber: String
- _version: Number
Here's a Python implementation using boto3:
import boto3
from botocore.exceptions import ClientError
import time
# Assume table is a boto3.resource('dynamodb').Table('GlobalUsers')
def update_user_profile(user_id, updates, max_retries=3):
"""
Updates a user profile using optimistic locking.
:param user_id: The ID of the user.
:param updates: A dictionary of attributes to update, e.g., {'email': '[email protected]'}
:param max_retries: The maximum number of times to retry on a conflict.
"""
retries = 0
while retries < max_retries:
try:
# 1. READ the current item state
key = {'PK': f'USER#{user_id}', 'SK': 'PROFILE'}
response = table.get_item(Key=key)
item = response.get('Item')
if not item:
print(f"User {user_id} not found.")
return False
current_version = item.get('_version', 0)
# 2. PREPARE the conditional update
update_expression_parts = []
expression_attribute_values = {}
expression_attribute_names = {}
for i, (attr, value) in enumerate(updates.items()):
update_expression_parts.append(f"#key{i} = :val{i}")
expression_attribute_names[f"#key{i}"] = attr
expression_attribute_values[f":val{i}"] = value
# Atomically increment the version
update_expression_parts.append("#v = :new_v")
expression_attribute_names["#v"] = "_version"
expression_attribute_values[":new_v"] = current_version + 1
expression_attribute_values[":curr_v"] = current_version
update_expression = "SET " + ", ".join(update_expression_parts)
# 3. WRITE with condition check
print(f"Attempting update for user {user_id} with version {current_version}")
table.update_item(
Key=key,
UpdateExpression=update_expression,
ConditionExpression="#v = :curr_v",
ExpressionAttributeNames=expression_attribute_names,
ExpressionAttributeValues=expression_attribute_values
)
print(f"Successfully updated user {user_id} to version {current_version + 1}")
return True
except ClientError as e:
if e.response['Error']['Code'] == 'ConditionalCheckFailedException':
print(f"Conflict detected for user {user_id}. Version mismatch. Retrying...")
retries += 1
time.sleep(0.05 * (2 ** retries)) # Exponential backoff
else:
# Handle other AWS errors (throttling, etc.)
raise
print(f"Failed to update user {user_id} after {max_retries} retries.")
return False
# --- Usage Example ---
# table = boto3.resource('dynamodb', region_name='us-east-1').Table('GlobalUsers')
# # Initial item setup:
# # table.put_item(Item={'PK': 'USER#123', 'SK': 'PROFILE', 'email': '[email protected]', '_version': 1})
# # Simulate a successful update
# update_user_profile('123', {'phoneNumber': '+15551234567'})
# # To simulate a conflict, you would need to run another update from a different process/region
# # between the GET and the conditional UPDATE of this function.
Edge Cases and Considerations:
* Retry Storms: If contention on a single item is high, your application could enter a retry storm, increasing latency and cost. Implement exponential backoff with jitter and a maximum retry limit.
* Complex Merges: This pattern doesn't define *how* to merge conflicting changes. In the above example, if two processes try to update the email, the second one to try will fail. The application logic on retry might decide to simply overwrite with its value, or it might need more complex logic to present the conflict to a user.
* Initialization: Every item must be initialized with a _version number (e.g., 1). Your PutItem calls for new items should also use a ConditionExpression with attribute_not_exists(PK) to avoid overwriting an existing item in a race condition.
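To illustrate that initialization point, here is a minimal sketch of a conditional create, reusing the imports and table resource from the example above (the helper name create_user_profile is illustrative, not part of the service shown earlier):
def create_user_profile(user_id, email):
    """Creates a new profile, failing rather than overwriting if one already exists."""
    try:
        table.put_item(
            Item={'PK': f'USER#{user_id}', 'SK': 'PROFILE', 'email': email, '_version': 1},
            # The write is rejected if an item with this partition key already exists
            ConditionExpression='attribute_not_exists(PK)'
        )
        return True
    except ClientError as e:
        if e.response['Error']['Code'] == 'ConditionalCheckFailedException':
            return False  # a concurrent create won the race; do not overwrite it
        raise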
Pattern 2: Application-Side Merging & CRDT-like Behavior
Optimistic locking is great for preventing lost updates on scalar values, but it falls short for commutative operations, like adding items to a set or incrementing a counter. LWW would simply replace the entire set or number. Here, we can draw inspiration from Conflict-free Replicated Data Types (CRDTs) and implement the merge logic in our application.
Production Example: A Multi-Region Shopping Cart
Imagine a shopping cart where a user can add items from their desktop (US region) and their mobile phone (EU region) simultaneously. We want both items to appear in the cart.
Anti-Pattern Data Model (Leads to lost updates):
- PK: USER#<user_id>
- SK: CART
- items: List of maps, e.g., [{'sku': 'ABC', 'qty': 1}]
If the user adds SKU 'ABC' in the US and SKU 'XYZ' in the EU, LWW will result in a cart with only one of the items.
CRDT-inspired Data Model (Grow-Only Set):
We model the cart as a DynamoDB Map, where each key is a unique identifier for the item (like the SKU).
- PK: USER#<user_id>
- SK: CART
- items: Map, e.g., {'ABC': {'quantity': 1, 'addedAt': '...'}, 'XYZ': {'quantity': 2, 'addedAt': '...'}}
Now, adding an item is not a replacement of a list, but an update to a specific key within a map. This operation is commutative and associative, making it safe for concurrent execution across regions.
import boto3
from botocore.exceptions import ClientError
import datetime
# Assume table is a boto3.resource('dynamodb').Table('GlobalCarts')
def add_item_to_cart(user_id, sku, quantity):
"""
Adds an item to a distributed shopping cart using a CRDT-like approach.
This operation is idempotent and safe for concurrent execution.
"""
key = {'PK': f'USER#{user_id}', 'SK': 'CART'}
try:
        # UpdateItem targets a single key inside the 'items' map, so concurrent adds of
        # different SKUs from different regions merge instead of overwriting each other.
        # This assumes the cart item already exists with an (initially empty) 'items' map,
        # e.g., created by a conditional PutItem when the cart is first opened; SET on a
        # nested path fails if the parent map is missing.
        # 'items' is aliased as #items because ITEMS is a DynamoDB reserved word.
        table.update_item(
            Key=key,
            UpdateExpression="SET #items.#sku = :item_details",
            ExpressionAttributeNames={
                '#items': 'items',
                '#sku': sku
            },
            ExpressionAttributeValues={
                ':item_details': {
                    'quantity': quantity,
                    'addedAt': datetime.datetime.utcnow().isoformat()  # when the item was added
                }
}
)
print(f"Successfully added {quantity} of {sku} to cart for user {user_id}")
return True
except ClientError as e:
# Handle potential errors like throttling
print(f"Error adding item to cart: {e}")
raise
# --- Usage Example ---
# Simulate two concurrent adds from different regions
# In a real scenario, these would be two separate Lambda functions/servers
# add_item_to_cart('user456', 'SKU-A1B2', 1) # Executed in us-east-1
# add_item_to_cart('user456', 'SKU-C3D4', 2) # Executed in eu-west-1
# After replication, the final state of the 'items' map will contain both SKU-A1B2 and SKU-C3D4,
# regardless of the order in which the updates were applied in the replica regions.
Edge Cases and Considerations:
* Removing Items: Removing items from this structure is more complex. A simple REMOVE items.#sku operation could re-introduce an item if a concurrent add operation's replication arrives late. This is a classic problem solved by CRDTs like the "Observed-Remove Set," which involves using tombstones (markers for deleted items) that are later garbage collected. This adds significant complexity to your application logic.
* Item Size Limits: The entire DynamoDB item (the cart) must be below the 400 KB limit. For unbounded sets, this pattern will fail. In such cases, you must model it differently, e.g., storing each cart item as a separate DynamoDB item.
* Incrementing Quantities: What if two requests try to increment the quantity of the same SKU? The code above would still be subject to LWW on the item_details map. Within a region, use an atomic counter update instead: UpdateExpression="SET #items.#sku.quantity = #items.#sku.quantity + :inc", which applies the increment atomically. Note, however, that two increments applied concurrently in different regions within the replication window are still reconciled by LWW on the replicated item, so one can be lost; counters that must never drop an update are better served by the ledger pattern below. A sketch of the in-region increment follows this list.
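A minimal sketch of that in-region atomic increment, reusing the cart table resource from above (the helper name increment_cart_quantity is illustrative, and the SKU is assumed to already exist in the map):
def increment_cart_quantity(user_id, sku, increment=1):
    """Atomically increments the quantity of an existing SKU (atomic within the serving region)."""
    table.update_item(
        Key={'PK': f'USER#{user_id}', 'SK': 'CART'},
        UpdateExpression="SET #items.#sku.quantity = #items.#sku.quantity + :inc",
        # Reject the update if the SKU has not been added to the cart yet
        ConditionExpression="attribute_exists(#items.#sku)",
        ExpressionAttributeNames={'#items': 'items', '#sku': sku},
        ExpressionAttributeValues={':inc': increment}
    )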
Pattern 3: The Write-Ledger Pattern for Full Auditability
For systems where every state change is critical and no data can ever be lost or overwritten (e.g., financial ledgers, order processing systems), both LWW and optimistic locking can be insufficient. The Write-Ledger (or Event Sourcing) pattern provides the highest level of data integrity.
Instead of updating a single item that represents the current state, we treat the database as an immutable, append-only log of events. The current state is a projection of this log.
Data Model:
- We use a table to store events.
- PK: ACCOUNT#<account_id> (The entity identifier)
- SK: <timestamp>#<event_id> (A sortable, unique key for each event)
- eventType: String (e.g., 'DEPOSIT', 'WITHDRAWAL')
- amount: Number
- ... other event metadata
Writes are always new PutItem calls, which are naturally conflict-free as long as the SK is unique. Reading the current balance requires querying all events for an account and summing them up.
Production Example: A Simple Banking Ledger
import boto3
from boto3.dynamodb.conditions import Key
import uuid
import datetime
# Assume ledger_table is a boto3.resource('dynamodb').Table('GlobalLedger')
def record_transaction(account_id, event_type, amount):
"""
Records a new transaction in the immutable ledger.
This operation is inherently conflict-free.
"""
timestamp = datetime.datetime.utcnow().isoformat()
event_id = str(uuid.uuid4())
item = {
'PK': f'ACCOUNT#{account_id}',
'SK': f'{timestamp}#{event_id}',
'eventType': event_type,
'amount': amount
}
# This is an append-only operation
ledger_table.put_item(Item=item)
print(f"Recorded transaction {event_id} for account {account_id}")
return item
def get_account_balance(account_id):
"""
Calculates the current balance by replaying the event log.
"""
response = ledger_table.query(
KeyConditionExpression=Key('PK').eq(f'ACCOUNT#{account_id}')
)
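    # Note: a single query call returns at most 1 MB of data; for long event histories,
    # follow LastEvaluatedKey to page through all events (omitted here for brevity).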
balance = 0
for item in response['Items']:
if item['eventType'] == 'DEPOSIT':
balance += item['amount']
elif item['eventType'] == 'WITHDRAWAL':
balance -= item['amount']
return balance
# --- Usage Example ---
# account_id = 'ACC12345'
# record_transaction(account_id, 'DEPOSIT', 100) # from us-east-1
# record_transaction(account_id, 'WITHDRAWAL', 20) # from eu-west-1
# time.sleep(2) # allow for replication
# # Reading from any region will yield the correct balance
# balance_us = get_account_balance(account_id) # region us-east-1
# balance_eu = get_account_balance(account_id) # region eu-west-1
# print(f"Final balance is {balance_us}") # Should be 80
Performance Optimization with Materialized Views
The major drawback of the ledger pattern is read performance. Calculating the balance for an account with thousands of transactions on every request is inefficient. We can solve this by creating a materialized view in a separate DynamoDB table.
1. Create a second table (e.g., GlobalBalances) with a simple PK of ACCOUNT#<account_id> and an attribute currentBalance.
2. Enable DynamoDB Streams on the GlobalLedger table.
3. Attach a stream consumer (e.g., a Lambda function) to the GlobalLedger stream.
4. For each new ledger event, have the consumer update the corresponding item in the GlobalBalances table using an UpdateExpression (SET currentBalance = currentBalance + :amount).
This gives you the best of both worlds: a fully auditable, immutable log of transactions and a low-latency, pre-calculated view of the current state for fast reads.
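The following is a minimal sketch of such a stream consumer, under a few assumptions not spelled out above: a Lambda handler named handle_ledger_stream, a GlobalBalances table keyed only by PK, and deployment in a single designated region (replicated writes also appear in each replica region's stream, so running the consumer in every region would double-count events):
import boto3
from decimal import Decimal
# Assume balances_table is a boto3.resource('dynamodb').Table('GlobalBalances')
def handle_ledger_stream(event, context):
    """Projects new GlobalLedger events onto the GlobalBalances materialized view."""
    for record in event['Records']:
        if record['eventName'] != 'INSERT':
            continue  # the ledger is append-only, so only new events matter
        new_image = record['dynamodb']['NewImage']
        amount = Decimal(new_image['amount']['N'])
        if new_image['eventType']['S'] == 'WITHDRAWAL':
            amount = -amount
        # if_not_exists seeds the balance at zero on the account's first event
        balances_table.update_item(
            Key={'PK': new_image['PK']['S']},
            UpdateExpression="SET currentBalance = if_not_exists(currentBalance, :zero) + :amount",
            ExpressionAttributeValues={':amount': amount, ':zero': Decimal(0)}
        )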
Handling Idempotency: The Unsung Hero of Distributed Writes
In any distributed system, clients or intermediate services might retry requests due to network timeouts or transient errors. In an active-active system, a retry could be routed to a different region, leading to duplicate operations. For example, a user clicking "Submit Order" twice could result in two orders being created.
Your write APIs must be idempotent. A common pattern is to require clients to generate a unique idempotency key (e.g., a UUID) for each distinct operation.
Production Pattern: Idempotency Key Check
The server-side logic uses this key to de-duplicate requests. We can use a separate DynamoDB table to track processed idempotency keys.
- The client generates a unique idempotency-key for the request.
- The server receives the request.
- The server attempts a PutItem into an IdempotencyKeys table. The item's key is the idempotency-key from the client. A ConditionExpression of attribute_not_exists(PK) is used.
- If the PutItem succeeds: the key is new. Proceed with the business logic. Store the result of the operation in the idempotency item and set a TTL.
- If the PutItem fails with ConditionalCheckFailedException: the key has been seen before. Fetch the item from the IdempotencyKeys table and return the stored result from the original operation.
import boto3
from botocore.exceptions import ClientError
import json
import time
import uuid
# idempotency_table: PK=idempotencyKey, TTL attribute
# orders_table: The actual table for business logic
def create_order_idempotent(idempotency_key, order_details):
"""
Creates an order using an idempotency key to prevent duplicates.
"""
ttl = int(time.time()) + 3600 # Expire key after 1 hour
try:
# 1. Attempt to claim the idempotency key
idempotency_table.put_item(
Item={'idempotencyKey': idempotency_key, 'ttl': ttl},
ConditionExpression='attribute_not_exists(idempotencyKey)'
)
except ClientError as e:
if e.response['Error']['Code'] == 'ConditionalCheckFailedException':
# Key already exists, this is a retry.
print(f"Idempotency key {idempotency_key} already processed.")
# Fetch and return the original response if stored
response_item = idempotency_table.get_item(Key={'idempotencyKey': idempotency_key})
return json.loads(response_item.get('Item', {}).get('response', '{}'))
else:
raise
try:
# 2. Key claimed, proceed with business logic
print(f"Processing new order with key {idempotency_key}")
# ... logic to create the order in the orders_table ...
order_id = 'ORD-' + str(uuid.uuid4())[:8]
result = {'status': 'SUCCESS', 'orderId': order_id}
# 3. Store the result against the idempotency key
idempotency_table.update_item(
Key={'idempotencyKey': idempotency_key},
UpdateExpression="SET #resp = :resp",
ExpressionAttributeNames={'#resp': 'response'},
ExpressionAttributeValues={':resp': json.dumps(result)}
)
return result
except Exception as e:
# If business logic fails, delete the idempotency key to allow a clean retry
idempotency_table.delete_item(Key={'idempotencyKey': idempotency_key})
raise e
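# --- Usage Example ---
# The client generates the key once per logical operation and reuses it on every retry,
# even if the retry is routed to a different region.
# idempotency_key = str(uuid.uuid4())  # generated client-side
# first = create_order_idempotent(idempotency_key, {'sku': 'SKU-A1B2', 'qty': 1})
# retry = create_order_idempotent(idempotency_key, {'sku': 'SKU-A1B2', 'qty': 1})
# # 'retry' returns the result stored by 'first' instead of creating a second order.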
Final Architectural Considerations
* Cost Model: Global Tables are priced based on replicated write capacity units (rWCUs), which are typically more expensive than standard WCUs. Every write in one region incurs a write cost in all replica regions. Architect your application to minimize unnecessary writes.
* Monitoring Replication Lag: The ReplicationLatency CloudWatch metric is your most important operational health indicator. Set alarms on its P90 or P99 values; a minimal alarm sketch follows this list. Sustained high latency can increase the window for write conflicts and lead to a poor user experience.
* Consistency vs. Complexity: Each pattern presented here adds application-level complexity. The choice is a trade-off. For simple use cases where occasional data loss on non-critical attributes is acceptable, LWW might suffice. For transactional systems, optimistic locking is a good baseline. For auditable systems, the ledger pattern is the most robust. There is no single correct answer; the right pattern is dictated by the business requirements of your specific feature.
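As an illustration of the monitoring recommendation above, here is a minimal sketch of such an alarm created with boto3; the table name, receiving region, threshold, and alarm name format are illustrative assumptions:
import boto3
def create_replication_latency_alarm(table_name, receiving_region, threshold_ms=3000):
    """Alarms when p99 ReplicationLatency stays above the threshold for five consecutive minutes."""
    cloudwatch = boto3.client('cloudwatch')
    cloudwatch.put_metric_alarm(
        AlarmName=f'{table_name}-ReplicationLatency-{receiving_region}-p99',
        Namespace='AWS/DynamoDB',
        MetricName='ReplicationLatency',
        Dimensions=[
            {'Name': 'TableName', 'Value': table_name},
            {'Name': 'ReceivingRegion', 'Value': receiving_region},
        ],
        ExtendedStatistic='p99',
        Period=60,                 # one-minute granularity
        EvaluationPeriods=5,       # breach must persist for five periods
        Threshold=threshold_ms,    # ReplicationLatency is reported in milliseconds
        ComparisonOperator='GreaterThanThreshold',
        TreatMissingData='notBreaching'
    )
# create_replication_latency_alarm('GlobalUsers', 'eu-west-1')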
Building a true active-active system with DynamoDB Global Tables is a powerful capability, but it forces engineers to confront the complexities of distributed systems head-on. By moving beyond the default LWW behavior and implementing robust, application-aware concurrency patterns, you can build globally resilient applications that meet the most demanding availability and performance requirements.