Production-Ready Optimistic Locking in DynamoDB with Conditional Writes
The Inevitability of Race Conditions in High-Throughput Systems
In any distributed system operating at scale, the classic read-modify-write cycle is a ticking time bomb for data integrity. Consider an inventory management system built on DynamoDB: two concurrent requests attempt to decrement the stock count for the last available item. Without a concurrency control mechanism, both requests read the stock count as '1', both calculate the new stock as '0' in their respective processes, and both attempt to write '0' back to the database. The first write succeeds. The second write, unaware of the first, also succeeds, overwriting the same value. The result? Only one item existed, but the system accepted two orders, guaranteeing that one of them cannot be fulfilled. This is a classic lost update anomaly.
Traditional relational databases solve this with pessimistic locking (e.g., SELECT ... FOR UPDATE), where a row is locked for the duration of a transaction, forcing other transactions to wait. This approach guarantees consistency but introduces contention, reduces throughput, and is fundamentally at odds with DynamoDB's architecture, which is optimized for massive parallelism and low-latency access patterns.
The DynamoDB-native solution is Optimistic Concurrency Control (OCC), also known as optimistic locking. Instead of preventing concurrent access, OCC allows it and provides a mechanism to verify that the data has not been modified by another process since it was read. If a conflict is detected, the transaction is aborted, and the client is responsible for handling the conflict, typically by retrying the operation.
This article is not an introduction to OCC. It is a detailed guide for implementing a robust, production-ready OCC pattern in DynamoDB using its core features: version attributes, Conditional Writes, and the explicit handling of ConditionalCheckFailedException. We will dissect the implementation, build resilient retry mechanisms, and explore its application in complex transactional scenarios.
Anatomy of the Lost Update Anomaly
Let's formalize the inventory problem with a concrete DynamoDB item structure and a naive implementation that demonstrates the flaw.
Item Structure:
* productId (Partition Key, String)
* stockCount (Number)
* lastUpdatedAt (String, ISO 8601)
Our goal is to implement a function, decrease_inventory, that safely decrements stockCount.
A naive, and incorrect, implementation using Python's boto3 might look like this:
# WARNING: This code contains a race condition and is NOT for production use.
import boto3
def naive_decrease_inventory(table, product_id: str, quantity_to_decrease: int):
"""This function is vulnerable to lost updates."""
# 1. Read Phase
response = table.get_item(Key={'productId': product_id})
item = response.get('Item')
if not item:
raise ValueError(f"Product {product_id} not found.")
current_stock = int(item.get('stockCount', 0))
if current_stock < quantity_to_decrease:
raise ValueError("Insufficient stock.")
# 2. Modify Phase (in-memory)
new_stock = current_stock - quantity_to_decrease
# 3. Write Phase
table.put_item(
Item={
'productId': product_id,
'stockCount': new_stock,
'lastUpdatedAt': '...' # current timestamp
}
)
print(f"Successfully updated stock for {product_id} to {new_stock}")
return new_stock
Visualizing the Race Condition:
Let's assume stockCount is 10.
| Timeline | Process A (Request for 3 items) | Process B (Request for 5 items) | Database State (stockCount) |
|---|---|---|---|
| T1 | get_item -> reads stockCount: 10 | | 10 |
| T2 | | get_item -> reads stockCount: 10 | 10 |
| T3 | Calculates new_stock = 10 - 3 = 7 | | 10 |
| T4 | | Calculates new_stock = 10 - 5 = 5 | 10 |
| T5 | put_item with stockCount: 7 | | 7 |
| T6 | | put_item with stockCount: 5 | 5 |
Outcome: The final stockCount is 5. We have sold 8 items (3 + 5), but the database reflects a reduction of only 5. Process A's update was completely lost. This is the exact scenario OCC is designed to prevent.
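The interleaving above can be reproduced deterministically without touching DynamoDB at all. The sketch below uses a plain dict as a stand-in for the table and replays the six steps by hand, demonstrating the lost update:

```python
# A deterministic replay of the interleaving above, using a plain dict
# as a stand-in for the DynamoDB table (no AWS access required).
table = {'PROD123': {'stockCount': 10}}

# T1/T2: both processes read the same snapshot
stock_seen_by_a = table['PROD123']['stockCount']  # 10
stock_seen_by_b = table['PROD123']['stockCount']  # 10

# T3/T4: each computes its new value in memory
new_stock_a = stock_seen_by_a - 3  # 7
new_stock_b = stock_seen_by_b - 5  # 5

# T5/T6: both blind writes succeed; B silently overwrites A
table['PROD123']['stockCount'] = new_stock_a
table['PROD123']['stockCount'] = new_stock_b

print(table['PROD123']['stockCount'])  # 5, even though 8 items were sold
```

Eight items were "sold" but the store only records a reduction of five, exactly as in the timeline above.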
The Versioned Optimistic Locking Pattern
The solution is to introduce a version attribute. A common convention is to name it _version or version. This is simply a number that is incremented with every successful write.
New Item Structure:
* productId (PK, String)
* stockCount (Number)
* _version (Number)
* lastUpdatedAt (String)
The modified, correct workflow is as follows:
1. Read the item, including its current _version.
2. Compute the new state and increment the _version number in memory.
3. Issue a put_item (or update_item) operation with a ConditionExpression. This expression asserts that the _version of the item in the database must match the _version that was read in step 1.
If the condition is met, the write succeeds. If another process modified the item between our read and write (the race), its own write would have incremented the _version. Our condition will then fail, and DynamoDB will reject our write, throwing a ConditionalCheckFailedException.
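The check-then-write semantics at the heart of this workflow can be modeled in a few lines of plain Python. In this sketch, an in-memory dict stands in for the table and a hypothetical ConditionalCheckFailed exception stands in for DynamoDB's ConditionalCheckFailedException:

```python
class ConditionalCheckFailed(Exception):
    """Stand-in for DynamoDB's ConditionalCheckFailedException."""

def conditional_put(store: dict, key: str, new_item: dict, expected_version: int):
    """Write new_item only if the stored _version matches expected_version."""
    current = store.get(key, {'_version': 0})
    if current['_version'] != expected_version:
        raise ConditionalCheckFailed(f"stale version {expected_version}")
    # The write and the version bump happen together, atomically.
    store[key] = {**new_item, '_version': expected_version + 1}

store = {'PROD123': {'stockCount': 10, '_version': 3}}
conditional_put(store, 'PROD123', {'stockCount': 7}, expected_version=3)  # succeeds
try:
    # A second writer that also read version 3 now loses the race.
    conditional_put(store, 'PROD123', {'stockCount': 5}, expected_version=3)
except ConditionalCheckFailed:
    print("second writer lost the race")
```

In real DynamoDB the version comparison and the write are performed atomically on the server, which is what makes the pattern safe.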
Production-Grade Implementation
Here is the corrected implementation incorporating the versioning pattern.
import boto3
from botocore.exceptions import ClientError
import time
import random
dynamodb = boto3.resource('dynamodb')
class InsufficientStockError(Exception):
pass
class ConcurrencyError(Exception):
pass
def decrease_inventory_with_occ(table_name: str, product_id: str, quantity_to_decrease: int, max_retries: int = 5):
"""Safely decrements inventory using optimistic locking with versioning."""
table = dynamodb.Table(table_name)
retries = 0
while retries < max_retries:
try:
# 1. Read Phase
response = table.get_item(Key={'productId': product_id})
item = response.get('Item')
if not item:
raise ValueError(f"Product {product_id} not found.")
current_stock = int(item.get('stockCount', 0))
expected_version = int(item.get('_version', 0))
if current_stock < quantity_to_decrease:
raise InsufficientStockError("Insufficient stock.")
# 2. Modify Phase (in-memory)
new_stock = current_stock - quantity_to_decrease
new_version = expected_version + 1
# 3. Conditional Write Phase
print(f"Attempt {retries + 1}: Trying to update {product_id} from version {expected_version} to {new_version}")
table.put_item(
Item={
'productId': product_id,
'stockCount': new_stock,
'_version': new_version,
'lastUpdatedAt': '...' # current timestamp
},
ConditionExpression='#v = :ev',
ExpressionAttributeNames={'#v': '_version'},
ExpressionAttributeValues={':ev': expected_version}
)
print(f"Successfully updated stock for {product_id} to {new_stock}. Final version: {new_version}")
return new_stock
except ClientError as e:
if e.response['Error']['Code'] == 'ConditionalCheckFailedException':
print(f"Concurrency conflict for {product_id}. Version {expected_version} is stale. Retrying...")
retries += 1
# Exponential backoff with jitter
sleep_time = (2 ** retries) * 0.1 + random.uniform(0, 0.1)
time.sleep(sleep_time)
else:
# Re-raise other DynamoDB errors
raise
except (InsufficientStockError, ValueError) as e:
# Business logic errors that should not be retried
raise e
raise ConcurrencyError(f"Failed to update {product_id} after {max_retries} retries due to high contention.")
Dissecting the ConditionExpression:
* ConditionExpression='#v = :ev': This is the core of the lock. It instructs DynamoDB to only proceed with the put_item operation IF the condition is true.
* ExpressionAttributeNames={'#v': '_version'}: This is a best practice to avoid conflicts with DynamoDB reserved words. We map the placeholder #v to the actual attribute name _version.
* ExpressionAttributeValues={':ev': expected_version}: This maps the placeholder :ev to the actual value of the version we read from the database. It's crucial that this value is a number, matching the type of the _version attribute.
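One subtlety worth calling out: the condition #v = :ev can never hold for an item that has not been written yet, so a bare version check blocks the very first write. A common variant widens the condition to also allow the write when the item is absent. The helper below (build_versioned_put_kwargs is a hypothetical name, shown as a sketch) builds the keyword arguments for such a call:

```python
def build_versioned_put_kwargs(item: dict, expected_version: int) -> dict:
    """Build kwargs for a versioned conditional put_item call.

    The attribute_not_exists branch permits the first-ever write of an
    item (conventionally attempted with expected_version = 0), while the
    #v = :ev branch enforces optimistic locking on all later writes.
    """
    return {
        'Item': {**item, '_version': expected_version + 1},
        'ConditionExpression': 'attribute_not_exists(productId) OR #v = :ev',
        'ExpressionAttributeNames': {'#v': '_version'},
        'ExpressionAttributeValues': {':ev': expected_version},
    }

kwargs = build_versioned_put_kwargs(
    {'productId': 'PROD123', 'stockCount': 10}, expected_version=0
)
print(kwargs['ConditionExpression'])
```

These kwargs can be splatted directly into table.put_item(**kwargs) in place of the inline arguments shown above.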
`ConditionalCheckFailedException`: A Feature, Not a Bug
The most critical mindset shift for engineers new to this pattern is understanding that ConditionalCheckFailedException is not an error to be logged and alerted on. It is an expected, normal part of the control flow. It is the signal from DynamoDB that a race condition occurred and your process lost. Your application logic must handle this exception gracefully.
Our implementation above demonstrates the standard response: a client-side retry loop.
Designing a Resilient Retry Strategy
A naive while True retry loop is dangerous. In a high-contention scenario, it can lead to a thundering herd problem, where multiple clients are retrying aggressively, increasing database load and the probability of further collisions. A production-grade retry mechanism must include:
* A bounded number of attempts, e.g. max_retries = 5, so that the operation fails fast under sustained contention instead of spinning forever.
* Exponential backoff with jitter, so that competing clients spread their retries out over time instead of colliding again in lockstep. Our implementation uses (2 ** retries) * 0.1 + random.uniform(0, 0.1) for this purpose.
This retry logic should be encapsulated within your data access layer, making it transparent to the higher-level business logic.
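The backoff formula from the implementation can be factored into a standalone helper, which makes the schedule easy to inspect and unit test in isolation:

```python
import random

def backoff_seconds(attempt: int, base: float = 0.1, jitter: float = 0.1) -> float:
    """Exponential backoff with jitter: (2 ** attempt) * base, plus a
    uniform random component in [0, jitter) to de-synchronize clients."""
    return (2 ** attempt) * base + random.uniform(0, jitter)

# Attempts 1..5 back off roughly 0.2s, 0.4s, 0.8s, 1.6s, 3.2s, plus jitter.
for attempt in range(1, 6):
    print(f"attempt {attempt}: sleep ~{backoff_seconds(attempt):.2f}s")
```

The jitter matters more than it looks: without it, clients that collided once tend to wake up simultaneously and collide again.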
Advanced Scenarios and Edge Cases
While the versioned put_item covers many use cases, real-world systems often have more complex requirements.
Edge Case 1: Atomic Counters with `UpdateItem`
For simple atomic increments or decrements, the read-modify-write cycle can be optimized away. DynamoDB's UpdateItem operation can perform atomic operations directly on the server.
def atomic_decrease_inventory(table, product_id: str, quantity: int):
"""Atomically decreases stock, but only if sufficient stock exists."""
try:
response = table.update_item(
Key={'productId': product_id},
UpdateExpression='SET stockCount = stockCount - :q',
# Condition ensures we don't go below zero
ConditionExpression='stockCount >= :q',
ExpressionAttributeValues={':q': quantity},
ReturnValues='UPDATED_NEW'
)
return response['Attributes']['stockCount']
except ClientError as e:
if e.response['Error']['Code'] == 'ConditionalCheckFailedException':
# This means stockCount was less than quantity
raise InsufficientStockError("Insufficient stock for atomic update.")
else:
raise
This is highly efficient. However, it doesn't solve the full OCC problem. What if you need to update the stockCount AND another attribute (e.g., status) based on the original state of the item? In that case, you still need the versioning pattern. You can combine them:
# ... inside the retry loop ...
table.update_item(
Key={'productId': product_id},
UpdateExpression='SET stockCount = :ns, #v = :nv, #s = :new_status',
ConditionExpression='#v = :ev',
ExpressionAttributeNames={
'#v': '_version',
'#s': 'status'
},
ExpressionAttributeValues={
':ns': new_stock,
':nv': new_version,
':new_status': 'LOW_STOCK', # Example of another change
':ev': expected_version
}
)
Here, the update_item call performs the entire state transition atomically, but only if the version matches. This is often more efficient than put_item as it only sends the changed attributes over the wire.
Edge Case 2: Transactions with `TransactWriteItems`
Optimistic locking truly shines when coordinating changes across multiple items. Imagine a user placing an order. This requires two operations that must succeed or fail together:
- Decrement the inventory for the product.
- Create a new order item for the user.
DynamoDB's TransactWriteItems allows you to group up to 100 write operations into a single, all-or-nothing transaction. Each operation within the transaction can have its own ConditionExpression.
Item Structures:
* Products Table: productId (PK), stockCount, _version
* Orders Table: orderId (PK), userId, productId, status
import uuid
def place_order_transaction(product_id: str, user_id: str, quantity: int):
transact_client = boto3.client('dynamodb')
products_table = dynamodb.Table('Products')
# In a real app, this would be inside a retry loop
# For brevity, showing a single attempt.
# 1. READ PHASE (outside the transaction)
product_response = products_table.get_item(Key={'productId': product_id})
product = product_response.get('Item')
if not product or product['stockCount'] < quantity:
raise InsufficientStockError("Insufficient stock.")
expected_version = int(product.get('_version', 0))
new_stock = int(product['stockCount']) - quantity
new_version = expected_version + 1
new_order_id = str(uuid.uuid4())
# 2. TRANSACTIONAL WRITE PHASE
try:
transact_client.transact_write_items(
TransactItems=[
{
'Update': {
'TableName': 'Products',
'Key': {'productId': {'S': product_id}},
'UpdateExpression': 'SET stockCount = :ns, #v = :nv',
'ConditionExpression': '#v = :ev',
'ExpressionAttributeNames': {'#v': '_version'},
'ExpressionAttributeValues': {
':ns': {'N': str(new_stock)},
':nv': {'N': str(new_version)},
':ev': {'N': str(expected_version)}
}
}
},
{
'Put': {
'TableName': 'Orders',
'Item': {
'orderId': {'S': new_order_id},
'userId': {'S': user_id},
'productId': {'S': product_id},
'status': {'S': 'PENDING'}
},
# Ensure this order doesn't already exist (idempotency)
'ConditionExpression': 'attribute_not_exists(orderId)'
}
}
]
)
print(f"Order {new_order_id} placed successfully.")
return new_order_id
except ClientError as e:
if e.response['Error']['Code'] == 'TransactionCanceledException':
# The entire transaction was rolled back.
# Check the CancellationReasons to see which condition failed.
reasons = e.response['CancellationReasons']
print(f"Transaction failed: {reasons}")
# One of the reasons will be 'ConditionalCheckFailed'.
# This is the signal to retry the entire read-and-transact operation.
raise ConcurrencyError("Order transaction failed due to contention.")
else:
raise
In this advanced pattern, if another process updates the product's stock (and thus its version) between our read and our transaction, the Update operation's condition will fail. This causes the entire transaction to be rolled back atomically. The Put operation for the new order will never be committed. This guarantees consistency across tables without complex distributed locks.
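When a transaction is cancelled, boto3 attaches one entry in CancellationReasons per item in the transaction, in order; entries for operations that did not cause the cancellation carry the code 'None'. A small helper (a sketch operating on that structure, useful inside the except branch above) pinpoints which operations failed their condition:

```python
def conditional_failures(cancellation_reasons: list) -> list:
    """Return indices of transaction items whose ConditionExpression
    failed, based on the CancellationReasons list boto3 attaches to a
    TransactionCanceledException."""
    return [
        i for i, reason in enumerate(cancellation_reasons)
        if reason.get('Code') == 'ConditionalCheckFailed'
    ]

# Example: the version check on item 0 (the Update) failed; item 1 was fine.
reasons = [
    {'Code': 'ConditionalCheckFailed', 'Message': 'The conditional request failed'},
    {'Code': 'None'},
]
print(conditional_failures(reasons))  # [0]
```

Logging which index failed is invaluable in production: a failure at index 0 means version contention (retry), while a failure at index 1 would mean a duplicate orderId (do not retry).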
Performance and Cost Considerations
* Failed Writes Consume Capacity: A crucial point to remember is that a failed conditional write is not free: it still consumes write capacity, based on the size of the item it would have written. If your system has extremely high contention, you could be paying for a large number of rejected writes. This is a signal that you may need to reconsider your data model or access patterns.
* Monitoring Contention: Monitor the ConditionalCheckFailedRequests metric in CloudWatch for your DynamoDB tables. A persistently high number indicates significant contention. This is your primary tool for understanding the level of concurrency conflicts in your system.
* The Cost of Reads: The OCC pattern requires at least one GetItem call before every write attempt. In a low-contention environment, this is a small overhead. In a high-contention scenario requiring multiple retries, you will perform multiple reads for a single logical update, increasing your Read Capacity Unit (RCU) consumption.
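To reason about this read/write amplification, a simple (and admittedly idealized) model treats each attempt as failing its condition independently with probability p; the expected number of attempts per logical update is then geometric, 1/(1-p), and each attempt costs one read plus one write:

```python
def expected_attempts(p_conflict: float) -> float:
    """Expected attempts per logical update, assuming each attempt
    independently fails its condition with probability p_conflict
    (a geometric model; real contention is usually correlated)."""
    assert 0.0 <= p_conflict < 1.0
    return 1.0 / (1.0 - p_conflict)

# At a 10% conflict rate, ~1.11 reads and writes per logical update.
# At 50%, every update costs 2 reads and 2 writes on average, meaning
# half of the consumed write capacity goes to rejected writes.
print(expected_attempts(0.1))
print(expected_attempts(0.5))
```

When this number climbs, the fix is usually structural (sharding hot items, switching to atomic counters) rather than more aggressive retrying.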
Testing Your Concurrency Logic
Never assume your OCC logic is correct without testing it under concurrent load. You can simulate this locally using Python's multiprocessing or threading modules.
Here's a conceptual test setup:
from multiprocessing import Pool, cpu_count
# Assume you have a function setup_test_table() that creates a table
# with a product 'PROD123' with stockCount = 100 and _version = 1
TABLE_NAME = 'InventoryTest'
def worker_task(task_id):
"""A single worker trying to decrement inventory."""
try:
print(f"Worker {task_id} starting...")
final_stock = decrease_inventory_with_occ(
table_name=TABLE_NAME,
product_id='PROD123',
quantity_to_decrease=1
)
print(f"Worker {task_id} finished successfully. Stock is now {final_stock}")
return 'SUCCESS'
except Exception as e:
print(f"Worker {task_id} failed: {e}")
return 'FAILURE'
if __name__ == '__main__':
# Setup initial state in DynamoDB
# ... setup_test_table(initial_stock=100)
num_workers = 20 # More workers than available stock to force contention
with Pool(processes=cpu_count()) as pool:
results = pool.map(worker_task, range(num_workers))
success_count = results.count('SUCCESS')
print(f"\nTotal successful decrements: {success_count}")
# Verify final state in DynamoDB
# The final stockCount should be 100 - success_count
# The final _version should be 1 + success_count
# ... verification logic ...
Running this test will produce logs showing the retry attempts and ConditionalCheckFailedException being handled, proving that your locking mechanism correctly serializes access and prevents lost updates.
Conclusion: Embrace Controlled Failure
Optimistic Concurrency Control in DynamoDB is more than a feature; it's a design philosophy. It requires shifting from a mindset of preventing concurrent access to one of embracing and managing the resulting conflicts. By instrumenting your data with a version attribute and using ConditionExpression, you gain a powerful, scalable mechanism to ensure data integrity without sacrificing the performance and parallelism that make DynamoDB a compelling choice for high-throughput applications.
The key takeaways for senior engineers are:
* ConditionalCheckFailedException is the success path for handling contention. It must be caught and handled with a robust retry strategy.
* The versioning pattern composes naturally with TransactWriteItems, allowing for complex, atomic operations across your data model without distributed locks.
* Monitor ConditionalCheckFailedRequests. This metric is your window into the level of contention in your application and is a critical input for performance tuning.
By mastering this pattern, you are equipping your applications to handle the concurrency inherent in distributed systems, ensuring that your data remains correct and consistent, even under extreme load.