Mastering DynamoDB Concurrency with Conditional Writes & Versioning
The Inevitability of Race Conditions in Distributed Systems
In any distributed system operating at scale, the classic read-modify-write pattern is a ticking time bomb for data integrity. When multiple concurrent processes read the same piece of data, modify it in memory, and attempt to write it back, a "lost update" anomaly is almost guaranteed. The last writer wins, silently overwriting the changes made by others.
Consider a high-traffic inventory management system for an e-commerce platform during a flash sale. The DynamoDB table for products might look like this:
{
  "sku": "TSHIRT-BLK-L",
  "description": "Large Black T-Shirt",
  "stock_count": 100,
  "price": 19.99
}
A naive implementation for processing an order might be:
1. Read the item for SKU TSHIRT-BLK-L. stock_count is 100.
2. Calculate new_stock = 100 - 1 = 99.
3. Update stock_count to 99.

Now, imagine two application servers, Server A and Server B, receive an order for the same SKU at virtually the same millisecond:
1. Server A reads the item: stock_count is 100.
2. Server B reads the item: stock_count is 100.
3. Server A calculates 99 and writes it: stock_count is now 99.
4. Server B calculates 99 and writes it: stock_count is now... 99.

Two shirts were sold, but the inventory was only decremented by one. This is a catastrophic failure of data consistency that leads to overselling, customer dissatisfaction, and financial loss. Traditional relational databases solve this with row-level locking or complex transaction isolation levels, but these mechanisms often come at a significant performance cost and are antithetical to the design philosophy of a hyperscale NoSQL database like DynamoDB.
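The interleaving is easy to reproduce without touching DynamoDB at all. The sketch below uses an in-memory Map as a stand-in for the table; the helper names readItem and blindWrite are hypothetical and exist only for this illustration.

```typescript
// In-memory stand-in for the table; no DynamoDB calls are made here.
type Item = { sku: string; stock_count: number };

const table = new Map<string, Item>([
  ["TSHIRT-BLK-L", { sku: "TSHIRT-BLK-L", stock_count: 100 }],
]);

// A read returns a copy, like a client holding its own snapshot of the item.
function readItem(sku: string): Item {
  const item = table.get(sku);
  if (!item) throw new Error(`Unknown SKU ${sku}`);
  return { ...item };
}

// Unconditional write: whatever arrives last replaces the stored item.
function blindWrite(item: Item): void {
  table.set(item.sku, { ...item });
}

// Both servers read before either one writes.
const seenByA = readItem("TSHIRT-BLK-L");
const seenByB = readItem("TSHIRT-BLK-L");

// Each server decrements its own stale snapshot and writes it back.
blindWrite({ ...seenByA, stock_count: seenByA.stock_count - 1 });
blindWrite({ ...seenByB, stock_count: seenByB.stock_count - 1 });

const finalStock = readItem("TSHIRT-BLK-L").stock_count;
console.log(`Units sold: 2, final stock_count: ${finalStock}`);
```

Two units were sold, yet the stored count drops by only one, because Server B's write was computed from a snapshot that Server A had already invalidated.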
The Optimistic Concurrency Control (OCC) Pattern
Instead of preventing concurrent access through pessimistic locking (i.e., assuming a conflict will happen and locking the data), Optimistic Concurrency Control (OCC) operates on a different principle: assume conflicts are rare, but verify that no conflict occurred before committing the write.
This is a perfect fit for DynamoDB's architecture. It avoids the overhead and complexity of managing distributed locks, allowing for maximum throughput while still guaranteeing data correctness. The implementation relies on two key components:
1. A version attribute: a numeric attribute on every item (e.g., version, revision, etag). This number is atomically incremented with every successful write.
2. A conditional write: every update carries a condition that says, in effect, "only apply this change if the version number of the item on the server is still the same as the one I originally read."

If another process has modified the item in the interim, its version number will have changed, your conditional write will fail, and DynamoDB will inform you of the conflict. This failure is not an error to be logged and ignored; it is a critical signal to your application that it needs to handle the contention.
Our item schema is now updated:
{
  "sku": "TSHIRT-BLK-L",
  "description": "Large Black T-Shirt",
  "stock_count": 100,
  "price": 19.99,
  "version": 1
}
Deep Dive into DynamoDB Conditional Writes
DynamoDB's PutItem, UpdateItem, and DeleteItem API calls all accept a ConditionExpression parameter. This is a string that contains a boolean expression that must evaluate to true for the operation to succeed. If it evaluates to false, DynamoDB rejects the write and returns a ConditionalCheckFailedException.
To implement OCC, our ConditionExpression will check the version attribute.
Let's walk through the corrected flow from our previous example:
1. Server A reads the item and gets stock_count: 100 and version: 1.
2. Server B reads the item and gets stock_count: 100 and version: 1.
3. Server A attempts to set stock_count to 99 and increment version to 2. Its UpdateItem call includes a crucial condition: ConditionExpression: "version = :expectedVersion", where :expectedVersion is 1.
4. DynamoDB finds that the stored version is still 1, so the condition passes and the write succeeds. The item is now { stock_count: 99, version: 2 }.
5. Server B attempts to set stock_count to 99 and version to 2. Its UpdateItem call also includes the condition: ConditionExpression: "version = :expectedVersion", where :expectedVersion is 1.
6. DynamoDB evaluates the condition against the item's current version, which is 2. Since 2 != 1, the condition fails. DynamoDB rejects the update and returns a ConditionalCheckFailedException to Server B.

Server B is now aware of the conflict. The lost update has been prevented. The responsibility now shifts to Server B's application logic to handle this exception gracefully.
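The same flow can be traced in a small in-memory model. This is only a sketch: the Map stands in for the table, and the hypothetical conditionalDecrement helper returns false where DynamoDB would throw ConditionalCheckFailedException.

```typescript
type VersionedItem = { sku: string; stock_count: number; version: number };

// In-memory stand-in for the table.
const versionedStore = new Map<string, VersionedItem>([
  ["TSHIRT-BLK-L", { sku: "TSHIRT-BLK-L", stock_count: 100, version: 1 }],
]);

function readVersioned(sku: string): VersionedItem {
  const item = versionedStore.get(sku);
  if (!item) throw new Error(`Unknown SKU ${sku}`);
  return { ...item };
}

// Mimics an update guarded by "version = :expectedVersion".
// Returns false (a conflict) instead of throwing, to keep the sketch short.
function conditionalDecrement(
  sku: string,
  quantity: number,
  expectedVersion: number
): boolean {
  const current = versionedStore.get(sku);
  if (!current || current.version !== expectedVersion) {
    return false; // condition failed: someone else wrote first
  }
  versionedStore.set(sku, {
    ...current,
    stock_count: current.stock_count - quantity,
    version: current.version + 1,
  });
  return true;
}

// Both servers read version 1 before either writes.
const snapshotA = readVersioned("TSHIRT-BLK-L");
const snapshotB = readVersioned("TSHIRT-BLK-L");

const serverAApplied = conditionalDecrement("TSHIRT-BLK-L", 1, snapshotA.version);
const serverBApplied = conditionalDecrement("TSHIRT-BLK-L", 1, snapshotB.version);

// Server B handles the conflict by starting again from a fresh read.
const freshForB = readVersioned("TSHIRT-BLK-L");
const retryApplied = conditionalDecrement("TSHIRT-BLK-L", 1, freshForB.version);

const finalItem = readVersioned("TSHIRT-BLK-L");
console.log(serverAApplied, serverBApplied, retryApplied, finalItem);
```

Server B's first write is rejected, and its retry, built on a fresh read, lands correctly: two sales leave stock_count at 98 and version at 3.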
Production-Grade Implementation Pattern
Let's implement this pattern in TypeScript using the AWS SDK for JavaScript v3. We'll create a function to decrement stock for a given SKU.
import {
  DynamoDBClient,
  GetItemCommand,
  UpdateItemCommand,
  UpdateItemCommandInput,
  ConditionalCheckFailedException
} from "@aws-sdk/client-dynamodb";
import { marshall, unmarshall } from "@aws-sdk/util-dynamodb";

const dynamoDBClient = new DynamoDBClient({});
const TABLE_NAME = process.env.TABLE_NAME || 'Inventory';

interface Product {
  sku: string;
  stock_count: number;
  version: number;
  // ... other attributes
}

async function decrementStock(sku: string, quantity: number): Promise<void> {
  // 1. READ the current state of the item
  const getItemResponse = await dynamoDBClient.send(new GetItemCommand({
    TableName: TABLE_NAME,
    Key: marshall({ sku }),
    // Use consistent reads for time-sensitive operations
    ConsistentRead: true,
  }));

  if (!getItemResponse.Item) {
    throw new Error(`Product with SKU ${sku} not found.`);
  }

  const currentProduct = unmarshall(getItemResponse.Item) as Product;

  // 2. MODIFY the state in memory
  if (currentProduct.stock_count < quantity) {
    throw new Error(`Insufficient stock for SKU ${sku}.`);
  }

  const newStockCount = currentProduct.stock_count - quantity;
  const expectedVersion = currentProduct.version;
  const newVersion = expectedVersion + 1;

  // 3. Perform the CONDITIONAL WRITE
  try {
    // The explicit type keeps ReturnValues typed as the SDK's literal union
    // rather than a plain string.
    const updateParams: UpdateItemCommandInput = {
      TableName: TABLE_NAME,
      Key: marshall({ sku }),
      UpdateExpression: "SET stock_count = :newStock, version = :newVersion",
      ConditionExpression: "version = :expectedVersion",
      ExpressionAttributeValues: marshall({
        ":newStock": newStockCount,
        ":newVersion": newVersion,
        ":expectedVersion": expectedVersion
      }),
      ReturnValues: "NONE",
    };

    await dynamoDBClient.send(new UpdateItemCommand(updateParams));
    console.log(`Successfully decremented stock for ${sku}. New stock: ${newStockCount}`);
  } catch (error) {
    if (error instanceof ConditionalCheckFailedException) {
      // The race condition was detected!
      console.warn(`Conflict detected for SKU ${sku}. Another process updated the item.`);
      // In a real application, you would now trigger a retry.
      // For this example, we'll just re-throw to indicate failure.
      throw new Error(`Conflict detected while updating SKU ${sku}. Please retry.`);
    } else {
      // Handle other potential errors (e.g., throttling, network issues)
      console.error("An unexpected error occurred:", error);
      throw error;
    }
  }
}
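This code correctly implements the core Read-Modify-ConditionalWrite cycle. A key detail is ConsistentRead: true. For a sensitive workflow like this, you want your read to return the latest committed value, which DynamoDB serves from the partition's leader replica. A strongly consistent read consumes twice the read capacity of an eventually consistent one and may add some latency, but it minimizes the chance of starting your logic with stale data. Correctness does not depend on it: a stale read would simply carry an outdated version and be rejected by the condition, at the cost of a wasted write attempt.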
Advanced Error Handling & Retry Logic
Simply catching the ConditionalCheckFailedException is not enough. A production system must react to it intelligently. The correct response is to retry the entire operation, starting from the read.
A naive retry loop is dangerous. If contention is high, multiple clients could enter a tight retry loop, hammering the database and creating a thundering herd problem. The standard solution is to implement a retry mechanism with exponential backoff and jitter.
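To make the backoff arithmetic concrete, here is a minimal sketch of one common jitter variant, in which half of each delay is fixed and half is random so the delay never exceeds the cap. The backoffDelayMs helper and its injectable random parameter are assumptions made for illustration and testing, not part of any SDK.

```typescript
type BackoffConfig = { baseDelayMs: number; maxDelayMs: number };

// attempt is 1 for the first retry. random is injectable so the function
// can be exercised deterministically; it defaults to Math.random.
function backoffDelayMs(
  attempt: number,
  config: BackoffConfig,
  random: () => number = Math.random
): number {
  // Exponential growth, capped at maxDelayMs.
  const ceiling = Math.min(
    config.maxDelayMs,
    config.baseDelayMs * Math.pow(2, attempt)
  );
  // Half fixed, half random: the result lies in [ceiling / 2, ceiling).
  return ceiling / 2 + (ceiling / 2) * random();
}

const retryConfig: BackoffConfig = { baseDelayMs: 50, maxDelayMs: 1000 };

// Lower bound of the delay window for retries 1 through 5.
const schedule = [1, 2, 3, 4, 5].map((attempt) =>
  backoffDelayMs(attempt, retryConfig, () => 0)
);
console.log(schedule);
```

With a base of 50 ms and a cap of 1000 ms, the delay windows double from [50, 100) ms on the first retry until the cap takes over on the fifth, where the window is [500, 1000) ms. The random half is what keeps clients that conflicted at the same instant from retrying in lockstep.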
Let's create a higher-order function to encapsulate this complex retry logic, making our business logic cleaner.
// (imports from previous example)

// Utility for async sleep
const sleep = (ms: number) => new Promise(res => setTimeout(res, ms));

// Configuration for retry logic
const RETRY_CONFIG = {
  maxRetries: 5,
  baseDelayMs: 50,
  maxDelayMs: 1000,
};

/**
 * A higher-order function that wraps a read-modify-write operation
 * with robust OCC retry logic.
 */
async function withOccRetry<T>(
  operation: () => Promise<T>
): Promise<T> {
  let attempts = 0;
  while (true) {
    try {
      return await operation();
    } catch (error) {
      if (error instanceof ConditionalCheckFailedException) {
        attempts++;
        if (attempts > RETRY_CONFIG.maxRetries) {
          console.error(`Operation failed after ${RETRY_CONFIG.maxRetries} retries due to persistent conflicts.`);
          throw new Error('Failed to update item due to high contention.');
        }

        // Exponential backoff with jitter. The random part is scaled to half
        // the backoff so the delay stays within [backoff / 2, backoff) and
        // never exceeds maxDelayMs.
        const backoff = Math.min(
          RETRY_CONFIG.maxDelayMs,
          RETRY_CONFIG.baseDelayMs * Math.pow(2, attempts)
        );
        const jitter = (backoff / 2) * Math.random();
        const delay = backoff / 2 + jitter;

        console.warn(`Conflict detected. Retrying attempt ${attempts} in ${Math.round(delay)}ms...`);
        await sleep(delay);
        // The loop will continue, re-invoking the operation
      } else {
        // Re-throw non-conflict errors immediately
        throw error;
      }
    }
  }
}

// Refactored business logic using the retry wrapper
async function robustDecrementStock(sku: string, quantity: number): Promise<void> {
  const operation = async () => {
    // The entire read-modify-write cycle is now inside the operation lambda
    const getItemResponse = await dynamoDBClient.send(new GetItemCommand({
      TableName: TABLE_NAME,
      Key: marshall({ sku }),
      ConsistentRead: true,
    }));

    if (!getItemResponse.Item) throw new Error(`Product with SKU ${sku} not found.`);

    const currentProduct = unmarshall(getItemResponse.Item) as Product;

    if (currentProduct.stock_count < quantity) {
      throw new Error(`Insufficient stock for SKU ${sku}.`);
    }

    const newStockCount = currentProduct.stock_count - quantity;
    const expectedVersion = currentProduct.version;

    const updateParams = {
      TableName: TABLE_NAME,
      Key: marshall({ sku }),
      // We can combine the version increment and check in one expression
      UpdateExpression: "SET stock_count = :newStock, version = version + :inc",
      ConditionExpression: "version = :expectedVersion",
      ExpressionAttributeValues: marshall({
        ":newStock": newStockCount,
        ":inc": 1,
        ":expectedVersion": expectedVersion
      }),
    };

    await dynamoDBClient.send(new UpdateItemCommand(updateParams));
  };

  await withOccRetry(operation);
  console.log(`Successfully decremented stock for ${sku}.`);
}

// Example usage:
// await robustDecrementStock('TSHIRT-BLK-L', 1);
This refactored version is far more robust. The business logic (robustDecrementStock) is clean and focuses on the what, while the withOccRetry wrapper handles the complex how of contention management. Notice the UpdateExpression: SET ..., version = version + :inc. This borrows the atomic counter idiom: DynamoDB computes the new version server-side as part of the write. Because the condition already pins version to the value we read, the outcome is the same as computing expectedVersion + 1 in application code; the benefit is one less value for the application to calculate and pass.
Edge Cases and Performance Considerations
Senior engineers must consider the boundaries where a pattern excels and where it breaks down.
High Contention Scenarios
OCC is based on the premise that conflicts are the exception, not the rule. If an item is extremely "hot" and dozens of processes are trying to update it every second, the ConditionalCheckFailedException rate will skyrocket. This has two negative effects:
1. Wasted capacity and higher latency: a rejected conditional write still consumes write capacity, and every retry adds another read, another write attempt, and a backoff delay.
2. Failed operations: if maxRetries is exceeded, the operation fails entirely.

Solution: If you identify a hot item, it's a sign your data model may need refinement. Consider breaking the data apart. For example, instead of a single stock_count, you could shard the counter across multiple items (e.g., inventory_shard_1, inventory_shard_2) and have workers pull from different shards. This distributes the write load, but adds significant complexity to reads (which now must aggregate across shards).
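A rough sketch of the sharding idea, kept in memory so it stays self-contained. The shard count, the "sku#shard-n" key scheme, and every helper name here are assumptions for illustration; in a real table each shard would be its own item and each decrement its own conditional write.

```typescript
const SHARD_COUNT = 4; // assumed value; tune to the observed write rate

// Hypothetical key scheme: one item per shard, e.g. "TSHIRT-BLK-L#shard-2".
function shardKey(sku: string, shard: number): string {
  return `${sku}#shard-${shard}`;
}

// In-memory stand-in: shard key -> units held by that shard.
const shards = new Map<string, number>();

// Split the total as evenly as possible across the shards.
function seedShards(sku: string, total: number): void {
  const base = Math.floor(total / SHARD_COUNT);
  const remainder = total % SHARD_COUNT;
  for (let i = 0; i < SHARD_COUNT; i++) {
    shards.set(shardKey(sku, i), base + (i < remainder ? 1 : 0));
  }
}

// Try the starting shard first, then fall through to the others.
function decrementFromShard(sku: string, quantity: number, startShard: number): boolean {
  for (let i = 0; i < SHARD_COUNT; i++) {
    const key = shardKey(sku, (startShard + i) % SHARD_COUNT);
    const available = shards.get(key) ?? 0;
    if (available >= quantity) {
      shards.set(key, available - quantity);
      return true;
    }
  }
  return false; // no single shard can cover the quantity
}

// Reads must aggregate across every shard.
function totalStock(sku: string): number {
  let total = 0;
  for (let i = 0; i < SHARD_COUNT; i++) {
    total += shards.get(shardKey(sku, i)) ?? 0;
  }
  return total;
}

seedShards("TSHIRT-BLK-L", 100);
const applied = decrementFromShard("TSHIRT-BLK-L", 1, 2);
const remaining = totalStock("TSHIRT-BLK-L");
console.log(applied, remaining);
```

The sketch also exposes a cost of the approach: an order larger than any single shard's balance is refused even when the combined stock would cover it, so a real design needs a policy for splitting or rebalancing.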
First Write (Item Creation)
How do you apply this pattern when an item doesn't exist yet? You need to ensure your write is a true creation, not an accidental overwrite of an item with the same key that was created by another process moments before. The ConditionExpression for this is attribute_not_exists(partitionKey).
// (client, TABLE_NAME, and Product from the first example)
import { PutItemCommand } from "@aws-sdk/client-dynamodb";

async function createProduct(product: Omit<Product, 'version'>): Promise<void> {
  const initialVersion = 1;
  const itemToCreate = { ...product, version: initialVersion };

  const putParams = {
    TableName: TABLE_NAME,
    Item: marshall(itemToCreate),
    // This ensures we only succeed if the item does not already exist
    ConditionExpression: "attribute_not_exists(sku)",
  };

  try {
    await dynamoDBClient.send(new PutItemCommand(putParams));
  } catch (error) {
    if (error instanceof ConditionalCheckFailedException) {
      throw new Error(`Product with SKU ${product.sku} already exists.`);
    }
    throw error;
  }
}
Comparison with DynamoDB Transactions
DynamoDB offers TransactWriteItems, which allows for atomic updates across multiple items in one or more tables. You can include condition checks within a transaction. When should you use OCC versus a transaction?
For our inventory example, since we only ever modify one product item at a time, OCC is the superior choice.
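For contrast, here is a sketch of what a multi-item write might look like if an order record had to be created atomically with the stock change. The Orders table, the orderId attribute, and the sample values are hypothetical. The object is written in DynamoDB's raw attribute-value format and follows the request shape accepted by TransactWriteItemsCommand; it is built as plain data here and not sent anywhere.

```typescript
// Hypothetical sample values for illustration only.
const exampleSku = "TSHIRT-BLK-L";
const exampleOrderId = "order-0001";
const versionRead = 1; // the version observed by the preceding read

// Item 1: the same version-guarded stock update used throughout this article.
const stockUpdate = {
  TableName: "Inventory",
  Key: { sku: { S: exampleSku } },
  UpdateExpression: "SET stock_count = stock_count - :qty, version = version + :inc",
  ConditionExpression: "version = :expectedVersion AND stock_count >= :qty",
  ExpressionAttributeValues: {
    ":qty": { N: "1" },
    ":inc": { N: "1" },
    ":expectedVersion": { N: String(versionRead) },
  },
};

// Item 2: create the order only if it does not already exist.
const orderPut = {
  TableName: "Orders", // hypothetical second table
  Item: {
    orderId: { S: exampleOrderId },
    sku: { S: exampleSku },
    quantity: { N: "1" },
  },
  ConditionExpression: "attribute_not_exists(orderId)",
};

// Either both writes are applied or neither is.
const transactInput = {
  TransactItems: [{ Update: stockUpdate }, { Put: orderPut }],
};

console.log(JSON.stringify(transactInput, null, 2));
```

If any condition in the transaction fails, none of the writes are applied. Transactional writes consume more write capacity than standard writes, which is one reason to prefer a plain conditional update when only a single item is involved.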
A Complex Real-World Scenario: A Bidding System
Let's model a real-time auction system where high-frequency bidding is expected.
Item Structure:
{
  "itemId": "ART-VANGOGH-1889",
  "auctionStatus": "OPEN",
  "highestBid": 150000.00,
  "highestBidder": "user-123",
  "bidCount": 42,
  "version": 17
}
The placeBid function must atomically verify the new bid is higher than the current highestBid and then update the item. This is a perfect use case for a conditional update incorporating both the version check and the business logic.
interface Bid {
  itemId: string;
  bidAmount: number;
  bidderId: string;
}

async function placeBid(bid: Bid): Promise<void> {
  const operation = async () => {
    // 1. READ
    const getItemResponse = await dynamoDBClient.send(new GetItemCommand({
      TableName: 'Auctions',
      Key: marshall({ itemId: bid.itemId }),
      ConsistentRead: true,
    }));

    if (!getItemResponse.Item) throw new Error('Auction not found.');

    const auction = unmarshall(getItemResponse.Item);

    if (auction.auctionStatus !== 'OPEN') throw new Error('Auction is closed.');

    // The business logic check is now part of the condition expression,
    // but we can still pre-check to fail fast and avoid a wasted WCU.
    if (bid.bidAmount <= auction.highestBid) {
      throw new Error('Bid must be higher than the current highest bid.');
    }

    const expectedVersion = auction.version;

    // 2. CONDITIONAL WRITE
    // The condition now checks BOTH version and business logic.
    const updateParams = {
      TableName: 'Auctions',
      Key: marshall({ itemId: bid.itemId }),
      UpdateExpression: "SET highestBid = :bidAmount, highestBidder = :bidderId, bidCount = bidCount + :inc, version = version + :inc",
      ConditionExpression: "version = :expectedVersion AND (highestBid < :bidAmount OR attribute_not_exists(highestBid))",
      ExpressionAttributeValues: marshall({
        ":bidAmount": bid.bidAmount,
        ":bidderId": bid.bidderId,
        ":inc": 1,
        ":expectedVersion": expectedVersion
      }),
    };

    await dynamoDBClient.send(new UpdateItemCommand(updateParams));
  };

  await withOccRetry(operation);
  console.log(`Successfully placed bid of ${bid.bidAmount} for item ${bid.itemId} by ${bid.bidderId}`);
}
In this advanced example, the ConditionExpression is doing double duty. It not only checks version = :expectedVersion to prevent the lost update problem, but it also enforces the core business rule highestBid < :bidAmount. This pushes the final business logic check to the database itself, where it is evaluated atomically as part of the write operation. If the item changed between our read and write, the version clause rejects the update and the retry re-reads the auction. The bid clause acts as a second guard: a bid that is no longer the highest is rejected even if some other writer changed highestBid without incrementing version.
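The outcome for two competing bidders can be traced with a small in-memory model. This is a sketch under stated assumptions: tryBid is a hypothetical helper that mirrors the pre-check and the two clauses of the condition, and the bidder IDs and amounts are made up.

```typescript
type Auction = {
  highestBid: number;
  highestBidder: string;
  bidCount: number;
  version: number;
};

type BidOutcome = "ACCEPTED" | "CONFLICT" | "TOO_LOW";

// In-memory stand-in for the stored auction item.
let auctionState: Auction = {
  highestBid: 150000,
  highestBidder: "user-123",
  bidCount: 42,
  version: 17,
};

// snapshot is what the bidder's server read; auctionState is what is stored now.
function tryBid(snapshot: Auction, bidderId: string, amount: number): BidOutcome {
  // Client-side pre-check against the snapshot (fail fast).
  if (amount <= snapshot.highestBid) return "TOO_LOW";
  // Clause 1 of the condition: version = :expectedVersion
  if (auctionState.version !== snapshot.version) return "CONFLICT";
  // Clause 2 of the condition: highestBid < :bidAmount
  if (!(auctionState.highestBid < amount)) return "TOO_LOW";
  auctionState = {
    highestBid: amount,
    highestBidder: bidderId,
    bidCount: auctionState.bidCount + 1,
    version: auctionState.version + 1,
  };
  return "ACCEPTED";
}

// Two bidders read the same version 17 snapshot.
const snapshotForFirst = { ...auctionState };
const snapshotForSecond = { ...auctionState };

const firstOutcome = tryBid(snapshotForFirst, "user-456", 160000);
const secondOutcome = tryBid(snapshotForSecond, "user-789", 155000);

// The second bidder retries from a fresh read and now fails the pre-check.
const retriedOutcome = tryBid({ ...auctionState }, "user-789", 155000);

console.log(firstOutcome, secondOutcome, retriedOutcome, auctionState);
```

The lower bid first hits a version conflict, and only after re-reading does it learn that it has been outbid. This is why the retry must restart from the read: retrying the write alone would keep comparing against a price that no longer exists.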
Conclusion
Optimistic Concurrency Control with conditional writes and a versioning attribute is not an optional feature; it is a foundational pattern for building correct, scalable, and resilient applications on DynamoDB. By understanding the read-modify-conditional-write cycle, implementing robust retry logic with exponential backoff and jitter, and pushing business logic into condition expressions where possible, you can confidently handle concurrent operations without resorting to costly and complex locking mechanisms. Mastering this pattern is a significant step in moving from casual DynamoDB usage to designing truly professional, production-grade distributed systems.