Idempotency Patterns in AWS Lambda with DynamoDB Conditional Writes

Goh Ling Yong

The Inevitability of Duplicates in Modern Event-Driven Systems

In a distributed, event-driven architecture, the promise of "exactly-once" message delivery is often a siren's call. While services like Amazon SQS FIFO queues offer exactly-once processing guarantees within the queue itself, the broader ecosystem of client retries, network partitions, and downstream service failures means that your application logic will inevitably encounter the same event more than once. Most standard messaging systems, like SQS Standard Queues or EventBridge, offer an "at-least-once" delivery guarantee. This is a pragmatic trade-off, prioritizing durability over uniqueness of delivery.

For a senior engineer, this isn't news; it's a fundamental constraint of the systems we build. The critical question is not if we will receive a duplicate event, but how our application will handle it. Executing non-idempotent operations multiple times can have catastrophic business consequences:

  • Financial: Double-charging a customer for a purchase.
  • Data Integrity: Creating duplicate user accounts or records.
  • User Experience: Sending multiple identical notifications to a user.
To build robust, reliable systems, we must shift the burden of ensuring exactly-once processing from the infrastructure to our application layer. This is achieved through idempotency: designing an operation so that calling it multiple times with the same input produces the same result as calling it once. This post provides a deep, technical breakdown of a battle-tested pattern for achieving idempotency in AWS Lambda functions using DynamoDB conditional writes and TTL.

    The Idempotency Key Pattern with a Persistent Store

    The core strategy revolves around tracking the lifecycle of an operation using a unique idempotency key. This key, derived from the incoming event payload, serves as a unique identifier for a specific business transaction.
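When the event carries no natural business identifier, the key can be derived by hashing the payload. The following is a minimal sketch, assuming a Node.js runtime; `canonicalize` and `deriveIdempotencyKey` are hypothetical helper names, and the `paymentId` field is only an illustration of an explicit identifier.

```typescript
import { createHash } from "node:crypto";

// Recursively sort object keys so that logically equal payloads
// serialize to the same string regardless of property order.
function canonicalize(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map(canonicalize);
  }
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const sorted: Record<string, unknown> = {};
    for (const key of Object.keys(source).sort()) {
      sorted[key] = canonicalize(source[key]);
    }
    return sorted;
  }
  return value;
}

// Hypothetical helper: prefer an explicit business identifier and fall
// back to a SHA-256 hash of the canonicalized payload.
function deriveIdempotencyKey(payload: Record<string, unknown>): string {
  const explicit = payload["paymentId"];
  if (typeof explicit === "string" && explicit.length > 0) {
    return explicit;
  }
  const canonical = JSON.stringify(canonicalize(payload));
  return createHash("sha256").update(canonical).digest("hex");
}
```

A payload hash treats two separate requests with identical content as the same transaction, so an explicit identifier supplied by the producer is usually the safer choice when one is available.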

    The logic follows a state machine managed in a persistent, low-latency data store:

  • Receive Event & Extract Key: Upon invocation, the function extracts or generates a unique idempotency key from the event data (e.g., a transactionId, paymentId, or a hash of the payload).
  • Check & Record (Atomic Operation): The function attempts to create a record in a persistent store using the idempotency key. This record is marked with an initial status, such as IN_PROGRESS.
  • Handle Duplicates:
    * If the record is created successfully (i.e., the key didn't exist), the function proceeds to execute the core business logic.
    * If the record creation fails because the key already exists, the operation is a potential duplicate. The function then inspects the existing record's state:
      * If the status is COMPLETED, the function can safely skip the business logic and return the stored result from the previous successful execution.
      * If the status is IN_PROGRESS, it indicates a concurrent execution or a previous failed attempt. The function must decide on a strategy: fail fast, wait, or retry.
  • Execute Business Logic: The core, non-idempotent operation is performed.
  • Update State:
    * Upon successful execution, the record in the store is updated to COMPLETED, and the result of the operation is saved.
    * If the execution fails, the record might be updated to FAILED or deleted to allow for a clean retry.
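The duplicate-handling branch of this state machine can be written as a small pure function, independent of any data store. This is only a sketch: `StoredRecord`, `Action`, and `nextAction` are hypothetical names, and treating every status other than COMPLETED as a reason to fail fast is an assumption rather than something the pattern mandates.

```typescript
type Status = "IN_PROGRESS" | "COMPLETED" | "FAILED";

interface StoredRecord {
  status: Status;
  responseData?: string; // serialized result, present once COMPLETED
}

type Action =
  | { kind: "EXECUTE" }
  | { kind: "RETURN_STORED"; response: unknown }
  | { kind: "FAIL_FAST"; reason: string };

// `existing` is the record already stored for the key, or undefined when
// the atomic create succeeded because no record was there.
function nextAction(existing: StoredRecord | undefined): Action {
  if (existing === undefined) {
    return { kind: "EXECUTE" };
  }
  if (existing.status === "COMPLETED") {
    // Skip the business logic and replay the stored result.
    return { kind: "RETURN_STORED", response: JSON.parse(existing.responseData ?? "null") };
  }
  // Assumption: IN_PROGRESS (and any other status) means the key is held
  // elsewhere, so the caller fails and relies on a later retry.
  return { kind: "FAIL_FAST", reason: `key is held with status ${existing.status}` };
}
```

Keeping the decision separate from the I/O makes each branch easy to unit test without a DynamoDB table.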

    Why DynamoDB is the Ideal Choice

    While you could use other stores like Redis or a relational database, DynamoDB is exceptionally well-suited for this pattern in a serverless context:

  • Low Latency: Provides single-digit millisecond latency for key-value lookups, minimizing overhead on your Lambda's execution time.
  • Conditional Writes: DynamoDB's ConditionExpression parameter allows for atomic "check-and-set" operations. This is the cornerstone of the pattern, preventing race conditions without complex locking mechanisms.
  • Time To Live (TTL): Automatically expires and deletes old idempotency records, preventing the table from growing indefinitely and managing costs. This is also crucial for recovering from failed or timed-out executions.
  • Serverless & Scalable: Scales seamlessly with your Lambda's concurrency and fits perfectly into the pay-per-use model.
    Deep Dive: A Production-Grade Implementation

    Let's build a robust idempotency layer from the ground up using TypeScript and the AWS SDK v3. We'll start with the foundational components and then refactor them into a reusable utility.

    1. DynamoDB Table Design

    First, define the DynamoDB table structure. Simplicity is key.

  • Table Name: IdempotencyStore
  • Partition Key: id (String) - This will store our idempotency key.
  • TTL Attribute: expiry (Number) - A Unix timestamp.
    Our item structure within the table will be:

    json
    {
      "id": "txn-12345-abcde",
      "status": "COMPLETED",
      "expiry": 1678886400,
      "responseData": "{\"status\":\"success\",\"orderId\":\"order-9876\"}"
    }
  • id: The idempotency key.
  • status: Tracks the lifecycle of the operation. One of IN_PROGRESS, COMPLETED, or FAILED.
  • expiry: The Unix timestamp (in seconds) after which DynamoDB's TTL process can delete the item.
  • responseData: The serialized result of the business logic, to be returned on subsequent duplicate calls. It is absent until the operation completes.
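As a rough illustration, the table design above could be expressed as the two request payloads below. They are plain objects shaped like the inputs to `CreateTableCommand` and `UpdateTimeToLiveCommand` in `@aws-sdk/client-dynamodb`; the on-demand billing mode and the `expiryEpochSeconds` helper are assumptions added for the example.

```typescript
// Shaped like the input to CreateTableCommand. Only key attributes are
// declared; status, expiry, and responseData need no schema definition.
const createTableInput = {
  TableName: "IdempotencyStore",
  AttributeDefinitions: [{ AttributeName: "id", AttributeType: "S" }],
  KeySchema: [{ AttributeName: "id", KeyType: "HASH" }],
  BillingMode: "PAY_PER_REQUEST", // assumption: on-demand capacity
};

// Shaped like the input to UpdateTimeToLiveCommand. TTL is enabled in a
// separate call after the table exists.
const enableTtlInput = {
  TableName: "IdempotencyStore",
  TimeToLiveSpecification: { AttributeName: "expiry", Enabled: true },
};

// Hypothetical helper: DynamoDB TTL expects a Number attribute holding
// Unix epoch time in seconds, not milliseconds.
function expiryEpochSeconds(nowMs: number, ttlMinutes: number): number {
  return Math.floor(nowMs / 1000) + ttlMinutes * 60;
}
```

In practice the same definition would more likely live in infrastructure-as-code than in application code; the objects are shown here only to make the attribute names and types concrete.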
    2. The Core Logic: A Step-by-Step Lambda Handler

    Here is a complete, annotated Lambda handler demonstrating the core pattern. This example processes a payment from an SQS event.

    typescript
    // lib/dependencies.ts
    import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
    import { DynamoDBDocumentClient } from "@aws-sdk/lib-dynamodb";
    
    const client = new DynamoDBClient({});
    export const ddbDocClient = DynamoDBDocumentClient.from(client);
    
    // A mock business logic function
    export const processPayment = async (amount: number, currency: string): Promise<{ transactionId: string; status: string }> => {
        console.log(`Processing payment of ${amount} ${currency}...`);
        // Simulate network delay and processing
        await new Promise(resolve => setTimeout(resolve, 1500));
        if (amount < 0) {
            throw new Error("Payment amount cannot be negative.");
        }
        console.log("Payment processed successfully.");
        return { transactionId: `proc-${Date.now()}`, status: 'SUCCESS' };
    };
    
    // handler.ts
    import { SQSEvent } from 'aws-lambda';
    import { ddbDocClient, processPayment } from './lib/dependencies';
    import { GetCommand, PutCommand, UpdateCommand } from '@aws-sdk/lib-dynamodb';
    import { ConditionalCheckFailedException } from '@aws-sdk/client-dynamodb';
    
    const IDEMPOTENCY_TABLE = process.env.IDEMPOTENCY_TABLE!;
    const IDEMPOTENCY_TTL_MINUTES = 15; // Should be > Lambda timeout + SQS visibility timeout
    
    // Define possible statuses for clarity
    type IdempotencyStatus = 'IN_PROGRESS' | 'COMPLETED';
    
    interface IdempotencyRecord {
        id: string;
        status: IdempotencyStatus;
        expiry: number;
        responseData?: any;
    }
    
    export const handler = async (event: SQSEvent) => {
        // Assumes the event source mapping uses a batch size of 1.
        // Any further records in the batch would be ignored by this handler.
        const message = event.Records[0];
        const body = JSON.parse(message.body);
        const idempotencyKey = body.paymentId;
    
        if (!idempotencyKey) {
            throw new Error('Missing idempotency key: paymentId');
        }
    
        const now = Math.floor(Date.now() / 1000);
        const expiry = now + IDEMPOTENCY_TTL_MINUTES * 60;
    
        // === 1. The "Record" Phase: Attempt to claim the idempotency key ===
        try {
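            const putCommand = new PutCommand({
                TableName: IDEMPOTENCY_TABLE,
                Item: {
                    id: idempotencyKey,
                    status: 'IN_PROGRESS',
                    expiry: expiry,
                },
                // Claim the key if no record exists, or if the existing record has
                // passed its expiry but TTL has not deleted it yet.
                ConditionExpression: 'attribute_not_exists(id) OR #expiry < :now',
                ExpressionAttributeNames: { '#expiry': 'expiry' },
                ExpressionAttributeValues: { ':now': now },
            });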
            await ddbDocClient.send(putCommand);
            console.log(`Successfully acquired lock for key: ${idempotencyKey}`);
    
        } catch (error) {
            if (error instanceof ConditionalCheckFailedException) {
                console.log(`Potential duplicate request for key: ${idempotencyKey}. Checking status...`);
                
                // === 2. Handle The Race Condition / Duplicate Request ===
                const getCommand = new GetCommand({
                    TableName: IDEMPOTENCY_TABLE,
                    Key: { id: idempotencyKey },
                });
                const { Item } = await ddbDocClient.send(getCommand);
    
                const record = Item as IdempotencyRecord | undefined;
    
                if (!record) {
                    // This is a rare edge case. The item was deleted between our Put and Get.
                    // We can fail fast and let SQS retry, which will then succeed the Put.
                    throw new Error(`Record for key ${idempotencyKey} disappeared. Retrying is safe.`);
                }
    
                if (record.status === 'COMPLETED') {
                    console.log(`Request already completed. Returning stored response for key: ${idempotencyKey}`);
                    return JSON.parse(record.responseData);
                }
    
                // Any other status (normally IN_PROGRESS) means another invocation holds this key.
                // Fail fast to avoid duplicate processing and let the SQS visibility timeout handle the retry.
                // Throwing unconditionally also keeps unexpected statuses from falling through to the business logic.
                throw new Error(`Concurrent execution detected for key: ${idempotencyKey} (status: ${record.status})`);
                
            } else {
                // Some other DynamoDB error occurred
                console.error('DynamoDB error during record phase:', error);
                throw error;
            }
        }
    
        // === 3. The "Execute" Phase: Run the core business logic ===
        let result;
        try {
            result = await processPayment(body.amount, body.currency);
        } catch (businessError) {
            // If business logic fails, we could delete the idempotency record to allow a clean retry.
            // Or, we could mark it as FAILED. For simplicity, we'll let it expire via TTL.
            // A more robust implementation might delete it immediately.
            console.error(`Business logic failed for key ${idempotencyKey}:`, businessError);
            throw businessError; // Propagate error to Lambda for SQS retry
        }
    
        // === 4. The "Update" Phase: Store the result ===
        try {
            const updateCommand = new UpdateCommand({
                TableName: IDEMPOTENCY_TABLE,
                Key: { id: idempotencyKey },
                UpdateExpression: 'SET #status = :status, #responseData = :responseData',
                ExpressionAttributeNames: {
                    '#status': 'status',
                    '#responseData': 'responseData',
                },
                ExpressionAttributeValues: {
                    ':status': 'COMPLETED',
                    ':responseData': JSON.stringify(result),
                },
            });
            await ddbDocClient.send(updateCommand);
            console.log(`Successfully completed and stored result for key: ${idempotencyKey}`);
    
        } catch (updateError) {
            // This is a critical failure state. The business logic succeeded, but we failed to record it.
            // The next invocation will see 'IN_PROGRESS' and fail until the record expires.
            // This is a trade-off. It prevents double-execution at the cost of delayed processing.
            console.error(`CRITICAL: Failed to update idempotency record for key ${idempotencyKey} after successful execution.`, updateError);
            throw updateError;
        }
    
        return result;
    };

    Dissecting the Edge Cases and Concurrency Handling

    The naive implementation is simple, but production reliability lives in the edge cases. Let's analyze the critical try...catch block.

    The ConditionalCheckFailedException is Your Best Friend:

    When two Lambda invocations with the same idempotencyKey execute concurrently, they will both attempt the initial conditional PutCommand. DynamoDB evaluates the ConditionExpression and applies the write as a single atomic step per item, so only one of these requests will succeed. The other will immediately fail with ConditionalCheckFailedException. This is the atomic lock that prevents the race condition.

    When the Lock Fails, What Next?

    The invocation that failed the check knows another process has claimed the key. It must now determine the state of that other process:

  • GetItem: It performs a GetItem to read the record.
  • status === 'COMPLETED': This is the happy path for a duplicate. The first invocation finished successfully. We can confidently return the responseData and terminate, achieving idempotency.
  • status === 'IN_PROGRESS': This is the tricky part. It means another invocation is actively working on this request. The safest action is to fail fast. Throwing an error causes the current Lambda invocation to fail. If the trigger is SQS, the message will become visible again after its visibility timeout and will be retried later. This prevents two functions from running the same business logic simultaneously. By the time the message is retried, the first invocation will likely have completed and set the status to COMPLETED.
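  • The Record Disappeared: In a rare scenario, the TTL could have deleted the item between the failed PutItem and the successful GetItem. Because GetItem is eventually consistent by default, the read can also miss a record that was written only moments earlier. In either case, throwing an error to trigger a retry is safe, as the next attempt's PutItem will succeed or its GetItem will find the record.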
    Handling Partial Failures:

  • Business Logic Failure: If processPayment fails, the Lambda exits and the IN_PROGRESS record remains in DynamoDB. Retries (e.g., from SQS) fail fast for as long as that record blocks the key, so a clean retry of the business logic only becomes possible once the record has expired and been removed by TTL (or once the claim treats an expired record as absent). Because TTL deletion can lag well behind the expiry timestamp, deleting the record in the catch block is the quicker way to unblock retries.
  • Post-Execution Failure: The most problematic state is if processPayment succeeds, but the final UpdateCommand to set the status to COMPLETED fails (e.g., due to a transient DynamoDB issue). The system is now in a state where the action was performed, but not recorded as such. The IN_PROGRESS record will persist until it expires. Any retries during this window will see the IN_PROGRESS status and fail fast. This design deliberately prioritizes preventing duplicate execution over immediate processing. Once the record has expired, a retry can claim the key again, and at that point nothing in the idempotency table shows that the payment already happened. To avoid charging twice, the business logic has to re-read from the source system, which should reflect the completed payment, before acting. This highlights the importance of setting a reasonable expiry.
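The two failure modes above call for different treatment of the claim. The sketch below shows one way to encode that difference against an abstract store; `ClaimStore`, `MemoryStore`, and `runWithRelease` are hypothetical names, and the in-memory store merely stands in for the DynamoDB calls. It assumes that a thrown business error means nothing was committed, which is not guaranteed for operations that can fail after a partial side effect.

```typescript
interface ClaimStore {
  // Resolves true if the key was claimed, false if a record already exists.
  claim(key: string): Promise<boolean>;
  complete(key: string, response: string): Promise<void>;
  release(key: string): Promise<void>;
}

async function runWithRelease<T>(
  store: ClaimStore,
  key: string,
  work: () => Promise<T>
): Promise<T> {
  if (!(await store.claim(key))) {
    throw new Error(`Key ${key} is already claimed`);
  }
  let result: T;
  try {
    result = await work();
  } catch (err) {
    // Business logic failed: free the key so the next retry can claim it.
    try {
      await store.release(key);
    } catch {
      // Release failed too; the record is left to expire.
    }
    throw err;
  }
  // If this write throws, the claim is deliberately left in place so that
  // retries fail fast instead of repeating work that already succeeded.
  await store.complete(key, JSON.stringify(result));
  return result;
}

// In-memory stand-in for the table, for illustration and tests only.
class MemoryStore implements ClaimStore {
  readonly records = new Map<string, { status: string; response?: string }>();

  async claim(key: string): Promise<boolean> {
    if (this.records.has(key)) return false;
    this.records.set(key, { status: "IN_PROGRESS" });
    return true;
  }

  async complete(key: string, response: string): Promise<void> {
    this.records.set(key, { status: "COMPLETED", response });
  }

  async release(key: string): Promise<void> {
    this.records.delete(key);
  }
}
```

Releasing on failure trades a shorter retry delay for the risk of re-running an operation that partially succeeded, so it fits best when the business logic is known to fail cleanly.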
    Refactoring into a Reusable Idempotency Utility

    Embedding this logic directly in every Lambda handler is verbose and error-prone. A much cleaner approach is to abstract it into a higher-order function or a TypeScript decorator. This is how production-grade libraries like AWS Lambda Powertools operate.

    Let's build a makeIdempotent higher-order function.

    typescript
    // lib/idempotency.ts
    import { DynamoDBDocumentClient, GetCommand, PutCommand, UpdateCommand } from '@aws-sdk/lib-dynamodb';
    import { ConditionalCheckFailedException } from '@aws-sdk/client-dynamodb';
    
    type IdempotencyStatus = 'IN_PROGRESS' | 'COMPLETED';
    
    interface IdempotencyRecord {
        id: string;
        status: IdempotencyStatus;
        expiry: number;
        responseData?: string; // Always store as a string
    }
    
    export interface MakeIdempotentOptions<TEvent, TResult> {
        ddbClient: DynamoDBDocumentClient;
        tableName: string;
        keyExtractor: (event: TEvent) => string;
        ttlMinutes?: number;
    }
    
    export function makeIdempotent<TEvent, TResult>(
        options: MakeIdempotentOptions<TEvent, TResult>
    ) {
        return function (handler: (event: TEvent) => Promise<TResult>) {
            return async function (event: TEvent): Promise<TResult> {
                const { ddbClient, tableName, keyExtractor, ttlMinutes = 15 } = options;
    
                const idempotencyKey = keyExtractor(event);
                if (!idempotencyKey) {
                    throw new Error('Idempotency key could not be extracted from the event.');
                }
    
                const now = Math.floor(Date.now() / 1000);
                const expiry = now + ttlMinutes * 60;
    
                // 1. Record Phase
                try {
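                    await ddbClient.send(new PutCommand({
                        TableName: tableName,
                        Item: { id: idempotencyKey, status: 'IN_PROGRESS', expiry },
                        // Claim the key if it is new, or if the existing record has
                        // expired but TTL has not deleted it yet.
                        ConditionExpression: 'attribute_not_exists(id) OR #expiry < :now',
                        ExpressionAttributeNames: { '#expiry': 'expiry' },
                        ExpressionAttributeValues: { ':now': now },
                    }));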
                } catch (error) {
                    if (error instanceof ConditionalCheckFailedException) {
                        // 2. Handle Duplicate
                        const { Item } = await ddbClient.send(new GetCommand({
                            TableName: tableName,
                            Key: { id: idempotencyKey },
                        }));
    
                        const record = Item as IdempotencyRecord | undefined;
    
                        if (record?.status === 'COMPLETED') {
                            return JSON.parse(record.responseData!);
                        }
    
                        // Missing record or any non-COMPLETED status: fail fast
                        // rather than fall through to the handler.
                        const message = record ? `Concurrent execution detected` : `Record disappeared`;
                        throw new Error(`${message} for key: ${idempotencyKey}. Failing fast.`);
                    } else {
                        console.error('DynamoDB error during record phase:', error);
                        throw error;
                    }
                }
    
                // 3. Execute Phase
                let result: TResult;
                try {
                    result = await handler(event);
                } catch (businessError) {
                    // Let it expire via TTL to allow for retries
                    console.error(`Business logic failed for key ${idempotencyKey}:`, businessError);
                    throw businessError;
                }
    
                // 4. Update Phase
                try {
                    await ddbClient.send(new UpdateCommand({
                        TableName: tableName,
                        Key: { id: idempotencyKey },
                        UpdateExpression: 'SET #status = :status, #responseData = :responseData',
                        ExpressionAttributeNames: {
                            '#status': 'status',
                            '#responseData': 'responseData',
                        },
                        ExpressionAttributeValues: {
                            ':status': 'COMPLETED',
                            ':responseData': JSON.stringify(result),
                        },
                    }));
                } catch (updateError) {
                    console.error(`CRITICAL: Failed to update idempotency record for key ${idempotencyKey}`, updateError);
                    throw updateError;
                }
    
                return result;
            };
        };
    }

    Now, our Lambda handler becomes beautifully clean and focused only on the business logic:

    typescript
    // new-handler.ts
    import { SQSEvent } from 'aws-lambda';
    import { ddbDocClient, processPayment } from './lib/dependencies';
    import { makeIdempotent } from './lib/idempotency';
    
    // The core business logic is now separate and pure
    const paymentHandlerLogic = async (event: SQSEvent) => {
        const message = event.Records[0];
        const body = JSON.parse(message.body);
        return await processPayment(body.amount, body.currency);
    };
    
    // Wrap the core logic with the idempotency utility
    export const handler = makeIdempotent({
        ddbClient: ddbDocClient,
        tableName: process.env.IDEMPOTENCY_TABLE!,
        keyExtractor: (event: SQSEvent) => JSON.parse(event.Records[0].body).paymentId,
        ttlMinutes: 60, // Set a longer TTL for critical payment processing
    })(paymentHandlerLogic);

    This abstraction provides consistency, reduces boilerplate, and makes the system easier to reason about and test.

    Performance and Cost Considerations

    While robust, this pattern is not free. It's crucial to understand the trade-offs.

  • Latency Overhead: On the happy path, each idempotent invocation adds two DynamoDB calls: the PutItem that claims the key and the UpdateItem that records the result. A duplicate adds a failed PutItem and a GetItem instead. With DynamoDB's single-digit millisecond latency, this overhead is typically small (e.g., 5-15ms), but it should be measured and considered for ultra-low-latency applications.
  • DynamoDB Cost:
    * Capacity Mode: For most event-driven workloads with spiky traffic, On-Demand capacity is the most cost-effective and operationally simple choice. You pay per read/write request unit.
    * Write/Read Units: The happy path consumes 1 Write Request Unit (WRU) for the initial PutItem and 1 WRU for the final UpdateItem. A duplicate request consumes 1 failed WRU and 0.5 Read Request Units (RRU) for the GetItem. The costs are minimal at scale but not zero.
  • Item Size and responseData: DynamoDB has a 400 KB limit per item. Storing large response payloads directly in the responseData attribute can be problematic.
    * Performance: Larger items consume more WCUs/RCUs and increase latency.
    * Solution: For large payloads, store a reference instead. For example, save the result to an S3 bucket and store the S3 object key (s3://bucket/results/key) in the responseData attribute. The downstream consumer can then fetch the full result from S3 if needed.

  • TTL Deletions: DynamoDB's TTL feature is free. However, the deletion process is not instantaneous: expired items are typically deleted within a few days of their expiry time, and until then they still appear in reads and still count as existing for conditional writes. This means your IN_PROGRESS checks must rely on the expiry timestamp itself, not just the existence of the record, to handle zombie records from crashed invocations.
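On the read side, relying on the expiry timestamp can be as simple as filtering out records whose expiry has passed before inspecting their status. The sketch below assumes the item shape used earlier in this post; `liveRecord` is a hypothetical helper name.

```typescript
interface IdempotencyRecord {
  id: string;
  status: "IN_PROGRESS" | "COMPLETED";
  expiry: number; // Unix epoch seconds
  responseData?: string;
}

// Treat a record as absent once its expiry has passed, even if the TTL
// process has not physically deleted it yet.
function liveRecord(
  record: IdempotencyRecord | undefined,
  nowSeconds: number
): IdempotencyRecord | undefined {
  if (record === undefined) {
    return undefined;
  }
  return record.expiry >= nowSeconds ? record : undefined;
}
```

A caller would pass the result of its GetItem through this filter and treat an undefined result the same way as a missing record. The conditional write that claims the key needs a matching expiry comparison, otherwise an expired record would still block the claim.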
    Conclusion: Building for Reality

    Achieving effectively-once processing in a distributed serverless architecture is a non-negotiable requirement for any critical business operation. While infrastructure provides at-least-once guarantees, the responsibility for idempotency rests firmly within the application layer.

    The pattern of using an idempotency key with DynamoDB's conditional writes and TTL provides a robust, scalable, and cost-effective solution. By handling the states (IN_PROGRESS, COMPLETED), managing concurrency through atomic operations, and planning for partial failures, you can build systems that are resilient to the inherent unpredictability of distributed environments.

    While this post demonstrates how to build this mechanism from scratch to understand its core principles, for new production projects, I strongly recommend leveraging a battle-tested library like AWS Lambda Powertools (for TypeScript, Python, or Java). It implements this same underlying pattern with additional features like JMESPath for key extraction and robust error handling, allowing you to focus on what truly matters: your business logic.
