Idempotency-Key Patterns for Exactly-Once API Execution in Distributed Systems
The Pragmatic Pursuit of Exactly-Once Processing
In the world of distributed systems, "exactly-once delivery" is a well-known impossibility. Network partitions, client-side timeouts, and non-atomic state changes conspire to make it impossible to guarantee a message is delivered and processed precisely one time. However, for critical operations like payment processing, order fulfillment, or any state-changing API call, the effect of exactly-once processing is non-negotiable. A duplicate API call could result in a customer being double-charged or an order being shipped twice: catastrophic failures in production.
The industry-standard solution is to shift the responsibility from the network transport to the application layer by implementing idempotency. By providing a unique Idempotency-Key in the request header, a client can safely retry a request multiple times, confident that the server-side operation will be performed only once. The first successful request's result is cached and returned for all subsequent retries with the same key.
This article is not an introduction to the concept. We assume you understand why idempotency is necessary. Instead, we will dissect the advanced implementation details, trade-offs, and failure modes encountered when building a production-grade idempotency layer for a high-throughput, distributed API.
We will cover:
1. Anatomy of an Idempotency-Key request flow
2. A deep dive into the storage layer (PostgreSQL vs. Redis)
3. Handling concurrency and race conditions
4. The idempotency key lifecycle and garbage collection
5. Advanced scenarios and edge cases
1. Anatomy of an Idempotency-Key Request Flow
At its core, an idempotency layer is a state machine that tracks an operation's progress. The state is uniquely identified by the Idempotency-Key provided by the client. A robust implementation must gracefully handle three primary states for a given key: unseen, processing, and completed.
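As a minimal sketch, the three states and their legal transitions can be encoded directly (the names here are illustrative, not part of any library):

```python
from enum import Enum

class KeyStatus(str, Enum):
    PROCESSING = "processing"
    COMPLETED = "completed"

# Legal transitions: an unseen key (no record yet, modeled as None) may only
# move to PROCESSING; COMPLETED is terminal.
ALLOWED_TRANSITIONS = {
    None: {KeyStatus.PROCESSING},
    KeyStatus.PROCESSING: {KeyStatus.COMPLETED},
    KeyStatus.COMPLETED: set(),
}

def can_transition(current, target) -> bool:
    return target in ALLOWED_TRANSITIONS.get(current, set())
```

Making every state change pass through a guard like this keeps illegal transitions (e.g., re-opening a completed key) from creeping in as the codebase grows.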
Let's model the server-side logic, typically implemented as middleware that intercepts incoming requests.
Client-Side Responsibility: The client must generate a unique key for each distinct operation. A UUIDv4 is a common choice for its high collision resistance. For operations that can be naturally deduplicated by content, a hash of the request payload can also be used, but this is less flexible. The key should be passed in a standard header, e.g., Idempotency-Key: 2e532b4b-221a-4577-803f-4203a35835f8.
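Both key-generation strategies are a few lines each; a sketch (the helper names are ours, not a standard API):

```python
import hashlib
import json
import uuid

def new_idempotency_key() -> str:
    """A random UUIDv4 key: unique per logical operation, collision-resistant."""
    return str(uuid.uuid4())

def content_idempotency_key(payload: dict) -> str:
    """A content-derived key: identical payloads map to the same key.
    Sorting keys makes the JSON serialization canonical."""
    canonical = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()
```

Note that a content-derived key cannot distinguish two intentionally identical operations (e.g., a customer legitimately placing the same order twice), which is why a client-generated UUID is usually the safer default.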
Server-Side State Machine Flow:
The server first extracts the Idempotency-Key from the request headers. If absent, the request bypasses the idempotency logic and is processed normally.
* Case 1: Key Not Found (Unseen)
* This is the first time we've seen this operation.
* Atomically create a record for the key and mark its status as processing. This acts as a lock to prevent concurrent executions.
* Proceed to execute the core business logic (e.g., charge a credit card, create an order).
* On Success: Atomically update the key's record. Set the status to completed, and store the HTTP status code and response body.
* On Failure: The handling here is nuanced. If the failure is deterministic (e.g., validation error), you might store the failure response. If it's a transient server error, you might delete the key entirely to allow a clean retry. We'll explore this in the edge cases section.
* Return the generated response to the client.
* Case 2: Key Found with processing Status
* Another request for the same operation is already in flight.
* This indicates a race condition, likely due to aggressive client retries or network latency.
* Do not execute the business logic again. Immediately return an error to the client. A 409 Conflict or 429 Too Many Requests status code is appropriate, signaling that the client should wait and potentially retry later.
* Case 3: Key Found with completed Status
* The operation has already been successfully executed.
* Do not execute the business logic again.
* Retrieve the stored HTTP status code and response body from the record.
* Return the exact same response that was sent for the original request.
This flow ensures that no matter how many times a client retries a request with the same key, the underlying business logic is executed at most once.
```python
# High-level conceptual middleware in Python/FastAPI
from fastapi import FastAPI, Request, Response
from fastapi.responses import JSONResponse

app = FastAPI()
# `storage` is an abstract async storage backend; see the next section.

@app.middleware("http")
async def idempotency_middleware(request: Request, call_next):
    if request.method not in ("POST", "PUT", "PATCH"):
        return await call_next(request)

    idempotency_key = request.headers.get("Idempotency-Key")
    if not idempotency_key:
        return await call_next(request)

    # 1. State Lookup
    stored_state = await storage.get_key(idempotency_key)
    if stored_state:
        # 2. Key Found (Completed or Processing)
        if stored_state.status == "completed":
            return Response(
                content=stored_state.response_body,
                status_code=stored_state.response_code,
                headers={"Content-Type": "application/json"},
            )
        elif stored_state.status == "processing":
            return JSONResponse(
                status_code=409,
                content={"error": "Request with this key is already being processed"},
            )

    # 3. Key Not Found (Unseen)
    await storage.create_key(idempotency_key, status="processing")
    try:
        # Execute actual business logic
        response = await call_next(request)

        # On success, read body and store result
        response_body = b""
        async for chunk in response.body_iterator:
            response_body += chunk
        await storage.update_key(
            idempotency_key,
            status="completed",
            response_code=response.status_code,
            response_body=response_body.decode(),
        )
        # Re-create the response, since the original body iterator was consumed
        return Response(
            content=response_body,
            status_code=response.status_code,
            media_type=response.media_type,
            headers=dict(response.headers),
        )
    except Exception:
        # On failure, remove the lock to allow retries
        await storage.delete_key(idempotency_key)
        # Re-raise the exception to be handled by other middleware
        raise
```
This conceptual code glosses over the most critical detail: the atomicity of the storage operations. Let's dive into how to implement this correctly.
2. Deep Dive into the Storage Layer
The correctness of an idempotency system hinges entirely on the atomicity guarantees of its storage layer. A race condition between reading and writing the key's state can violate the exactly-once principle. We'll analyze two popular choices: PostgreSQL and Redis.
Option A: PostgreSQL for Durability and Consistency
A relational database is an excellent choice when durability is paramount. Financial systems often prefer this route.
Schema Design
We need a table to store the state of each idempotency key.
```sql
CREATE TYPE idempotency_status AS ENUM ('processing', 'completed', 'failed');

CREATE TABLE idempotency_keys (
    key VARCHAR(255) PRIMARY KEY,
    -- The user/tenant this key belongs to, crucial for multi-tenant systems
    user_id UUID NOT NULL,
    -- State machine fields
    status idempotency_status NOT NULL,
    -- Locked until this timestamp to prevent stuck 'processing' states
    locked_until TIMESTAMPTZ,
    -- Stored response
    response_code SMALLINT,
    response_body JSONB,
    -- Timestamps for lifecycle management
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Index for fast lookups
CREATE INDEX idx_idempotency_keys_user_id_key ON idempotency_keys (user_id, key);
-- Index for garbage collection
CREATE INDEX idx_idempotency_keys_created_at ON idempotency_keys (created_at);
```
Atomic Operations: The Core Pattern
The most challenging part is the initial check-and-set operation. A naive SELECT followed by an INSERT creates a classic race condition. Two concurrent requests might both see the key as non-existent and both attempt to insert.
The solution is to use PostgreSQL's INSERT ... ON CONFLICT DO NOTHING feature, which provides an atomic "insert if not exists" primitive.
Here's the production-grade flow within a single transaction:
1. Attempt to insert the key with a processing status. The PRIMARY KEY constraint on the key column will cause a conflict if the key already exists.

```sql
INSERT INTO idempotency_keys (key, user_id, status, locked_until)
VALUES ($1, $2, 'processing', NOW() + INTERVAL '30 seconds')
ON CONFLICT (key) DO NOTHING;
```

2. The INSERT command returns the number of rows affected. If it's 1, we successfully acquired the lock. We are the first request and can proceed with the business logic. If it's 0, the key already existed. We must now SELECT the existing record to determine its status (processing or completed) and act accordingly.

Complete Python/SQLAlchemy 2.0 Implementation:
```python
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy import text
import datetime
import json

class PostgresIdempotencyStorage:
    def __init__(self, session: AsyncSession):
        self.session = session

    async def get_or_lock(self, key: str, user_id: str):
        # Step 1: Attempt atomic insert
        lock_interval = datetime.timedelta(seconds=30)
        insert_stmt = text(
            """INSERT INTO idempotency_keys (key, user_id, status, locked_until)
               VALUES (:key, :user_id, 'processing', NOW() + :interval)
               ON CONFLICT (key) DO NOTHING"""
        )
        result = await self.session.execute(
            insert_stmt,
            {"key": key, "user_id": user_id, "interval": lock_interval},
        )
        if result.rowcount == 1:
            # We got the lock
            await self.session.commit()
            return {"status": "first_request"}

        # Step 2: Key existed, so we select it.
        # We can use a pessimistic lock here for extra safety, discussed later.
        select_stmt = text(
            """SELECT status, response_code, response_body
               FROM idempotency_keys WHERE key = :key AND user_id = :user_id"""
        )
        record = await self.session.execute(
            select_stmt, {"key": key, "user_id": user_id}
        )
        existing = record.first()
        if not existing:
            # A rare but possible race condition if the key was just deleted.
            # We can either retry the whole operation or return an error.
            return {"status": "conflict", "error": "Concurrent modification detected"}
        return {
            "status": existing.status,
            "response_code": existing.response_code,
            "response_body": existing.response_body,
        }

    async def save_response(self, key: str, user_id: str, code: int, body: dict):
        update_stmt = text(
            """UPDATE idempotency_keys
               SET status = 'completed', response_code = :code, response_body = :body, updated_at = NOW()
               WHERE key = :key AND user_id = :user_id AND status = 'processing'"""
        )
        await self.session.execute(
            update_stmt,
            {
                "key": key,
                "user_id": user_id,
                "code": code,
                # Serialize the dict for the JSONB column; drivers do not
                # adapt a plain dict automatically in a text() statement.
                "body": json.dumps(body),
            },
        )
        await self.session.commit()
```
This pattern is highly robust. The ON CONFLICT clause leverages the database's internal locking mechanisms to ensure only one transaction can create the initial record.
Option B: Redis for Low-Latency
For applications where request latency is more critical than absolute durability (e.g., non-financial transactions), Redis can be a faster alternative.
Data Model:
We can store the state of a key as a JSON string or a Redis Hash. A simple string is often sufficient.
* Key: idempotency:<key> (see the _key helper below)
* Value: {"status": "processing"} or {"status": "completed", "code": 200, "body": "..."}
Atomic Operations: SETNX
The Redis command SET key value NX (SET if Not eXists) is the cornerstone of this approach. It's an atomic operation that sets a key only if it doesn't already exist.
Use SETNX to create the key with a processing state and a Time-To-Live (TTL) to prevent permanent locks.

```python
import redis.asyncio as redis
import json

# Example using redis-py
r = redis.Redis(...)

# The lock value can be simple, or contain more info
processing_state = json.dumps({"status": "processing"})

# SETNX with a 30-second expiry (EX)
lock_acquired = await r.set("idempotency:some-key", processing_state, ex=30, nx=True)
```
If lock_acquired is True, we have the lock and can proceed with the business logic. If lock_acquired is False, the key already exists, and we must GET the key to check its content.

Complete Python/Redis Implementation:
```python
import redis.asyncio as redis
import json

class RedisIdempotencyStorage:
    def __init__(self, client: redis.Redis):
        self.client = client
        self.lock_ttl = 30        # seconds
        self.result_ttl = 86400   # 24 hours

    def _key(self, key: str):
        return f"idempotency:{key}"

    async def get_or_lock(self, key: str):
        redis_key = self._key(key)
        processing_state = json.dumps({"status": "processing"})
        # Atomically set the key if it doesn't exist, with a lock TTL
        if await self.client.set(redis_key, processing_state, ex=self.lock_ttl, nx=True):
            return {"status": "first_request"}
        # Key exists, get its value
        stored_data = await self.client.get(redis_key)
        if not stored_data:
            # Key expired between our SETNX and GET. A rare race.
            # Client can retry the whole operation.
            return {"status": "conflict", "error": "Lock expired concurrently"}
        return json.loads(stored_data)

    async def save_response(self, key: str, code: int, body: dict):
        redis_key = self._key(key)
        completed_state = json.dumps({
            "status": "completed",
            "code": code,
            "body": body,
        })
        # Set the final result with a longer TTL
        await self.client.set(redis_key, completed_state, ex=self.result_ttl)
```
Durability Trade-off: The primary risk with Redis is a crash. If Redis is configured with default persistence (RDB snapshots), a crash could lose the record of a completed transaction. If a client retries after the crash, the operation will be executed again. Using AOF (Append-Only File) persistence provides much stronger durability guarantees, making it a better choice for this use case, albeit with a performance penalty.
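If you do choose Redis for financial-adjacent workloads, the persistence settings matter. A sketch of the relevant redis.conf directives (tune the fsync policy to your latency budget):

```
appendonly yes          # enable the Append-Only File
appendfsync everysec    # fsync once per second: bounds the loss window to ~1s of writes
```

With `appendfsync everysec`, a crash can still lose up to a second of acknowledged writes; `appendfsync always` closes that window at a significant throughput cost.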
3. Handling Concurrency and Race Conditions
The atomic INSERT...ON CONFLICT or SETNX patterns handle the most common race condition. However, more subtle concurrency issues can arise.
The Thundering Herd Problem
Imagine a client with a bug or an aggressive retry policy that sends 10 identical requests simultaneously. The first request will acquire the lock. The other 9 will see the processing state. This is correct, but it generates a burst of 409 Conflict errors and puts unnecessary load on the idempotency layer.
While this behavior is technically correct, some systems may prefer a more robust locking mechanism over a burst of rejections.
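Part of the mitigation lives on the client: retry with exponential backoff and jitter, reusing the same key, so duplicates do not arrive simultaneously. A sketch (the `send` callable and the status-code policy are illustrative assumptions):

```python
import random
import time

def retry_with_key(send, key, max_attempts=5, base_delay=0.1):
    """Call send(key), which returns an HTTP status code, retrying on
    409/5xx with full-jitter exponential backoff. The SAME key is reused
    on every attempt, so the server executes the operation at most once."""
    status = None
    for attempt in range(max_attempts):
        status = send(key)
        if status not in (409, 500, 502, 503, 504):
            return status
        # Sleep a random amount in [0, base_delay * 2^attempt)
        time.sleep(random.uniform(0, base_delay * 2 ** attempt))
    return status
```

Jitter is what prevents the retries from re-synchronizing into another simultaneous burst.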
Pattern 1: Pessimistic Locking with `SELECT ... FOR UPDATE`
In the PostgreSQL implementation, when a request finds an existing key, it could be because another transaction is actively processing it. To prevent reading a stale processing state just before the other transaction commits, we can use a pessimistic lock.
```sql
-- When handling a conflict, lock the row before reading it.
BEGIN;
SELECT status, response_code, response_body
FROM idempotency_keys
WHERE key = :key AND user_id = :user_id
FOR UPDATE;
-- ... logic to handle the returned status ...
COMMIT;
```
FOR UPDATE tells PostgreSQL to lock the selected row. Any other transaction trying to SELECT ... FOR UPDATE or UPDATE the same row will be blocked until our current transaction completes. This provides the strongest consistency guarantee but comes at a cost:
* Performance: It holds a database lock for the duration of the read, which can reduce throughput under high contention.
* Deadlocks: Care must be taken to avoid deadlocks if multiple resources are being locked.
This pattern is often overkill but can be valuable in systems where the cost of a concurrency anomaly is extremely high.
Pattern 2: Optimistic Locking with a Version/Status Check
The flow we designed is already a form of optimistic locking. We assume conflicts are rare. The crucial part is the final UPDATE operation, which must be conditional.
```sql
UPDATE idempotency_keys
SET status = 'completed', ...
WHERE key = :key AND status = 'processing'; -- This condition is critical!
```
This WHERE clause ensures that we only transition a key to completed if it's still in the processing state we expected. If, for some reason, its state changed (e.g., a timeout mechanism marked it as failed), this update will do nothing, preventing inconsistent state transitions.
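The two atomic primitives at play here (insert-if-absent, then update-only-from-the-expected-state) can be mimicked with a thread-safe in-memory store. This toy sketch (our own class, not a library API) mirrors the semantics of ON CONFLICT DO NOTHING and the conditional UPDATE:

```python
import threading

class InMemoryKeyStore:
    def __init__(self):
        self._lock = threading.Lock()
        self._rows = {}  # key -> {"status": ..., "code": ..., "body": ...}

    def create_processing(self, key) -> bool:
        """Analogue of INSERT ... ON CONFLICT DO NOTHING: returns True
        only for the first caller; every concurrent duplicate gets False."""
        with self._lock:
            if key in self._rows:
                return False
            self._rows[key] = {"status": "processing"}
            return True

    def complete_if_processing(self, key, code, body) -> bool:
        """Analogue of UPDATE ... WHERE status = 'processing': the
        compare-and-set succeeds only from the expected state."""
        with self._lock:
            row = self._rows.get(key)
            if row is None or row["status"] != "processing":
                return False
            row.update(status="completed", code=code, body=body)
            return True
```

In production the database's own locking replaces the `threading.Lock`, but the state-transition logic is identical.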
4. The Idempotency Key Lifecycle and Garbage Collection
Storing every idempotency key forever is not feasible. This would lead to unbounded storage growth.
Defining a TTL (Time-To-Live)
Every key must have a defined lifecycle. A common industry standard, used by services like Stripe, is 24 hours. This duration is a trade-off:
* Too Short: A client experiencing a prolonged network outage might retry a request after the key has been purged, leading to a duplicate execution.
* Too Long: Increases storage costs and can slow down lookups if the table/keyspace becomes bloated.
A 24-hour window is generally considered a safe balance, as it's unlikely for a legitimate client retry to be delayed for longer.
Implementation Strategies
* For Redis: This is trivial. The EXPIRE or EX option on the SET command handles this automatically. We used this in our example, setting a short TTL for the processing lock and a longer TTL for the completed result.
* For PostgreSQL: We need an active garbage collection process.
1. Timestamp-based Deletion: The created_at column in our schema is designed for this.
2. Background Job: A scheduled job (e.g., a cron job, a Celery task, or a Kubernetes CronJob) runs periodically (e.g., once an hour) and executes a simple DELETE statement.
```sql
DELETE FROM idempotency_keys
WHERE created_at < NOW() - INTERVAL '24 hours';
```
The idx_idempotency_keys_created_at index is crucial for ensuring this deletion process is efficient and doesn't lock the table for an extended period.
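The same timestamp-based policy, sketched in memory (assuming each row records created_at as a Unix timestamp; the SQL DELETE above plays this role in production):

```python
import time

def purge_expired(rows: dict, ttl_seconds=86400.0, now=None):
    """Delete rows older than ttl_seconds; returns how many were purged."""
    now = time.time() if now is None else now
    expired = [key for key, row in rows.items()
               if now - row["created_at"] > ttl_seconds]
    for key in expired:
        del rows[key]
    return len(expired)
```

Running the purge in small batches (rather than one giant DELETE) keeps lock hold times and WAL churn bounded on large tables.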
5. Advanced Scenarios and Edge Cases
The real test of an idempotency system is how it handles failures and unexpected client behavior.
Case 1: Partial Failures
The Problem: The business logic (e.g., a database COMMIT) succeeds, but the server crashes before it can update the idempotency key's status to completed. The key is now stuck in the processing state.
Solutions:
Transactional Coupling: If the business logic writes to the same database as the idempotency record, perform both inside a single transaction, so the business COMMIT and the idempotency key UPDATE will succeed or fail together, atomically.

```python
# In a single SQLAlchemy transaction block
async with session.begin():
    # 1. Acquire lock (INSERT ... ON CONFLICT)
    # ...
    # 2. Execute business logic (e.g., create an order row)
    # ...
    # 3. Update idempotency key to 'completed'
    # ...
    # The 'async with' block ensures all operations are committed together.
```
Lock Timeouts (locked_until): This is why our PostgreSQL schema included a locked_until field. When a request encounters a processing key, it should also check this timestamp. If NOW() > locked_until, it can assume the original worker died and safely take over the lock. The new worker can then attempt to perform the business logic. This requires careful handling to determine if the previous operation partially succeeded.

Reconciliation Jobs: A background process can periodically scan for stale processing keys and query other parts of the system to determine the true state of the operation before marking the key as completed or failed.

Case 2: Mismatched Request Payloads
The Problem: A client sends a POST /payments request with Idempotency-Key: A and { amount: 100 }. It succeeds. Moments later, due to a bug, it sends another request with the same Idempotency-Key: A but a different payload: { amount: 200 }.
The Solution: The server should reject the second request. An idempotency key guarantees the same operation is performed once. A different payload constitutes a different operation.
Implementation:
Store a hash of the request's critical parameters alongside the key. When a completed key is found, re-hash the incoming request and compare it to the stored hash. If they don't match, return a 422 Unprocessable Entity or a similar client error.
Modified PostgreSQL Schema:
```sql
ALTER TABLE idempotency_keys ADD COLUMN request_hash VARCHAR(64);
-- Add to the primary key to allow different payloads for the same logical operation if desired
-- ALTER TABLE idempotency_keys DROP CONSTRAINT idempotency_keys_pkey, ADD PRIMARY KEY (key, request_hash);
```
Middleware Logic:
```python
import hashlib
import json

async def get_request_hash(request: Request) -> str:
    body = await request.body()
    # Ensure consistent hashing by sorting keys in JSON.
    # Note: This is a simplified example. A robust implementation would need to
    # handle more content types and normalize the request signature.
    if request.headers.get("content-type") == "application/json":
        try:
            parsed_body = json.loads(body)
            canonical_body = json.dumps(parsed_body, sort_keys=True).encode("utf-8")
            return hashlib.sha256(canonical_body).hexdigest()
        except json.JSONDecodeError:
            pass  # Fall back to hashing the raw body
    return hashlib.sha256(body).hexdigest()

# Inside the middleware...
if stored_state.status == "completed":
    incoming_hash = await get_request_hash(request)
    if incoming_hash != stored_state.request_hash:
        return JSONResponse(
            status_code=422,
            content={"error": "Idempotency-Key is being reused with a different request payload."},
        )
    # ... return cached response ...
```
Case 3: Handling Server-Side Errors
What happens if the business logic fails with a 500 Internal Server Error?
* Option 1 (Recommended): Delete the Key. If the error was transient (e.g., a temporary database deadlock, network blip), deleting the idempotency key record allows the client to perform a clean retry that will be processed as a new request.
* Option 2 (Use with Caution): Store the Failure. If the error is deterministic (e.g., an invalid foreign key that will always fail), you could store a failed status and the 500 response. This prevents clients from endlessly retrying an operation that is doomed to fail. However, this can be dangerous if a transient error is mistakenly classified as permanent.
Generally, it's safer to only cache successful (2xx) responses and treat all failures (4xx, 5xx) as non-idempotent events, allowing for retries by deleting the processing lock.
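That policy fits in a few lines; a sketch against a storage interface like the ones above (the save/delete method names are assumptions, not a fixed API):

```python
def finalize_key(storage, key, status_code, response_body):
    """Cache only 2xx responses; anything else releases the lock so the
    client's next retry is processed as a fresh request."""
    if 200 <= status_code < 300:
        storage.save(key, status_code, response_body)
        return "cached"
    storage.delete(key)
    return "released"
```

Keeping this decision in one place makes it easy to later carve out deterministic failures (e.g., specific 4xx codes) for caching, should endless client retries become a problem.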
Conclusion
Implementing a production-grade idempotency layer is a complex but essential task for building reliable APIs in a distributed environment. It's a classic example of where senior engineering diligence pays dividends in system stability and data integrity.
Key takeaways for a robust implementation include:
* Choose the Right Storage: Use a database with strong atomicity guarantees. PostgreSQL's INSERT ... ON CONFLICT is a powerful tool for durable, consistent locking. Redis's SETNX offers higher performance at the cost of durability.
* Master Concurrency: Your initial lock acquisition must be atomic. Understand the trade-offs between simple optimistic checks and more heavyweight pessimistic locking.
* Manage the Lifecycle: Keys cannot live forever. Implement a strict TTL and a reliable garbage collection mechanism to manage storage.
* Anticipate Failure: Plan for partial failures, server crashes, and unexpected client behavior. Validate request payloads on retries and handle server errors gracefully to ensure the client can safely recover.
By moving beyond the basic concept and meticulously engineering these advanced patterns, you can provide the pragmatic guarantee of exactly-once processing that is the bedrock of any fault-tolerant, high-stakes application.