Idempotency Layers in Async Services with Redis & Lua

October 15, 2025

Goh Ling Yong

The Inevitability of Duplicates in Distributed Systems

In any non-trivial distributed system, particularly those employing message queues or asynchronous HTTP retries, the contract is almost always at-least-once delivery. The network is unreliable, services crash, and consumers get redeployed. This reality forces us, as application developers, to confront the problem of duplicate requests. A client retrying a failed payment API call, a Kafka consumer re-processing a message after a crash, or a webhook provider re-sending a notification—all these scenarios can lead to unintended side effects if the underlying operation is not idempotent.

While some operations are naturally idempotent (e.g., PUT /users/123), many critical business operations are not (e.g., POST /payments, POST /orders). Simply wrapping a database transaction around the operation isn't enough. The transaction might succeed, but the response to the client might be lost, triggering a retry that re-executes the already-committed transaction.

The standard solution is an idempotency layer, typically implemented as a middleware that intercepts incoming requests. It uses a unique Idempotency-Key provided by the client to track the status of an operation. A naive implementation might look like this:

1. Client sends POST /payments with Idempotency-Key: some-unique-value.

2. Server checks if some-unique-value exists in a cache (like Redis).

* If it exists, return the cached response.
* If not, process the payment, cache the response, and then return it.

This approach is fundamentally broken due to race conditions. Two identical requests arriving milliseconds apart could both perform step #2, find no key, and proceed to process the payment twice. This article details a robust, production-grade pattern to solve this problem atomically using Redis and Lua.
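The race is easy to reproduce without Redis. The following is a minimal sketch, assuming an in-memory Map in place of the cache and a hypothetical `processPayment` stand-in for the real side effect. It models only the naive check-then-act flow described above, not the middleware built later in this article.

```javascript
// Naive check-then-act idempotency, with an in-memory Map standing in for Redis.
// All names here (cache, processPayment, naiveHandle) are illustrative.
const cache = new Map();
let chargesMade = 0;

// Stand-in for the real side effect; the delay models a slow downstream call.
async function processPayment() {
    await new Promise((resolve) => setTimeout(resolve, 10));
    chargesMade += 1;
    return { status: 'SUCCESS' };
}

async function naiveHandle(idempotencyKey) {
    // Step 2: check the cache.
    if (cache.has(idempotencyKey)) {
        return cache.get(idempotencyKey);
    }
    // Both concurrent callers reach this point before either has cached a result.
    const response = await processPayment();
    cache.set(idempotencyKey, response);
    return response;
}

// Two identical requests arriving together.
const raceDone = Promise.all([
    naiveHandle('some-unique-value'),
    naiveHandle('some-unique-value'),
]).then(() => chargesMade);
```

Here `raceDone` resolves to 2: both calls pass the cache check before either has stored a response, so the payment runs twice for a single key.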

The Atomic State Machine: Beyond Simple SET NX

A common first attempt to fix the race condition is to use Redis's SET key value NX EX ttl command. The NX option ensures the key is only set if it doesn't already exist, making the initial lock acquisition atomic. However, this is still insufficient for a complete idempotency lifecycle.

What state does the operation have? At a minimum:

IN_PROGRESS: We've received the request and started processing, but haven't finished.

COMPLETED: We've finished processing and have a result to return.

A plain string key can't model this lifecycle. If a worker acquires the lock (SET idempotency-key IN_PROGRESS NX EX ttl) and then crashes, the key stays IN_PROGRESS until its TTL expires, and a retrying client cannot tell a crashed worker from a slow one. More importantly, there is no place to store the final response associated with the key.

A Better Model: Redis Hashes

We can model the entire lifecycle of an idempotent request using a Redis Hash. Each Idempotency-Key maps to a hash with fields like:

status: IN_PROGRESS | COMPLETED

response_code: e.g., 201

response_body: The JSON response body

request_hash: A hash of the request body to prevent key misuse
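To make the field list concrete, here is a small sketch of how the record might look in application code at each stage. The helper names (`newRecord`, `completeRecord`, `toHsetArgs`) are hypothetical and exist only for illustration; values are kept as strings because that is how Redis stores hash fields.

```javascript
const crypto = require('crypto');

function sha256(text) {
    return crypto.createHash('sha256').update(text).digest('hex');
}

// Record as first written: the lock is held and there is no response yet.
function newRecord(requestBody) {
    return { status: 'IN_PROGRESS', request_hash: sha256(requestBody) };
}

// Record after the handler finishes: same request_hash, plus the stored response.
function completeRecord(record, responseCode, responseBody) {
    return {
        ...record,
        status: 'COMPLETED',
        response_code: String(responseCode),
        response_body: responseBody,
    };
}

// Flatten to the field/value argument list that HSET expects:
// HSET key field1 value1 field2 value2 ...
function toHsetArgs(record) {
    return Object.entries(record).flat();
}

const started = newRecord('{"amount":100}');
const finished = completeRecord(started, 201, '{"id":"pid_1"}');
```

Note that `request_hash` is written once and carried through unchanged, which is what lets a later request with the same key be compared against the original body.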

This structure allows us to differentiate between an operation that is actively being processed and one that has finished. The core challenge remains: how do we check for the key, create it in an IN_PROGRESS state, and retrieve a COMPLETED state—all in a single, atomic operation to prevent race conditions?

This is where Lua scripting in Redis becomes essential.

Atomicity with Lua: The Core of the Solution

Redis allows the execution of server-side Lua scripts. These scripts are executed atomically. No other Redis command will run while a Lua script is executing. This gives us the power to implement our complex locking and checking logic without fear of interruption.

Here is the Lua script that forms the foundation of our idempotency middleware. It handles the initial check when a request first arrives.

lua

-- idempotency_check.lua
-- KEYS[1]: The idempotency key provided by the client
-- ARGV[1]: The hash of the incoming request body
-- ARGV[2]: The TTL for the key in seconds (for the IN_PROGRESS state)

-- Try to get the existing hash for the key
local key_data = redis.call('HGETALL', KEYS[1])

-- Check if the key exists
if #key_data > 0 then
    -- Key exists, convert the array response to a map for easier access
    local response_map = {}
    for i = 1, #key_data, 2 do
        response_map[key_data[i]] = key_data[i+1]
    end

    -- Check if the request hash matches. If not, it's a client error.
    if response_map['request_hash'] ~= ARGV[1] then
        return {'ERROR', 'KEY_REUSE_MISMATCH'}
    end

    -- If the status is 'COMPLETED', return the stored response
    if response_map['status'] == 'COMPLETED' then
        return {'COMPLETED', response_map['response_code'], response_map['response_body']}
    end

    -- If the status is 'IN_PROGRESS', another request is processing it.
    if response_map['status'] == 'IN_PROGRESS' then
        return {'IN_PROGRESS'}
    end
else
    -- Key does not exist. This is a new request.
    -- Create the hash in an 'IN_PROGRESS' state.
    redis.call('HSET', KEYS[1], 'status', 'IN_PROGRESS', 'request_hash', ARGV[1])
    redis.call('EXPIRE', KEYS[1], ARGV[2])
    return {'NEW'}
end

Dissecting the Lua Script

This script returns a tuple-like array that our application code can easily parse:

HGETALL: We first attempt to fetch the entire hash associated with the idempotency key.

Key Exists Path:

* If key_data is not empty, we first validate the request_hash. This is a critical security and correctness check. A client should not be able to reuse an idempotency key for a different operation. If the hashes don't match, we return an error state {'ERROR', 'KEY_REUSE_MISMATCH'}. Our middleware should translate this into a 422 Unprocessable Entity response.

* If the hashes match and the status is COMPLETED, we've found a cached result. We return {'COMPLETED', code, body}, allowing the middleware to immediately serve the cached response.

* If the status is IN_PROGRESS, a concurrent request is being processed. We return {'IN_PROGRESS'}. The middleware should translate this to a 409 Conflict.

Key Does Not Exist Path:

* If key_data is empty, this is the first time we've seen this key.

* We atomically create the hash with HSET, setting the status to IN_PROGRESS and storing the request_hash.

* We set an EXPIRE on the key. This is our safety net. If the worker crashes mid-process, the lock will eventually be released, allowing a retry to proceed.

* We return {'NEW'} to signal that the middleware should proceed to the main business logic.
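For unit-testing the middleware's handling of each return value, it can help to have the same branches written out in plain JavaScript. The sketch below is an assumption-laden mirror of the script, not a replacement for it: it takes the stored hash as an object (or null when the key is missing), performs no writes, and provides none of the atomicity that running inside Redis gives. The function name `checkIdempotency` is illustrative.

```javascript
// Pure-JavaScript mirror of the Lua script's branches, for unit tests only.
// `record` is the stored hash as an object, or null when the key does not exist.
function checkIdempotency(record, requestHash) {
    if (record === null) {
        // In Redis, this branch also creates the IN_PROGRESS hash and sets its TTL.
        return ['NEW'];
    }
    if (record.request_hash !== requestHash) {
        return ['ERROR', 'KEY_REUSE_MISMATCH'];
    }
    if (record.status === 'COMPLETED') {
        return ['COMPLETED', record.response_code, record.response_body];
    }
    if (record.status === 'IN_PROGRESS') {
        return ['IN_PROGRESS'];
    }
    // The Lua script returns nil for any other status; this sketch makes
    // that case explicit instead.
    return ['ERROR', 'UNKNOWN_STATE'];
}
```

The order of the checks matters: the request hash is validated before the status is inspected, so a mismatched body is rejected even when a completed response is available.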

Production Implementation: Node.js & Express Middleware

Let's translate this logic into a practical, reusable middleware for an Express.js application. We'll use the ioredis library for Redis communication and Node's built-in crypto module for hashing.

Project Setup

bash

npm init -y
npm install express ioredis

The Idempotency Middleware

Here is the complete implementation. Note the two-phase nature: a pre-handler check and a post-handler result storage.

javascript

// idempotencyMiddleware.js
const Redis = require('ioredis');
const crypto = require('crypto');
const fs = require('fs');
const path = require('path');

// --- Configuration ---
const redisClient = new Redis({
    host: 'localhost',
    port: 6379,
    // Add other production settings: password, tls, etc.
});

// TTL for in-progress keys (in seconds). This is our crash guard.
const IN_PROGRESS_TTL = 300; // 5 minutes
// TTL for completed keys (in seconds). How long to cache results.
const COMPLETED_TTL = 86400; // 24 hours

// --- Lua Script Loading ---
// It's much more efficient to load the script once and use EVALSHA
const luaScript = fs.readFileSync(path.join(__dirname, 'idempotency_check.lua'), 'utf8');
let scriptSha;

async function loadLuaScript() {
    try {
        scriptSha = await redisClient.script('load', luaScript);
        console.log('Idempotency Lua script loaded successfully. SHA:', scriptSha);
    } catch (err) {
        console.error('Failed to load Lua script into Redis', err);
        process.exit(1);
    }
}

// --- Helper Functions ---
function getRequestHash(req) {
    // A stable stringification is important. Sorting keys is a good practice.
    // Note: only top-level keys are sorted; nested objects keep their original key order.
    const body = req.body || {}; // tolerate requests without a parsed body
    const sortedBody = JSON.stringify(Object.keys(body).sort().reduce(
        (obj, key) => {
            obj[key] = body[key];
            return obj;
        },
        {}
    ));
    return crypto.createHash('sha256').update(sortedBody).digest('hex');
}

// --- The Middleware ---
async function idempotencyMiddleware(req, res, next) {
    const idempotencyKey = req.headers['idempotency-key'];

    if (!idempotencyKey) {
        // For endpoints requiring idempotency, you might choose to fail fast.
        // Or, you could bypass the middleware if the key is optional.
        return next(); 
    }

    if (!scriptSha) {
        console.error('Lua script not loaded. Cannot process idempotent request.');
        return res.status(500).json({ error: 'Server configuration error' });
    }

    const requestHash = getRequestHash(req);

    try {
        const result = await redisClient.evalsha(scriptSha, 1, idempotencyKey, requestHash, IN_PROGRESS_TTL);

        const state = result[0];

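        if (state === 'NEW') {
            // This is a new request. Proceed to the handler.
            // We need to intercept the response to store it.
            const originalSend = res.send;
            let recorded = false; // res.send() can re-enter via res.json(), so record only once
            res.send = function (body) {
                if (!recorded) {
                    recorded = true;
                    if (res.statusCode >= 200 && res.statusCode < 300) {
                        // Only cache successful responses (2xx)
                        const responseBody = typeof body === 'string' ? body : JSON.stringify(body);

                        // MULTI/EXEC sets the final state and TTL atomically
                        redisClient.multi()
                            .hset(idempotencyKey, 'status', 'COMPLETED', 'response_code', res.statusCode, 'response_body', responseBody)
                            .expire(idempotencyKey, COMPLETED_TTL)
                            .exec()
                            .catch((err) => console.error('Failed to store idempotent response:', err));
                    } else {
                        // Release the lock so the client can retry without waiting for
                        // IN_PROGRESS_TTL. Assumes a failed handler left no side effects.
                        redisClient.del(idempotencyKey)
                            .catch((err) => console.error('Failed to release idempotency key:', err));
                    }
                }
                return originalSend.apply(res, arguments);
            };
            return next();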
        } else if (state === 'COMPLETED') {
            const [, statusCode, body] = result;
            res.status(parseInt(statusCode, 10)).header('Content-Type', 'application/json').send(body);
            return;
        } else if (state === 'IN_PROGRESS') {
            return res.status(409).json({ error: 'Request with this idempotency key is already in progress.' });
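        } else if (state === 'ERROR' && result[1] === 'KEY_REUSE_MISMATCH') {
            return res.status(422).json({ error: 'Idempotency key reused with a different request body.' });
        }

        // Unknown state: respond explicitly instead of leaving the request hanging.
        return res.status(500).json({ error: 'Unexpected idempotency state.' });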

    } catch (err) {
        console.error('Idempotency middleware Redis error:', err);
        // Fail open or closed? Failing open risks duplicate processing. 
        // Failing closed (500 error) is safer for non-idempotent operations.
        return res.status(500).json({ error: 'Could not process idempotency key.' });
    }
}

module.exports = { idempotencyMiddleware, loadLuaScript };
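One limitation worth noting: `getRequestHash` in the middleware sorts only the top-level keys, so two bodies that differ only in the key order of a nested object would hash differently and trigger a false KEY_REUSE_MISMATCH. A possible refinement, sketched here with hypothetical helper names (`canonicalize`, `hashBody`), is to sort keys recursively before hashing.

```javascript
const crypto = require('crypto');

// Recursively rebuild a JSON value with object keys in sorted order.
// Arrays keep their order, since element order is meaningful.
function canonicalize(value) {
    if (Array.isArray(value)) {
        return value.map(canonicalize);
    }
    if (value !== null && typeof value === 'object') {
        const sorted = {};
        for (const key of Object.keys(value).sort()) {
            sorted[key] = canonicalize(value[key]);
        }
        return sorted;
    }
    return value;
}

// SHA-256 over the canonical JSON form; a missing body hashes like {}.
function hashBody(body) {
    const canonical = JSON.stringify(canonicalize(body === undefined ? {} : body));
    return crypto.createHash('sha256').update(canonical).digest('hex');
}
```

With this version, `hashBody({ a: 1, b: { x: 1, y: 2 } })` and `hashBody({ b: { y: 2, x: 1 }, a: 1 })` produce the same digest, while reordering array elements still changes it.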

Integrating with an Express App

javascript

// server.js
const express = require('express');
const { idempotencyMiddleware, loadLuaScript } = require('./idempotencyMiddleware');

const app = express();
app.use(express.json());

// A mock database
const payments = {};

// This route is protected by the idempotency middleware
app.post('/payments', idempotencyMiddleware, async (req, res) => {
    const { amount, currency, destination } = req.body;
    const paymentId = `pid_${Date.now()}`;

    // Simulate a slow, non-idempotent operation
    console.log(`Processing payment ${paymentId} for ${amount} ${currency}...`);
    await new Promise(resolve => setTimeout(resolve, 2000)); // Simulate DB call, external API, etc.

    const newPayment = { id: paymentId, amount, currency, destination, status: 'SUCCESS' };
    payments[paymentId] = newPayment;

    console.log(`Payment ${paymentId} successful.`);
    res.status(201).json(newPayment);
});

app.get('/payments', (req, res) => {
    res.json(Object.values(payments));
});

const PORT = 3000;

// Load the Lua script before starting the server
loadLuaScript().then(() => {
    app.listen(PORT, () => {
        console.log(`Server running on http://localhost:${PORT}`);
    });
});

Testing the Scenarios

You can test the different paths using curl:

1. First Successful Request:

bash

curl -X POST http://localhost:3000/payments \
-H "Content-Type: application/json" \
-H "Idempotency-Key: test-key-123" \
-d '{"amount": 100, "currency": "USD", "destination": "acct_123"}'

# Server logs: Processing payment...
# Server logs: Payment successful.
# Response: (201 Created) { "id": "...", "amount": 100, ... }

2. Immediate Retry (while first is processing):

Open a second terminal and run the same command immediately. You will get:

bash

# Response: (409 Conflict) { "error": "Request with this idempotency key is already in progress." }

3. Retry After Completion:

Run the command again after the first one has finished.

bash

# Server logs: (nothing, the handler is not hit)
# Response: (201 Created) { "id": "...", "amount": 100, ... } (This is the cached response)

4. Key Reuse with Different Body:

Change the request body but use the same key.

bash

curl -X POST http://localhost:3000/payments \
-H "Content-Type: application/json" \
-H "Idempotency-Key: test-key-123" \
-d '{"amount": 999, "currency": "USD", "destination": "acct_123"}'

# Response: (422 Unprocessable Entity) { "error": "Idempotency key reused with a different request body." }

Advanced Edge Cases and Performance Considerations

This implementation is robust, but in a real-world, high-throughput system, there are further considerations.

Performance Overhead

Latency: Every protected request incurs at least one Redis roundtrip. In a typical cloud environment (e.g., EC2 to ElastiCache in the same AZ), this is sub-millisecond. However, it's not zero. For hyper-sensitive, low-latency endpoints, you must benchmark this overhead.

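EVAL vs. EVALSHA: Our code uses EVALSHA. When the server starts, it sends the script to Redis via SCRIPT LOAD and gets back the script's SHA1 digest. Subsequent calls use the much smaller EVALSHA command, sending only the digest. This reduces network bandwidth compared to sending the full script on every request. Be aware that the script cache is not persistent: after a Redis restart or a SCRIPT FLUSH, EVALSHA fails with a NOSCRIPT error until the script is loaded again.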

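Connection Reuse: Create the Redis client once at startup and share it across requests, as the middleware above does. ioredis sends all commands from a client instance over a single long-lived connection, so there is no connection setup/teardown on each request. If your client library uses a connection pool instead, size it for your peak concurrency.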
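Because the middleware loads the script once at startup and then relies on EVALSHA, it is worth guarding against the script disappearing from the Redis script cache. The helper below is a sketch under the assumption that `client` exposes ioredis-style `evalsha(sha, numKeys, ...args)` and `eval(script, numKeys, ...args)` methods; the name `evalWithFallback` is illustrative.

```javascript
// Run a script by SHA, falling back to EVAL if Redis no longer has it cached.
// `client` is assumed to expose ioredis-style evalsha() and eval() methods.
async function evalWithFallback(client, sha, script, numKeys, ...args) {
    try {
        return await client.evalsha(sha, numKeys, ...args);
    } catch (err) {
        // Redis replies with an error starting "NOSCRIPT" when the SHA is unknown.
        if (err && typeof err.message === 'string' && err.message.includes('NOSCRIPT')) {
            // EVAL sends the full script text, so it does not depend on the cache.
            return client.eval(script, numKeys, ...args);
        }
        throw err;
    }
}
```

In the middleware, this could be called as `evalWithFallback(redisClient, scriptSha, luaScript, 1, idempotencyKey, requestHash, IN_PROGRESS_TTL)`. ioredis also provides `defineCommand`, which registers a script as a custom command and is documented to manage the EVALSHA/EVAL handling itself; that may be the simpler option if you do not need explicit control.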

Handling Worker Crashes and TTLs

The IN_PROGRESS_TTL is a double-edged sword.

Too Short: If a legitimate request takes longer than the TTL to process, its lock could expire prematurely. A retry could then start, leading to duplicate processing. Your TTL must be longer than your endpoint's p99 or p99.9 latency.

Too Long: If a worker crashes, the key is locked for the entire duration of the TTL. If the TTL is 30 minutes, no retries for that operation can succeed for 30 minutes. This can be a significant availability issue.

Mitigation Strategy: There is no perfect solution, only trade-offs. A common strategy is to pair a reasonably long TTL (e.g., 5-15 minutes) with robust monitoring. An alert on the number of IN_PROGRESS keys that are nearing their TTL can signal that workers are crashing or stuck, allowing for manual intervention.
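One way to ground the TTL choice in data is to derive it from observed handler latencies rather than picking a round number. The sketch below is only a starting heuristic: the nearest-rank percentile method, the safety factor of 3, and the 30-second floor are all assumptions to tune for your own service, and `suggestInProgressTtl` is a hypothetical helper name.

```javascript
// Nearest-rank percentile of a list of latency samples (milliseconds).
function percentile(samplesMs, p) {
    if (samplesMs.length === 0) {
        throw new Error('need at least one sample');
    }
    const sorted = [...samplesMs].sort((a, b) => a - b);
    const rank = Math.ceil((p / 100) * sorted.length);
    return sorted[Math.max(rank, 1) - 1];
}

// Suggest an IN_PROGRESS TTL in whole seconds: a high percentile multiplied by
// a safety factor, never below a floor. The defaults are illustrative assumptions.
function suggestInProgressTtl(samplesMs, { p = 99.9, safetyFactor = 3, minSeconds = 30 } = {}) {
    const tailMs = percentile(samplesMs, p);
    return Math.max(minSeconds, Math.ceil((tailMs * safetyFactor) / 1000));
}
```

For samples of `[100, 200, 2000, 45000]` milliseconds, the tail value is 45000 ms and the suggested TTL is 135 seconds. A service whose slowest observed request is 100 ms would get the 30-second floor.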

Extension to Asynchronous Consumers (e.g., Kafka)

This exact pattern can be adapted for message queue consumers. The logic remains identical, but the implementation shifts.

The Idempotency-Key would be a unique identifier from the message payload or headers (e.g., a UUID generated by the producer).

The "middleware" becomes a decorator or a higher-order function that wraps your message processing logic.

Instead of returning HTTP responses, the wrapper would control whether to execute the core logic and how to handle the result (e.g., commit the Kafka offset only after successfully storing the COMPLETED state in Redis).

javascript

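// Example for a Kafka-like consumer.
// `store` is a hypothetical adapter around the Redis calls shown earlier:
//   store.check(key)            -> ['NEW'] | ['COMPLETED', code, body] | ['IN_PROGRESS'] | ['ERROR', reason]
//   store.complete(key, result) -> persists the COMPLETED state
async function idempotentMessageHandler(message, actualHandler, store) {
    const idempotencyKey = message.headers.idempotencyKey;
    const [state, , cachedBody] = await store.check(idempotencyKey);

    if (state === 'NEW') {
        const result = await actualHandler(message.payload);
        // Store the result before the caller commits the offset.
        await store.complete(idempotencyKey, result);
        return result;
    } else if (state === 'COMPLETED') {
        // The message was already processed. Return the cached result or simply ack.
        return cachedBody;
    } else {
        // IN_PROGRESS or ERROR, likely indicates an issue. Maybe send to DLQ or log.
        throw new Error(`Idempotency conflict: ${state}`);
    }
}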

Fail-Open vs. Fail-Closed

What happens if Redis is down? Our middleware currently returns a 500 Internal Server Error. This is a fail-closed strategy. For operations like payments, this is the correct choice. It prioritizes correctness over availability, preventing potential duplicate charges at the cost of failing the request.

In some less critical scenarios, you might choose to fail-open—bypassing the idempotency check and proceeding with the operation if Redis is unavailable. This prioritizes availability but re-introduces the risk of duplicate processing. This decision is highly context-dependent.
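The choice can be made explicit per route rather than hard-coded in the catch block. The following is a minimal sketch, assuming a hypothetical `policy` setting of either 'closed' or 'open' and Express-style `res` and `next` arguments; `handleStoreFailure` is an illustrative name, not part of the middleware above.

```javascript
// Decide what to do when the idempotency store is unreachable.
// `policy` is a hypothetical per-route setting: 'closed' (the default) or 'open'.
function handleStoreFailure(policy, err, res, next) {
    if (policy === 'open') {
        // Availability first: skip the check and run the handler unprotected.
        console.warn('Idempotency store unavailable, failing open:', err.message);
        return next();
    }
    // Correctness first: reject rather than risk a duplicate side effect.
    return res.status(500).json({ error: 'Could not process idempotency key.' });
}
```

Anything other than an explicit 'open' falls through to the fail-closed branch, so a missing or misspelled setting errs on the side of correctness.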

Conclusion

Implementing a correct, race-condition-free idempotency layer is a non-trivial engineering task that separates robust, production-ready distributed systems from fragile ones. While the concept of an idempotency key is simple, the implementation details, especially around atomicity and state management, are complex. By leveraging the atomic, programmable nature of Redis Lua scripts, we can build a stateful, two-phase mechanism (an atomic check before the handler runs, and a result write after it) that is both correct and performant.

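This pattern, centered on an atomic check-and-set operation, stateful tracking with Redis Hashes, and careful handling of edge cases like request body validation and worker crashes, provides a reusable and resilient blueprint for achieving effectively-once processing in an at-least-once world. It is not a strict exactly-once guarantee: if a worker completes the operation but crashes before recording the COMPLETED state, a retry after the TTL expires will run the operation again.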
