Idempotency Layers in Async Services with Redis & Lua
The Inevitability of Duplicates in Distributed Systems
In any non-trivial distributed system, particularly those employing message queues or asynchronous HTTP retries, the contract is almost always at-least-once delivery. The network is unreliable, services crash, and consumers get redeployed. This reality forces us, as application developers, to confront the problem of duplicate requests. A client retrying a failed payment API call, a Kafka consumer re-processing a message after a crash, or a webhook provider re-sending a notification—all these scenarios can lead to unintended side effects if the underlying operation is not idempotent.
While some operations are naturally idempotent (e.g., PUT /users/123), many critical business operations are not (e.g., POST /payments, POST /orders). Simply wrapping a database transaction around the operation isn't enough. The transaction might succeed, but the response to the client might be lost, triggering a retry that re-executes the already-committed transaction.
The standard solution is an idempotency layer, typically implemented as a middleware that intercepts incoming requests. It uses a unique Idempotency-Key provided by the client to track the status of an operation. A naive implementation might look like this:
POST /payments with Idempotency-Key: some-unique-value.some-unique-value exists in a cache (like Redis).- If it exists, return the cached response.
- If not, process the payment, cache the response, and then return it.
This approach is fundamentally broken due to race conditions. Two identical requests arriving milliseconds apart could both perform step #2, find no key, and proceed to process the payment twice. This article details a robust, production-grade pattern to solve this problem atomically using Redis and Lua.
The Atomic State Machine: Beyond Simple SET NX
A common first attempt to fix the race condition is to use Redis's SET key value NX EX ttl command. The NX option ensures the key is only set if it doesn't already exist, making the initial lock acquisition atomic. However, this is still insufficient for a complete idempotency lifecycle.
What state does the operation have? At a minimum:
Using a simple string key in Redis can't effectively model this. If a worker acquires a lock (SET idempotency-key IN_PROGRESS NX) and then crashes, the key is stuck in IN_PROGRESS until its TTL expires. More importantly, there's no place to store the final response associated with the key.
A Better Model: Redis Hashes
We can model the entire lifecycle of an idempotent request using a Redis Hash. Each Idempotency-Key maps to a hash with fields like:
status: IN_PROGRESS | COMPLETEDresponse_code: e.g., 201response_body: The JSON response bodyrequest_hash: A hash of the request body to prevent key misuseThis structure allows us to differentiate between an operation that is actively being processed and one that has finished. The core challenge remains: how do we check for the key, create it in an IN_PROGRESS state, and retrieve a COMPLETED state—all in a single, atomic operation to prevent race conditions?
This is where Lua scripting in Redis becomes essential.
Atomicity with Lua: The Core of the Solution
Redis allows the execution of server-side Lua scripts. These scripts are executed atomically. No other Redis command will run while a Lua script is executing. This gives us the power to implement our complex locking and checking logic without fear of interruption.
Here is the Lua script that forms the foundation of our idempotency middleware. It handles the initial check when a request first arrives.
-- idempotency_check.lua
-- KEYS[1]: The idempotency key provided by the client
-- ARGV[1]: The hash of the incoming request body
-- ARGV[2]: The TTL for the key in seconds (for the IN_PROGRESS state)
-- Try to get the existing hash for the key
local key_data = redis.call('HGETALL', KEYS[1])
-- Check if the key exists
if #key_data > 0 then
-- Key exists, convert the array response to a map for easier access
local response_map = {}
for i = 1, #key_data, 2 do
response_map[key_data[i]] = key_data[i+1]
end
-- Check if the request hash matches. If not, it's a client error.
if response_map['request_hash'] ~= ARGV[1] then
return {'ERROR', 'KEY_REUSE_MISMATCH'}
end
-- If the status is 'COMPLETED', return the stored response
if response_map['status'] == 'COMPLETED' then
return {'COMPLETED', response_map['response_code'], response_map['response_body']}
end
-- If the status is 'IN_PROGRESS', another request is processing it.
if response_map['status'] == 'IN_PROGRESS' then
return {'IN_PROGRESS'}
end
else
-- Key does not exist. This is a new request.
-- Create the hash in an 'IN_PROGRESS' state.
redis.call('HSET', KEYS[1], 'status', 'IN_PROGRESS', 'request_hash', ARGV[1])
redis.call('EXPIRE', KEYS[1], ARGV[2])
return {'NEW'}
end
Dissecting the Lua Script
This script returns a tuple-like array that our application code can easily parse:
HGETALL: We first attempt to fetch the entire hash associated with the idempotency key. * If key_data is not empty, we first validate the request_hash. This is a critical security and correctness check. A client should not be able to reuse an idempotency key for a different operation. If the hashes don't match, we return an error state {'ERROR', 'KEY_REUSE_MISMATCH'}. Our middleware should translate this into a 422 Unprocessable Entity response.
* If the hashes match and the status is COMPLETED, we've found a cached result. We return {'COMPLETED', code, body}, allowing the middleware to immediately serve the cached response.
* If the status is IN_PROGRESS, a concurrent request is being processed. We return {'IN_PROGRESS'}. The middleware should translate this to a 409 Conflict.
* If key_data is empty, this is the first time we've seen this key.
* We atomically create the hash with HSET, setting the status to IN_PROGRESS and storing the request_hash.
* We set an EXPIRE on the key. This is our safety net. If the worker crashes mid-process, the lock will eventually be released, allowing a retry to proceed.
* We return {'NEW'} to signal that the middleware should proceed to the main business logic.
Production Implementation: Node.js & Express Middleware
Let's translate this logic into a practical, reusable middleware for an Express.js application. We'll use the ioredis library for Redis communication and crypto for hashing.
Project Setup
npm init -y
npm install express ioredis crypto
The Idempotency Middleware
Here is the complete implementation. Note the two-phase nature: a pre-handler check and a post-handler result storage.
// idempotencyMiddleware.js
const Redis = require('ioredis');
const crypto = require('crypto');
const fs = require('fs');
const path = require('path');
// --- Configuration ---
const redisClient = new Redis({
host: 'localhost',
port: 6379,
// Add other production settings: password, tls, etc.
});
// TTL for in-progress keys (in seconds). This is our crash guard.
const IN_PROGRESS_TTL = 300; // 5 minutes
// TTL for completed keys (in seconds). How long to cache results.
const COMPLETED_TTL = 86400; // 24 hours
// --- Lua Script Loading ---
// It's much more efficient to load the script once and use EVALSHA
const luaScript = fs.readFileSync(path.join(__dirname, 'idempotency_check.lua'), 'utf8');
let scriptSha;
async function loadLuaScript() {
try {
scriptSha = await redisClient.script('load', luaScript);
console.log('Idempotency Lua script loaded successfully. SHA:', scriptSha);
} catch (err) {
console.error('Failed to load Lua script into Redis', err);
process.exit(1);
}
}
// --- Helper Functions ---
function getRequestHash(req) {
// A stable stringification is important. Sorting keys is a good practice.
const sortedBody = JSON.stringify(Object.keys(req.body).sort().reduce(
(obj, key) => {
obj[key] = req.body[key];
return obj;
},
{}
));
return crypto.createHash('sha256').update(sortedBody).digest('hex');
}
// --- The Middleware ---
async function idempotencyMiddleware(req, res, next) {
const idempotencyKey = req.headers['idempotency-key'];
if (!idempotencyKey) {
// For endpoints requiring idempotency, you might choose to fail fast.
// Or, you could bypass the middleware if the key is optional.
return next();
}
if (!scriptSha) {
console.error('Lua script not loaded. Cannot process idempotent request.');
return res.status(500).json({ error: 'Server configuration error' });
}
const requestHash = getRequestHash(req);
try {
const result = await redisClient.evalsha(scriptSha, 1, idempotencyKey, requestHash, IN_PROGRESS_TTL);
const state = result[0];
if (state === 'NEW') {
// This is a new request. Proceed to the handler.
// We need to intercept the response to store it.
const originalSend = res.send;
res.send = function (body) {
// Only cache successful responses (e.g., 2xx)
if (res.statusCode >= 200 && res.statusCode < 300) {
const responseBody = typeof body === 'string' ? body : JSON.stringify(body);
// Use a MULTI/EXEC pipeline to set the final state and TTL atomically
redisClient.multi()
.hset(idempotencyKey, 'status', 'COMPLETED', 'response_code', res.statusCode, 'response_body', responseBody)
.expire(idempotencyKey, COMPLETED_TTL)
.exec();
}
originalSend.apply(res, arguments);
};
return next();
} else if (state === 'COMPLETED') {
const [, statusCode, body] = result;
res.status(parseInt(statusCode, 10)).header('Content-Type', 'application/json').send(body);
return;
} else if (state === 'IN_PROGRESS') {
return res.status(409).json({ error: 'Request with this idempotency key is already in progress.' });
} else if (state === 'ERROR' && result[1] === 'KEY_REUSE_MISMATCH') {
return res.status(422).json({ error: 'Idempotency key reused with a different request body.' });
}
} catch (err) {
console.error('Idempotency middleware Redis error:', err);
// Fail open or closed? Failing open risks duplicate processing.
// Failing closed (500 error) is safer for non-idempotent operations.
return res.status(500).json({ error: 'Could not process idempotency key.' });
}
}
module.exports = { idempotencyMiddleware, loadLuaScript };
Integrating with an Express App
// server.js
const express = require('express');
const { idempotencyMiddleware, loadLuaScript } = require('./idempotencyMiddleware');
const app = express();
app.use(express.json());
// A mock database
const payments = {};
// This route is protected by the idempotency middleware
app.post('/payments', idempotencyMiddleware, async (req, res) => {
const { amount, currency, destination } = req.body;
const paymentId = `pid_${Date.now()}`;
// Simulate a slow, non-idempotent operation
console.log(`Processing payment ${paymentId} for ${amount} ${currency}...`);
await new Promise(resolve => setTimeout(resolve, 2000)); // Simulate DB call, external API, etc.
const newPayment = { id: paymentId, amount, currency, destination, status: 'SUCCESS' };
payments[paymentId] = newPayment;
console.log(`Payment ${paymentId} successful.`);
res.status(201).json(newPayment);
});
app.get('/payments', (req, res) => {
res.json(Object.values(payments));
});
const PORT = 3000;
// Load the Lua script before starting the server
loadLuaScript().then(() => {
app.listen(PORT, () => {
console.log(`Server running on http://localhost:${PORT}`);
});
});
Testing the Scenarios
You can test the different paths using curl:
1. First Successful Request:
curl -X POST http://localhost:3000/payments \
-H "Content-Type: application/json" \
-H "Idempotency-Key: test-key-123" \
-d '{"amount": 100, "currency": "USD", "destination": "acct_123"}'
# Server logs: Processing payment...
# Server logs: Payment successful.
# Response: (201 Created) { "id": "...", "amount": 100, ... }
2. Immediate Retry (while first is processing):
Open a second terminal and run the same command immediately. You will get:
# Response: (409 Conflict) { "error": "Request with this idempotency key is already in progress." }
3. Retry After Completion:
Run the command again after the first one has finished.
# Server logs: (nothing, the handler is not hit)
# Response: (201 Created) { "id": "...", "amount": 100, ... } (This is the cached response)
4. Key Reuse with Different Body:
Change the request body but use the same key.
curl -X POST http://localhost:3000/payments \
-H "Content-Type: application/json" \
-H "Idempotency-Key: test-key-123" \
-d '{"amount": 999, "currency": "USD", "destination": "acct_123"}'
# Response: (422 Unprocessable Entity) { "error": "Idempotency key reused with a different request body." }
Advanced Edge Cases and Performance Considerations
This implementation is robust, but in a real-world, high-throughput system, there are further considerations.
Performance Overhead
EVAL vs. EVALSHA: Our code uses EVALSHA. The first time the server starts, it sends the script to Redis via SCRIPT LOAD and gets a SHA1 hash. Subsequent calls use the much smaller EVALSHA command, sending only the hash. This reduces network bandwidth compared to sending the full script on every request.Handling Worker Crashes and TTLs
The IN_PROGRESS_TTL is a double-edged sword.
Mitigation Strategy: There is no perfect solution, only trade-offs. A common strategy is to pair a reasonably long TTL (e.g., 5-15 minutes) with robust monitoring. An alert on the number of IN_PROGRESS keys that are nearing their TTL can signal that workers are crashing or stuck, allowing for manual intervention.
Extension to Asynchronous Consumers (e.g., Kafka)
This exact pattern can be adapted for message queue consumers. The logic remains identical, but the implementation shifts.
Idempotency-Key would be a unique identifier from the message payload or headers (e.g., a UUID generated by the producer).- The "middleware" becomes a decorator or a higher-order function that wraps your message processing logic.
COMPLETED state in Redis).// Example for a Kafka-like consumer
async function idempotentMessageHandler(message, actualHandler) {
const idempotencyKey = message.headers.idempotencyKey;
// ... perform Lua script check ...
if (state === 'NEW') {
const result = await actualHandler(message.payload);
// ... store result in Redis ...
return result;
} else if (state === 'COMPLETED') {
// The message was already processed. Return the cached result or simply ack.
return getCachedResult();
} else {
// IN_PROGRESS or ERROR, likely indicates an issue. Maybe send to DLQ or log.
throw new Error('Idempotency conflict');
}
}
Fail-Open vs. Fail-Closed
What happens if Redis is down? Our middleware currently returns a 500 Internal Server Error. This is a fail-closed strategy. For operations like payments, this is the correct choice. It prioritizes correctness over availability, preventing potential duplicate charges at the cost of failing the request.
In some less critical scenarios, you might choose to fail-open—bypassing the idempotency check and proceeding with the operation if Redis is unavailable. This prioritizes availability but re-introduces the risk of duplicate processing. This decision is highly context-dependent.
Conclusion
Implementing a correct, race-condition-free idempotency layer is a non-trivial engineering task that separates robust, production-ready distributed systems from fragile ones. While the concept of an idempotency key is simple, the implementation details—especially around atomicity and state management—are complex. By leveraging the atomic, programmable nature of Redis Lua scripts, we can build a stateful, two-phase locking mechanism that is both correct and performant.
This pattern, centered around an atomic check-and-set operation, stateful tracking with Redis Hashes, and careful handling of edge cases like request body validation and worker crashes, provides a reusable and resilient blueprint for achieving exactly-once processing semantics in an at-least-once world.