Idempotency-Key Middleware Patterns for Resilient Go APIs
The Inevitability of Duplicate Requests in Distributed Systems
In a perfect world, a client makes a request, and a single, predictable response is returned. In the real world of distributed systems, this contract is fragile. Network glitches, client-side retry logic, reverse proxy timeouts, and even user behavior (like double-clicking a 'Submit' button) can lead to the same logical operation being sent to your API multiple times. For non-mutating requests (GET, HEAD), this is a non-issue. For mutating requests (POST, PUT, PATCH, DELETE), it's a critical flaw that can lead to catastrophic data corruption: double-charging a customer, creating duplicate user accounts, or sending multiple notifications.
The standard solution is to enforce idempotency at the API layer. An operation is idempotent if making the same request multiple times produces the same result as making it once. While some operations are naturally idempotent (e.g., PUT /users/123 to update a user's email), others, particularly resource creation (POST /payments), are not.
This is where the Idempotency-Key header pattern, popularized by services like Stripe, becomes essential. The client generates a unique key for each distinct operation and includes it in the request header. The server then uses this key to recognize and de-duplicate subsequent retries of the same operation. The first request is processed and its result is stored against the key. Any subsequent request with the same key receives the stored result without re-executing the business logic.
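Concretely, the client-side contract looks something like the sketch below (the helper name, retry count, and backoff are illustrative assumptions, not part of the middleware built later): one key is generated per logical operation and that same key is reused verbatim on every retry.

package client

import (
    "bytes"
    "fmt"
    "net/http"
    "time"

    "github.com/google/uuid"
)

// CreatePaymentWithRetry sends the same logical operation up to three times,
// reusing a single Idempotency-Key so the server can de-duplicate retries.
func CreatePaymentWithRetry(url string, payload []byte) (*http.Response, error) {
    key := uuid.New().String() // one key per logical operation, not per attempt

    var lastErr error
    for attempt := 0; attempt < 3; attempt++ {
        req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(payload))
        if err != nil {
            return nil, err
        }
        req.Header.Set("Content-Type", "application/json")
        req.Header.Set("Idempotency-Key", key)

        resp, err := http.DefaultClient.Do(req)
        if err == nil && resp.StatusCode < 500 {
            return resp, nil // success or a non-retryable client error
        }
        if err != nil {
            lastErr = err
        } else {
            resp.Body.Close()
            lastErr = fmt.Errorf("server returned %d, retrying", resp.StatusCode)
        }
        time.Sleep(time.Duration(attempt+1) * time.Second) // crude backoff
    }
    return nil, lastErr
}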
This article is not an introduction to the concept. We assume you understand why idempotency is necessary. Instead, we will perform a deep dive into building a production-grade, high-performance idempotency middleware in Go, focusing on the complex implementation details that separate a trivial example from a resilient, battle-tested system.
We will cover:
* The core architecture: a middleware backed by a pluggable storage interface.
* Managing request state (pending, completed) and preventing race conditions from near-simultaneous requests.
* A resilient Redis-backed store that uses Lua scripting for atomic locking.
* A complete, runnable example using chi and Redis.
Core Architecture: Middleware and Storage Interface
A clean design starts with a clear separation of concerns. Our idempotency logic should live entirely within a middleware, requiring zero changes to our actual HTTP handlers. The middleware will depend on a storage interface, allowing us to swap out the backend (e.g., from Redis to PostgreSQL) without altering the middleware's core flow.
Let's define our storage interface in Go:
package idempotency
import (
"context"
)
// Response represents a cached HTTP response.
type Response struct {
StatusCode int
Header map[string][]string
Body []byte
}
// Status represents the state of an idempotent request.
type Status int
const (
InProgress Status = iota
Completed
)
// Store defines the interface for storing and retrieving idempotency records.
type Store interface {
// Lock attempts to acquire a lock for a given idempotency key.
// If the key is already locked, it should return an error indicating a conflict.
// If the key has a completed response, it should return the response and a nil error.
// If the key does not exist, it should create a lock and return a nil response and nil error.
Lock(ctx context.Context, key string) (*Response, error)
// Unlock releases the lock for a given key.
// This is used if the request processing fails before a response can be stored.
Unlock(ctx context.Context, key string) error
// StoreResponse stores the final response for a completed request and releases the lock.
StoreResponse(ctx context.Context, key string, resp Response) error
}
This interface captures the three critical operations:
* Lock: This is the entry point. It's an atomic operation that checks the key's status. It must handle three scenarios:
  * Key exists, response is cached: The original request is Completed. Return the cached Response.
  * Key exists, request is InProgress: Another request is currently being processed. Return a conflict error.
  * Key does not exist: This is the first time we've seen this key. Create a lock (mark it InProgress) and signal the middleware to proceed.
* Unlock: A recovery mechanism. If the handler logic panics or fails catastrophically before a response can be saved, we need a way to release the lock so the client can retry.
* StoreResponse: The success path. Once the handler has finished, this operation atomically replaces the InProgress lock with the final, cached Response.
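Before looking at Redis, here is a deliberately simple in-memory implementation of Store, a sketch intended for unit tests and for making the contract above concrete. MemoryStore and NewMemoryStore are names introduced here, and it reuses the ErrRequestInFlight sentinel defined alongside the Redis-backed store below. It is process-local and loses state on restart, so it is not suitable for production.

package idempotency

import (
    "context"
    "sync"
)

type memoryEntry struct {
    status   Status
    response Response
}

// MemoryStore is an illustrative, process-local implementation of Store.
type MemoryStore struct {
    mu      sync.Mutex
    entries map[string]*memoryEntry
}

func NewMemoryStore() *MemoryStore {
    return &MemoryStore{entries: make(map[string]*memoryEntry)}
}

func (m *MemoryStore) Lock(ctx context.Context, key string) (*Response, error) {
    m.mu.Lock()
    defer m.mu.Unlock()

    entry, ok := m.entries[key]
    switch {
    case !ok:
        // First time we see this key: mark it as in progress.
        m.entries[key] = &memoryEntry{status: InProgress}
        return nil, nil
    case entry.status == Completed:
        // Replay the stored response.
        resp := entry.response
        return &resp, nil
    default:
        // Another request holds the lock.
        return nil, ErrRequestInFlight
    }
}

func (m *MemoryStore) Unlock(ctx context.Context, key string) error {
    m.mu.Lock()
    defer m.mu.Unlock()
    delete(m.entries, key)
    return nil
}

func (m *MemoryStore) StoreResponse(ctx context.Context, key string, resp Response) error {
    m.mu.Lock()
    defer m.mu.Unlock()
    m.entries[key] = &memoryEntry{status: Completed, response: resp}
    return nil
}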
Deep Dive: A Resilient Redis Implementation
Redis is an excellent choice for an idempotency store due to its high performance and support for atomic operations, which are crucial for preventing race conditions.
Let's implement our Store interface using the go-redis library.
package idempotency
import (
"context"
"encoding/json"
"errors"
"fmt"
"time"
"github.com/redis/go-redis/v9"
)
var (
ErrRequestInFlight = errors.New("request with this idempotency key is already in flight")
)
// RedisStore implements the Store interface using Redis.
type RedisStore struct {
client *redis.Client
lockTTL time.Duration // Time-to-live for the in-progress lock
responseTTL time.Duration // Time-to-live for the stored response
}
// NewRedisStore creates a new RedisStore.
func NewRedisStore(client *redis.Client, lockTTL, responseTTL time.Duration) *RedisStore {
return &RedisStore{
client: client,
lockTTL: lockTTL,
responseTTL: responseTTL,
}
}
// Simplified internal representation for storage.
type storedData struct {
Status Status `json:"status"`
Response Response `json:"response,omitempty"`
}
// Lock is the most critical method. It must be atomic.
func (s *RedisStore) Lock(ctx context.Context, key string) (*Response, error) {
// We use a Lua script to ensure atomicity.
// 1. GET the key.
// 2. If it exists, check its status.
// - If 'Completed', return the data.
// - If 'InProgress', return an error (already locked).
// 3. If it does not exist, SET it with 'InProgress' status and a TTL.
script := `
local key = KEYS[1]
local lock_ttl = ARGV[1]
local data = redis.call('GET', key)
if data then
local decoded_data = cjson.decode(data)
if decoded_data.status == 1 then -- 1 is Completed
return data
else -- 0 is InProgress
return redis.error_reply('LOCKED')
end
else
local new_data = cjson.encode({status = 0}) -- 0 is InProgress
redis.call('SET', key, new_data, 'EX', lock_ttl)
return nil
end
`
// Pass the lock TTL as whole seconds; the Lua SET ... EX call expects an integer.
res, err := s.client.Eval(ctx, script, []string{key}, int(s.lockTTL.Seconds())).Result()
if err != nil {
if redis.HasErrorPrefix(err, "LOCKED") {
return nil, ErrRequestInFlight
}
if err != redis.Nil {
return nil, fmt.Errorf("redis eval failed: %w", err)
}
// err is redis.Nil, meaning the key did not exist and a lock was acquired.
return nil, nil
}
// If we reach here, the key existed and was completed.
responseData, ok := res.(string)
if !ok {
return nil, fmt.Errorf("unexpected response type from redis: %T", res)
}
var data storedData
if err := json.Unmarshal([]byte(responseData), &data); err != nil {
return nil, fmt.Errorf("failed to unmarshal stored response: %w", err)
}
return &data.Response, nil
}
// StoreResponse atomically updates the key with the final response.
func (s *RedisStore) StoreResponse(ctx context.Context, key string, resp Response) error {
data := storedData{
Status: Completed,
Response: resp,
}
marshaledData, err := json.Marshal(data)
if err != nil {
return fmt.Errorf("failed to marshal response: %w", err)
}
// SET the key with the final response and the response TTL.
// This overwrites the 'InProgress' lock.
return s.client.Set(ctx, key, marshaledData, s.responseTTL).Err()
}
// Unlock simply deletes the key.
func (s *RedisStore) Unlock(ctx context.Context, key string) error {
return s.client.Del(ctx, key).Err()
}
Analysis of the Redis Implementation
* The Lock method is the heart of the system. A naive GET followed by a SET would create a classic check-then-act race condition: if two requests arrive at the same millisecond, both could GET a nil value, and both would proceed to SET a lock, leading to double execution. The Lua script is non-negotiable here. It executes atomically on the Redis server, guaranteeing that the entire check-and-set logic completes without interruption.
* We store a Status enum (InProgress / Completed) serialized into the JSON value. This allows us to differentiate between a request that is actively being processed and one that has finished.
* lockTTL: This is a safety net. If your Go process crashes after acquiring a lock but before storing the response, this TTL ensures the lock is eventually released. A typical value might be 1-5 minutes. It should be longer than your expected maximum request processing time.
* responseTTL: This determines how long you cache the final response. A common value is 24 hours, aligning with the typical lifetime of an idempotency key. This prevents your Redis instance from growing indefinitely.
* The Lua script returns a custom error reply (LOCKED) which we check for in the Go code. This allows us to translate the Redis-level error into a domain-specific error, ErrRequestInFlight, which the middleware can use to generate a 409 Conflict response.
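If you want to convince yourself that the atomic Lock behaves as described, a small integration-test sketch like the following races two goroutines on the same key and expects exactly one of them to be rejected. It assumes a local Redis on localhost:6379 and the RedisStore above; the test name is arbitrary.

package idempotency

import (
    "context"
    "errors"
    "sync"
    "testing"
    "time"

    "github.com/redis/go-redis/v9"
)

func TestLockRejectsConcurrentDuplicate(t *testing.T) {
    client := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
    store := NewRedisStore(client, time.Minute, 24*time.Hour)

    key := "idempotency-test-" + t.Name()
    defer client.Del(context.Background(), key)

    var mu sync.Mutex
    conflicts := 0

    var wg sync.WaitGroup
    for i := 0; i < 2; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            _, err := store.Lock(context.Background(), key)
            switch {
            case errors.Is(err, ErrRequestInFlight):
                mu.Lock()
                conflicts++
                mu.Unlock()
            case err != nil:
                t.Errorf("unexpected error from Lock: %v", err)
            }
        }()
    }
    wg.Wait()

    // Exactly one goroutine should have been told the request is in flight.
    if conflicts != 1 {
        t.Fatalf("expected exactly one conflict, got %d", conflicts)
    }
}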
The Two-Phase Commit Middleware Pattern
Now we can build the middleware itself. A common mistake is to try to do everything before calling the next handler. A robust implementation requires a two-phase approach:
* Phase 1 (Pre-Handler): Check for the Idempotency-Key header. Call store.Lock(). Based on the result, either return a cached response, return a conflict error, or proceed to the next handler.
* Phase 2 (Post-Handler): After the handler has executed, capture its response (status code, headers, body). Call store.StoreResponse() to save this result. This phase must execute even if the handler panics.
To capture the response, we need to wrap http.ResponseWriter.
package idempotency
import (
"bytes"
"net/http"
)
// responseWriterInterceptor captures the response body and status code.
type responseWriterInterceptor struct {
http.ResponseWriter
body *bytes.Buffer
statusCode int
}
func newResponseWriterInterceptor(w http.ResponseWriter) *responseWriterInterceptor {
return &responseWriterInterceptor{
ResponseWriter: w,
body: &bytes.Buffer{},
statusCode: http.StatusOK, // Default status code
}
}
func (rwi *responseWriterInterceptor) Write(b []byte) (int, error) {
// Capture the body
rwi.body.Write(b)
return rwi.ResponseWriter.Write(b)
}
func (rwi *responseWriterInterceptor) WriteHeader(statusCode int) {
// Capture the status code
rwi.statusCode = statusCode
rwi.ResponseWriter.WriteHeader(statusCode)
}
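One caveat: wrapping http.ResponseWriter hides optional interfaces such as http.Flusher. If any handler behind this middleware flushes its output, a small passthrough like this sketch (not required for the basic pattern) preserves that behavior.

// Flush forwards flushes to the underlying ResponseWriter when it supports
// them, so streaming handlers behind the middleware keep working.
func (rwi *responseWriterInterceptor) Flush() {
    if f, ok := rwi.ResponseWriter.(http.Flusher); ok {
        f.Flush()
    }
}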
Now, let's write the middleware using this interceptor.
package idempotency
import (
"context"
"errors"
"log"
"net/http"
)
func Middleware(store Store) func(http.Handler) http.Handler {
return func(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
// We only apply idempotency to mutating methods
if r.Method != http.MethodPost && r.Method != http.MethodPatch && r.Method != http.MethodPut {
next.ServeHTTP(w, r)
return
}
key := r.Header.Get("Idempotency-Key")
if key == "" {
next.ServeHTTP(w, r)
return
}
cachedResp, err := store.Lock(r.Context(), key)
if err != nil {
if errors.Is(err, ErrRequestInFlight) {
http.Error(w, "A request with this idempotency key is already in progress", http.StatusConflict)
return
}
// Some other storage error
http.Error(w, "Internal Server Error", http.StatusInternalServerError)
return
}
if cachedResp != nil {
// Found a cached response, replay it.
for k, v := range cachedResp.Header {
w.Header()[k] = v
}
w.WriteHeader(cachedResp.StatusCode)
w.Write(cachedResp.Body)
return
}
// No cached response and lock acquired. Proceed with the request.
// Use a deferred function to handle unlocking in case of panics.
var requestFailed bool
defer func() {
if requestFailed {
// If the request handler panicked or failed, unlock the key so it can be retried.
if err := store.Unlock(r.Context(), key); err != nil {
log.Printf("ERROR: failed to unlock idempotency key %s: %v", key, err)
}
}
}()
interceptor := newResponseWriterInterceptor(w)
// Use a recover block to catch panics in the handler chain.
defer func() {
if rec := recover(); rec != nil {
requestFailed = true
// Re-panic to allow other panic-handling middleware to run.
panic(rec)
}
}()
next.ServeHTTP(interceptor, r)
// If we got this far without a panic, the handler has completed.
// Do not cache server errors, as they might be transient.
if interceptor.statusCode >= 500 {
requestFailed = true
return // The defer will handle unlocking
}
// Store the successful response.
respToCache := Response{
StatusCode: interceptor.statusCode,
Header: interceptor.Header(),
Body: interceptor.body.Bytes(),
}
if err := store.StoreResponse(r.Context(), key, respToCache); err != nil {
log.Printf("ERROR: failed to store idempotency response for key %s: %v", key, err)
// This is a tricky state. The operation succeeded but we failed to record it.
// The next request with the same key will re-execute.
}
})
}
}
Analysis of the Middleware Implementation
* The middleware skips non-mutating methods (GET) and requests without an Idempotency-Key header. This minimizes performance overhead.
* The defer block with recover is crucial. If your business logic handler panics, the request is marked as failed, and the defer function calls store.Unlock(). This prevents the lock from being held until its TTL expires, allowing a client to retry much sooner.
* We deliberately do not cache 5xx responses. A 500 Internal Server Error might be caused by a temporary downstream service failure. Caching this error would prevent a valid retry from succeeding. Instead, we treat it as a failure and unlock the key, inviting the client to try again.
* Pay close attention to the log line ERROR: failed to store idempotency response. This means your business logic (e.g., charging a credit card) succeeded, but the call to Redis to save the result failed. In this scenario, the system's exactly-once guarantee is broken: subsequent requests with the same key will receive 409 Conflict until the lock TTL expires, after which a retry will find no record and re-execute the business logic. This is an unavoidable trade-off in most practical distributed systems. The alternative requires a distributed transaction coordinator, which is often an order of magnitude more complex. For most applications, logging this rare event for manual reconciliation is the pragmatic choice.
Complete, Runnable Example
Let's tie this all together in a runnable application.
main.go
package main
import (
"encoding/json"
"log"
"math/rand"
"net/http"
"time"
"github.com/go-chi/chi/v5"
"github.com/go-chi/chi/v5/middleware"
"github.com/google/uuid"
"github.com/redis/go-redis/v9"
// The idempotency package we defined earlier. Relative imports do not work
// with Go modules; replace example.com/idempotency-demo with your module path.
"example.com/idempotency-demo/idempotency"
)
func main() {
// --- Setup Redis Client ---
redisClient := redis.NewClient(&redis.Options{
Addr: "localhost:6379",
})
// --- Setup Idempotency Store ---
idempotencyStore := idempotency.NewRedisStore(redisClient, 1*time.Minute, 24*time.Hour)
// --- Setup Chi Router and Middleware ---
r := chi.NewRouter()
r.Use(middleware.Logger)
r.Use(middleware.Recoverer)
// Apply the idempotency middleware
r.Use(idempotency.Middleware(idempotencyStore))
r.Post("/payments", createPaymentHandler)
log.Println("Server starting on :3000")
if err := http.ListenAndServe(":3000", r); err != nil {
log.Fatalf("Failed to start server: %v", err)
}
}
// A sample handler for creating a payment.
type PaymentRequest struct {
Amount int `json:"amount"`
Currency string `json:"currency"`
}
type PaymentResponse struct {
TransactionID string `json:"transaction_id"`
Status string `json:"status"`
}
func createPaymentHandler(w http.ResponseWriter, r *http.Request) {
var req PaymentRequest
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
http.Error(w, "Invalid request body", http.StatusBadRequest)
return
}
log.Printf("Processing payment for %d %s...", req.Amount, req.Currency)
// Simulate processing time and potential failures
time.Sleep(2 * time.Second)
// Simulate a transient error 10% of the time
if rand.Intn(10) == 0 {
log.Println("Simulating a transient server error...")
http.Error(w, "Downstream payment gateway failed", http.StatusInternalServerError)
return
}
resp := PaymentResponse{
TransactionID: uuid.New().String(),
Status: "completed",
}
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(http.StatusCreated)
json.NewEncoder(w).Encode(resp)
log.Printf("Successfully processed payment. Transaction ID: %s", resp.TransactionID)
}
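For this to compile, the package needs to live inside a Go module whose path matches the import above. One possible layout (the module name example.com/idempotency-demo is just a placeholder):

idempotency-demo/
├── go.mod              (module example.com/idempotency-demo)
├── main.go
└── idempotency/
    ├── store.go        (Store interface, Response, Status)
    ├── redis_store.go  (RedisStore)
    ├── interceptor.go  (responseWriterInterceptor)
    └── middleware.go   (Middleware)

Initialize it with go mod init example.com/idempotency-demo followed by go mod tidy to fetch chi, go-redis, and uuid.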
Testing the Scenarios
First, start a Redis instance (here via Docker): docker run -d -p 6379:6379 redis
Then run the Go application: go run .
Generate a UUID for your key:
# On Linux/macOS
IDEM_KEY=$(uuidgen)
# On Windows (PowerShell)
# $IDEM_KEY = [guid]::NewGuid().ToString()
Scenario 1: First Successful Request
The request takes 2 seconds to process.
curl -v -X POST http://localhost:3000/payments \
-H "Content-Type: application/json" \
-H "Idempotency-Key: $IDEM_KEY" \
-d '{"amount": 1000, "currency": "USD"}'
# Server Logs:
# Processing payment for 1000 USD...
# Successfully processed payment. Transaction ID: <some-new-uuid>
# Output:
# < HTTP/1.1 201 Created
# ...
# {"transaction_id":"<some-new-uuid>","status":"completed"}
Scenario 2: Retried Request (gets cached response)
This request returns instantly.
curl -v -X POST http://localhost:3000/payments \
-H "Content-Type: application/json" \
-H "Idempotency-Key: $IDEM_KEY" \
-d '{"amount": 1000, "currency": "USD"}'
# Server Logs:
# (No processing logs, the middleware handles it)
# Output:
# < HTTP/1.1 201 Created
# ...
# {"transaction_id":"<the-same-uuid-as-before>","status":"completed"}
Notice the transaction ID is identical, proving the business logic did not re-run.
Scenario 3: Concurrent Requests (race condition)
Open two terminals. Run the same curl command in both at the exact same time.
# Terminal 1 (will likely win the race)
curl ...
# Output: 201 Created after 2 seconds
# Terminal 2 (will lose the race)
curl ...
# Output: 409 Conflict instantly
# A request with this idempotency key is already in progress
This demonstrates that our atomic Lock operation is working correctly.
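If you prefer a single terminal, you can fire both requests nearly simultaneously from one shell line (assuming uuidgen is available, as above):
IDEM_KEY2=$(uuidgen)
curl -s -o /dev/null -w "first: %{http_code}\n" -X POST http://localhost:3000/payments \
-H "Content-Type: application/json" -H "Idempotency-Key: $IDEM_KEY2" \
-d '{"amount": 1000, "currency": "USD"}' &
curl -s -o /dev/null -w "second: %{http_code}\n" -X POST http://localhost:3000/payments \
-H "Content-Type: application/json" -H "Idempotency-Key: $IDEM_KEY2" \
-d '{"amount": 1000, "currency": "USD"}' &
wait
# Expected: one 201 and one 409, in either order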
Scenario 4: Server Error
Keep running the first command with a new key until you hit the simulated error.
IDEM_KEY_FAIL=$(uuidgen)
curl -v -X POST http://localhost:3000/payments \
-H "Content-Type: application/json" \
-H "Idempotency-Key: $IDEM_KEY_FAIL" \
-d '{"amount": 500, "currency": "EUR"}'
# Server Logs:
# Processing payment for 500 EUR...
# Simulating a transient server error...
# Output:
# < HTTP/1.1 500 Internal Server Error
# Downstream payment gateway failed
Now, immediately retry the exact same command. This time it might succeed.
curl -v -X POST http://localhost:3000/payments \
-H "Content-Type: application/json" \
-H "Idempotency-Key: $IDEM_KEY_FAIL" \
-d '{"amount": 500, "currency": "EUR"}'
# Server Logs:
# Processing payment for 500 EUR...
# Successfully processed payment. Transaction ID: <new-uuid-for-this-payment>
# Output:
# < HTTP/1.1 201 Created
# ...
This proves that server errors are not cached, and the lock is correctly released, allowing for successful retries.
Conclusion and Further Considerations
Implementing a robust idempotency layer is a defining feature of a mature, resilient API. While the concept is straightforward, the devil is in the details of atomic operations, state management, and graceful failure handling.
Our Go middleware, backed by a Lua-scripted Redis store, provides a production-grade foundation. It handles concurrent requests safely, recovers from handler panics, and intelligently decides which responses to cache.
For senior engineers building mission-critical systems, consider these further enhancements:
* Garbage Collection: While TTLs handle key expiration, you might want a background job that actively cleans up orphaned InProgress keys that survived past their TTL, just in case of Redis clock drift or other anomalies.
* Alternative Backends: For systems already heavily reliant on PostgreSQL, you could implement the Store interface using SELECT ... FOR UPDATE to achieve row-level locking, providing similar atomicity guarantees within a transactional context.
* Payload Hashing: To protect against client-side bugs where the same Idempotency-Key is accidentally sent with two different payloads, some systems store a hash of the request body along with the key. If a second request arrives with the same key but a different payload hash, it's rejected with a 422 Unprocessable Entity error; a minimal sketch of this check follows below.
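As a sketch of that last idea (requestFingerprint and the PayloadHash field are illustrative names; the Store would need to be extended to persist the hash alongside the key):

package idempotency

import (
    "bytes"
    "crypto/sha256"
    "encoding/hex"
    "io"
    "net/http"
)

// requestFingerprint returns a stable SHA-256 hex digest of the request body,
// restoring r.Body so downstream handlers can still read it.
func requestFingerprint(r *http.Request) (string, error) {
    body, err := io.ReadAll(r.Body)
    if err != nil {
        return "", err
    }
    r.Body.Close()
    r.Body = io.NopCloser(bytes.NewReader(body))

    sum := sha256.Sum256(body)
    return hex.EncodeToString(sum[:]), nil
}

// Inside the middleware, the digest of the incoming request would be compared
// against the digest stored with the first request, for example:
//
// hash, _ := requestFingerprint(r)
// if cached != nil && cached.PayloadHash != hash {
//     http.Error(w, "Idempotency-Key reused with a different payload", http.StatusUnprocessableEntity)
//     return
// }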