Idempotency Keys at the API Gateway for Resilient Microservices
The Inevitable Problem: At-Least-Once Delivery and Its Perils
In any non-trivial distributed system, the specter of network partitions, client-side timeouts, and transient service unavailability forces us to design for failure. A common client-side recovery pattern is to retry a failed request. This pragmatic approach, however, shifts the delivery guarantee from "at-most-once" to "at-least-once," introducing a critical new problem: the potential for duplicate processing of non-idempotent operations.
Consider a POST /v1/payments endpoint. A client sends a request, the server processes the payment, but the response is lost in transit. The client, receiving no confirmation, retries the request. The server, unaware of the previous successful transaction, processes the payment again. The result is a double charge and an irate customer. This is the classic failure mode that keeps architects of financial and e-commerce systems awake at night.
The textbook solution is to make every state-mutating endpoint idempotent. An operation is idempotent if making the same request multiple times produces the same result as making it once. While some operations are naturally idempotent (PUT /users/123 or DELETE /orders/456), many critical business operations (POST /transfers) are not.
Implementing idempotency logic within each microservice is viable, but repetitive and error-prone. It requires every development team to correctly implement a complex pattern involving request fingerprinting, state storage, and response caching, often in different programming languages with varying levels of support for distributed locking and caching. The result is code duplication, inconsistent implementations, and a significant maintenance burden.
This article presents a more robust, scalable, and architecturally clean solution: centralizing idempotency handling at the API Gateway layer.
The Architectural Argument: Why the API Gateway?
Placing the idempotency layer at the API Gateway, the single entry point for all external traffic, offers several compelling advantages over a service-by-service implementation:
* Single implementation: The pattern is built and hardened once, rather than re-implemented by every team in every service.
* Consistency: Every endpoint behind the gateway gets identical Idempotency-Key semantics, eliminating the divergent behaviors that arise from per-service implementations.
* Language agnosticism: Downstream services written in different languages benefit equally, regardless of how mature their distributed locking and caching libraries are.
* Simplified services: Teams shed the repetitive, error-prone idempotency plumbing described above and focus on business logic.
Of course, this pattern is not without its trade-offs. The API Gateway becomes an even more critical component, and its dependency on a distributed cache (like Redis) introduces another potential point of failure. However, with proper high-availability setups for both the gateway and the cache, the architectural benefits overwhelmingly justify the added complexity.
The Core Implementation Pattern: Request-Response Cache with Atomic Locking
The mechanism hinges on a client-generated Idempotency-Key header and a high-performance, distributed key-value store such as Redis. The client is responsible for generating a unique key (a UUIDv4 is an excellent choice) for each distinct operation it wishes to make idempotent.
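To make the client's side of the contract concrete, here is a minimal sketch (the endpoint URL and retry policy are illustrative, not part of the pattern). The essential detail is that the key is generated once per logical operation, outside the retry loop, and reused on every attempt:
// Sketch: a client that retries transport failures while reusing
// the same Idempotency-Key, so the gateway can deduplicate.
func createPayment(client *http.Client, body []byte) (*http.Response, error) {
	idemKey := uuid.NewString() // one key per operation, NOT per attempt

	var lastErr error
	for attempt := 0; attempt < 3; attempt++ {
		req, err := http.NewRequest("POST", "https://api.example.com/v1/payments", bytes.NewReader(body))
		if err != nil {
			return nil, err
		}
		req.Header.Set("Content-Type", "application/json")
		req.Header.Set("Idempotency-Key", idemKey)

		resp, err := client.Do(req)
		if err == nil {
			return resp, nil // may be a cached replay of the original response
		}
		lastErr = err
		time.Sleep(time.Duration(attempt+1) * 500 * time.Millisecond) // crude linear backoff
	}
	return nil, fmt.Errorf("payment failed after retries: %w", lastErr)
}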
The flow for a request containing an Idempotency-Key is as follows. First, the gateway constructs a composite cache key from not only the Idempotency-Key but also an identifier for the authenticated principal (e.g., User ID, API Key). This prevents one user from replaying another's request. Example: idem:USER_ID:IDEMPOTENCY_KEY.
Next, the gateway must atomically inspect the key's state in Redis. This is where the nuance lies. A simple GET followed by a SET is not sufficient, as it creates a race condition where two identical requests could both find the key missing and proceed to the downstream service. We must handle three states:
* State 1: Key Does Not Exist (First Request): The request is novel. The gateway must atomically set a PENDING placeholder in Redis with a short Time-To-Live (TTL), e.g., 15 seconds. It then forwards the request to the downstream service.
* State 2: Key Exists with PENDING Status (In-Flight Request): Another request with the same key arrived while the first is being processed. This indicates a race condition. The gateway should immediately return a 409 Conflict response, signaling to the client that the operation is already in progress.
* State 3: Key Exists with a Cached Response (Completed Request): The original request was successfully processed, and its response has been cached. The gateway bypasses the downstream service entirely and returns the cached response (status code, headers, and body) directly to the client.
Finally, once the downstream service responds, the gateway replaces the PENDING placeholder in Redis with the serialized response data (status code, headers, body) and sets a longer TTL, typically 24 hours, after which the key can be safely reused.
Let's translate this into production-grade Go code.
Code Example 1: The Go Idempotency Middleware
We'll implement an http.Handler middleware in Go that can be wrapped around any API gateway's proxy logic. This example uses the popular go-redis client.
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
	"time"

	"github.com/go-redis/redis/v8"
	"github.com/google/uuid"
)

// A custom ResponseWriter to capture the status code and body.
type responseRecorder struct {
	http.ResponseWriter
	statusCode int
	body       *bytes.Buffer
}

func (rec *responseRecorder) WriteHeader(statusCode int) {
	rec.statusCode = statusCode
	rec.ResponseWriter.WriteHeader(statusCode)
}

// Write buffers the body; processAndCache flushes it to the client
// after the downstream handler returns.
func (rec *responseRecorder) Write(b []byte) (int, error) {
	return rec.body.Write(b)
}

// The data we will cache in Redis.
type cachedResponse struct {
	StatusCode int         `json:"statusCode"`
	Headers    http.Header `json:"headers"`
	Body       []byte      `json:"body"`
}

// IdempotencyMiddleware provides a layer to handle idempotent requests.
type IdempotencyMiddleware struct {
	redisClient *redis.Client
	next        http.Handler
}

// NewIdempotencyMiddleware creates a new instance of the middleware.
func NewIdempotencyMiddleware(redisAddr string, next http.Handler) *IdempotencyMiddleware {
	rdb := redis.NewClient(&redis.Options{
		Addr: redisAddr,
	})
	// Check the Redis connection up front; the middleware is useless without it.
	if _, err := rdb.Ping(context.Background()).Result(); err != nil {
		log.Fatalf("Could not connect to Redis: %v", err)
	}
	return &IdempotencyMiddleware{
		redisClient: rdb,
		next:        next,
	}
}

// The core Lua script for the atomic check-and-set.
const atomicCheckAndSetScript = `
-- KEYS[1] = idempotency key
-- ARGV[1] = pending status value ('PENDING')
-- ARGV[2] = pending status TTL in seconds
local current_val = redis.call('get', KEYS[1])
if current_val == false then
    -- Key does not exist, this is the first request.
    -- Set the pending key and return 'OK_PROCEED'.
    redis.call('set', KEYS[1], ARGV[1], 'EX', ARGV[2])
    return 'OK_PROCEED'
else
    -- Key exists, return its value for the middleware to handle.
    return current_val
end
`

func (im *IdempotencyMiddleware) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	// Only apply to mutating methods.
	if r.Method != http.MethodPost && r.Method != http.MethodPatch && r.Method != http.MethodPut {
		im.next.ServeHTTP(w, r)
		return
	}
	idempotencyKey := r.Header.Get("Idempotency-Key")
	if idempotencyKey == "" {
		im.next.ServeHTTP(w, r)
		return
	}
	// CRITICAL: Always scope the key to the authenticated principal.
	// Here we simulate getting a user ID from the request context.
	userID := getUserIDFromContext(r.Context())
	if userID == "" {
		http.Error(w, "Unauthorized", http.StatusUnauthorized)
		return
	}
	cacheKey := fmt.Sprintf("idem:%s:%s", userID, idempotencyKey)

	ctx := context.Background()
	// Run the Lua script for atomicity.
	script := redis.NewScript(atomicCheckAndSetScript)
	result, err := script.Run(ctx, im.redisClient, []string{cacheKey}, "PENDING", 15).Result()
	if err != nil {
		log.Printf("Redis script error for key %s: %v", cacheKey, err)
		http.Error(w, "Internal Server Error", http.StatusInternalServerError)
		return
	}

	status := result.(string)
	switch status {
	case "OK_PROCEED":
		// This is the first request; proceed to the handler.
		im.processAndCache(w, r, cacheKey)
	case "PENDING":
		// A request is already in flight; reject this one.
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusConflict)
		w.Write([]byte(`{"error": "Request already in progress"}`))
	default:
		// The key exists and contains a cached response.
		var cachedResp cachedResponse
		if err := json.Unmarshal([]byte(status), &cachedResp); err != nil {
			log.Printf("Failed to unmarshal cached response for key %s: %v", cacheKey, err)
			http.Error(w, "Internal Server Error", http.StatusInternalServerError)
			return
		}
		// Write the cached response back to the client.
		for key, values := range cachedResp.Headers {
			for _, value := range values {
				w.Header().Add(key, value)
			}
		}
		w.WriteHeader(cachedResp.StatusCode)
		w.Write(cachedResp.Body)
	}
}

func (im *IdempotencyMiddleware) processAndCache(w http.ResponseWriter, r *http.Request, cacheKey string) {
	// We need to capture the response from the next handler.
	recorder := &responseRecorder{
		ResponseWriter: w,
		statusCode:     http.StatusOK, // Default status code
		body:           new(bytes.Buffer),
	}
	// Handle potential panics in downstream handlers.
	defer func() {
		if p := recover(); p != nil {
			log.Printf("Panic recovered in handler: %v", p)
			// Don't cache panic responses; just delete the pending key.
			im.redisClient.Del(context.Background(), cacheKey)
			http.Error(w, "Internal Server Error", http.StatusInternalServerError)
		}
	}()

	im.next.ServeHTTP(recorder, r)

	// The actual response body is in recorder.body; flush it to the original writer.
	w.Write(recorder.body.Bytes())

	// Cache the captured response.
	respToCache := cachedResponse{
		StatusCode: recorder.statusCode,
		Headers:    recorder.Header(),
		Body:       recorder.body.Bytes(),
	}
	marshaledResp, err := json.Marshal(respToCache)
	if err != nil {
		log.Printf("Failed to marshal response for caching, key %s: %v", cacheKey, err)
		// Don't leave a PENDING key indefinitely.
		im.redisClient.Del(context.Background(), cacheKey)
		return
	}
	// Set the final response with a 24-hour TTL.
	if err := im.redisClient.Set(context.Background(), cacheKey, marshaledResp, 24*time.Hour).Err(); err != nil {
		log.Printf("Failed to cache response for key %s: %v", cacheKey, err)
	}
}

// --- Mocking for a runnable example ---

// A dummy handler that simulates a downstream service.
func downstreamServiceHandler(w http.ResponseWriter, r *http.Request) {
	log.Println("Downstream service processing request...")
	// Simulate work.
	time.Sleep(100 * time.Millisecond)

	var reqBody map[string]interface{}
	if err := json.NewDecoder(r.Body).Decode(&reqBody); err != nil {
		http.Error(w, "Invalid JSON", http.StatusBadRequest)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusCreated)
	json.NewEncoder(w).Encode(map[string]interface{}{
		"status":        "success",
		"transactionId": uuid.NewString(),
		"amount":        reqBody["amount"],
	})
}

// A typed context key avoids collisions with other packages' context values.
type contextKey string

const userIDContextKey contextKey = "userID"

// A helper to add a user ID to the context for simulation.
func withUserID(next http.Handler, userID string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ctx := context.WithValue(r.Context(), userIDContextKey, userID)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

func getUserIDFromContext(ctx context.Context) string {
	if userID, ok := ctx.Value(userIDContextKey).(string); ok {
		return userID
	}
	return ""
}

func main() {
	// This main function is for demonstration. In a real scenario,
	// the middleware would be part of an API Gateway.

	// Start a local Redis server first (requires Docker):
	// docker run -d --name redis-idem -p 6379:6379 redis
	redisAddr := "localhost:6379"

	handler := http.HandlerFunc(downstreamServiceHandler)
	middleware := NewIdempotencyMiddleware(redisAddr, handler)
	authedHandler := withUserID(middleware, "user-123") // Authenticate as user-123

	// Simulate multiple concurrent requests with the same idempotency key.
	idemKey := uuid.NewString()
	log.Printf("Using Idempotency-Key: %s", idemKey)

	// Use a WaitGroup to fire off requests concurrently.
	var wg sync.WaitGroup
	numRequests := 3
	wg.Add(numRequests)
	for i := 0; i < numRequests; i++ {
		go func(reqNum int) {
			defer wg.Done()
			time.Sleep(time.Duration(reqNum*10) * time.Millisecond) // Stagger requests slightly

			reqBody := `{"amount": 100.00}`
			req := httptest.NewRequest("POST", "/payments", strings.NewReader(reqBody))
			req.Header.Set("Idempotency-Key", idemKey)
			rec := httptest.NewRecorder()

			authedHandler.ServeHTTP(rec, req)

			respBody, _ := io.ReadAll(rec.Body)
			log.Printf("Request %d -> Status: %d, Body: %s", reqNum+1, rec.Code, string(respBody))
		}(i)
	}
	wg.Wait()

	log.Println("\n--- Sending another request after a delay to show caching ---")
	time.Sleep(1 * time.Second)

	reqBody := `{"amount": 100.00}`
	req := httptest.NewRequest("POST", "/payments", strings.NewReader(reqBody))
	req.Header.Set("Idempotency-Key", idemKey)
	rec := httptest.NewRecorder()
	authedHandler.ServeHTTP(rec, req)
	respBody, _ := io.ReadAll(rec.Body)
	log.Printf("Cached Request -> Status: %d, Body: %s", rec.Code, string(respBody))
}
Dissecting the Atomic Lua Script
The most critical piece of this implementation is the atomic nature of checking for the key's existence and setting the PENDING state. The Lua script executed by Redis's EVAL command guarantees this atomicity.
-- KEYS[1] = the idempotency cache key
-- ARGV[1] = the 'PENDING' string
-- ARGV[2] = the TTL for the pending state
local current_val = redis.call('get', KEYS[1])
if current_val == false then
    -- Key doesn't exist. Atomically set it to PENDING with a TTL.
    redis.call('set', KEYS[1], ARGV[1], 'EX', ARGV[2])
    return 'OK_PROCEED' -- Signal to the Go code to proceed.
else
    -- Key exists. Return its current value.
    -- It will either be 'PENDING' or a full JSON response.
    return current_val
end
Because Redis executes commands and scripts on a single thread, this script runs without interruption. It completely eliminates the race condition in which two requests could simultaneously see the key as non-existent. One will win, set the PENDING key, and receive OK_PROCEED. The other will execute moments later, see the PENDING key, and return it, leading to the 409 Conflict response in our Go middleware.
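As an aside, a script-free alternative is SETNX followed by a GET when the set fails. A minimal sketch with go-redis (a hypothetical helper, not part of the middleware above) shows why the Lua version is preferable: the two-step variant costs an extra round trip on the duplicate path and has a narrow window where the key can expire between the two calls.
// Sketch: check-and-set without Lua. If SetNX loses the race, a
// second call fetches the existing value -- but the key could
// expire in between, which the atomic script rules out.
func checkAndSet(ctx context.Context, rdb *redis.Client, key string) (string, error) {
	won, err := rdb.SetNX(ctx, key, "PENDING", 15*time.Second).Result()
	if err != nil {
		return "", err
	}
	if won {
		return "OK_PROCEED", nil // first request; forward downstream
	}
	val, err := rdb.Get(ctx, key).Result()
	if err == redis.Nil {
		// Expired between SetNX and Get; the caller should retry.
		return "", fmt.Errorf("idempotency key %s vanished mid-check", key)
	}
	return val, err // 'PENDING' or a cached JSON response
}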
Advanced Edge Cases and Production Hardening
A robust system is defined by how it handles failure. Let's explore the edge cases and how our design must account for them.
Edge Case 1: Gateway Crash After Forwarding, Before Caching
This is the most complex failure mode. The gateway has set the PENDING key and forwarded the request. The downstream service successfully processes it. Before the gateway can receive the response and update the Redis key, the gateway instance crashes.
Solution: The PENDING key's short TTL is our safety net. Let's say we set it to 15 seconds. If the client retries the request after the original gateway has crashed (and traffic has failed over to a new instance), one of two things will happen:
* Retry within the TTL window: The new gateway instance will see the PENDING key and return a 409 Conflict. This is the correct behavior, preventing a duplicate request while the original might still be processing or its response is in flight.
* Retry after the TTL has expired: The PENDING key will have expired, so the new gateway instance will treat the retry as a novel request, set a new PENDING key, and forward it downstream.
The second scenario reveals a crucial truth: gateway-level idempotency reduces the probability of duplicate processing but does not eliminate it entirely. In this specific failure mode, a duplicate request can reach the downstream service. Therefore, for mission-critical services like payment processing, it is still best practice to implement a final layer of idempotency protection within the service itself (e.g., a unique constraint on a transaction ID in the database), as sketched below. The gateway acts as a powerful, highly effective shield that handles >99.9% of cases, dramatically simplifying the downstream logic.
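For that residual window, here is a hedged sketch of the service-level backstop: a unique constraint on the idempotency key (or a transaction ID derived from it) makes the insert itself the deduplication point. The table and column names are illustrative, the ON CONFLICT clause assumes PostgreSQL, and the gateway must forward the Idempotency-Key header downstream for this to work.
// Sketch: service-level idempotency backstop. Assumes a schema like:
//   CREATE TABLE payments (
//       idempotency_key TEXT PRIMARY KEY,
//       amount          NUMERIC NOT NULL
//   );
func recordPayment(ctx context.Context, db *sql.DB, idemKey string, amount float64) (bool, error) {
	// ON CONFLICT DO NOTHING turns a duplicate insert into a no-op
	// instead of an error (PostgreSQL syntax).
	res, err := db.ExecContext(ctx,
		`INSERT INTO payments (idempotency_key, amount)
		 VALUES ($1, $2)
		 ON CONFLICT (idempotency_key) DO NOTHING`,
		idemKey, amount)
	if err != nil {
		return false, err
	}
	rows, err := res.RowsAffected()
	if err != nil {
		return false, err
	}
	return rows == 1, nil // false: a duplicate slipped past the gateway
}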
Edge Case 2: Downstream Service Timeout or Error
What should the gateway cache if the downstream service returns a 503 Service Unavailable or the request times out (resulting in a 504 Gateway Timeout)?
Solution: The gateway should cache the error response just like it would a success response. If a client retries with the same Idempotency-Key, it will immediately receive the cached 503 or 504 response. This is a form of circuit breaking; it prevents clients from hammering a struggling downstream service with retries for the same failed operation. The client must generate a new Idempotency-Key to attempt the operation again after a backoff period.
Our processAndCache function already handles this correctly, as it caches whatever status code and body it receives from the downstream handler.
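From the client's perspective, a replayed error under the old key is terminal; a new attempt requires a fresh key after backing off. A minimal sketch of that behavior (the URL parameter and backoff values are illustrative):
// Sketch: a client minting a fresh Idempotency-Key per logical
// attempt once a previous attempt has terminated in a cached error.
func submitWithBackoff(client *http.Client, url string, body []byte) (*http.Response, error) {
	backoff := time.Second
	for attempt := 0; attempt < 3; attempt++ {
		req, err := http.NewRequest("POST", url, bytes.NewReader(body))
		if err != nil {
			return nil, err
		}
		req.Header.Set("Idempotency-Key", uuid.NewString()) // fresh key per attempt

		resp, err := client.Do(req)
		if err != nil {
			return nil, err
		}
		if resp.StatusCode != http.StatusServiceUnavailable &&
			resp.StatusCode != http.StatusGatewayTimeout {
			return resp, nil
		}
		resp.Body.Close()
		time.Sleep(backoff) // give the struggling service room to recover
		backoff *= 2
	}
	return nil, fmt.Errorf("service unavailable after repeated attempts")
}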
Edge Case 3: Redis Unavailability or Cache Eviction
If Redis is down, the idempotency layer fails closed. Our script.Run call will return an error, and the middleware will respond with a 500 Internal Server Error, preventing requests from proceeding. For non-idempotent operations this is the safe failure mode: it is better to reject a request than to risk processing it twice.
More subtle is cache eviction. If Redis is under memory pressure and configured with an eviction policy like volatile-lru, it might evict our idempotency key before its 24-hour TTL expires. If a client then retries the request, the gateway will see the key as missing and process the request again.
Solution: This is generally considered an acceptable trade-off. The window for a duplicate request is small, and the system gracefully degrades to at-least-once behavior. For robust production environments, you should:
* Run a dedicated Redis cluster for critical functions like idempotency.
* Monitor Redis memory usage closely and scale up before eviction becomes common.
* Configure a volatile-ttl eviction policy to prioritize evicting keys closer to their natural expiration.
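The last two points translate into a few lines of Redis configuration; the memory limit below is illustrative and should be sized to your key volume:
# redis.conf for a dedicated idempotency instance (illustrative values)
maxmemory 2gb
maxmemory-policy volatile-ttl

# Or applied at runtime without a restart:
# redis-cli CONFIG SET maxmemory-policy volatile-ttl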
Performance and Scalability Considerations
Introducing this middleware adds latency to every mutable request. Let's quantify it.
* First Request: Incurs two Redis round trips: one for the EVAL script (check-and-set pending) and one for the final SET (cache response). On a low-latency network (e.g., gateway and Redis in the same VPC/availability zone), this might add 1-2ms of overhead.
* Duplicate Request (In-flight): One Redis EVAL round trip. Very fast, <1ms.
* Duplicate Request (Cached): One Redis EVAL round trip. Also very fast, <1ms, plus the time to transmit the cached response body.
This overhead is typically negligible compared to the processing time of the downstream service and database calls.
Scalability: The bottleneck will be Redis. A single Redis instance can handle tens of thousands of operations per second, which is often sufficient. For extreme scale, you can:
* Use Redis Cluster to shard the idempotency keys across multiple nodes, providing horizontal scalability.
* Ensure your cache key construction (idem:USER_ID:IDEMPOTENCY_KEY) provides good key distribution to avoid hot shards.
Payload Size: Caching large response bodies (megabytes) in Redis can consume memory quickly. You should be selective about which endpoints use this middleware. It's most valuable for endpoints that trigger critical, non-idempotent business logic. Consider adding a check to not cache responses over a certain size threshold (e.g., 1MB).
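Such a guard is a few lines at the top of the caching step in processAndCache. The sketch below reuses that function's variables and treats the 1MB figure as tunable:
const maxCacheableBody = 1 << 20 // 1MB; tune per deployment

// Inside processAndCache, after the downstream handler has returned:
if recorder.body.Len() > maxCacheableBody {
	log.Printf("Response for key %s is %d bytes, over the cache threshold; skipping cache",
		cacheKey, recorder.body.Len())
	// Clear the PENDING marker so retries are not stuck behind a 409.
	im.redisClient.Del(context.Background(), cacheKey)
	return
}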
Conclusion: A Pragmatic Pattern for Resilient Systems
Implementing idempotency at the API Gateway is a powerful, advanced architectural pattern that provides a centralized, consistent, and language-agnostic solution to the problem of duplicate requests in distributed systems. By leveraging the atomic operations of a distributed cache like Redis, we can build a high-performance shield that protects our downstream services from the side effects of client retries and network instability.
While not a silver bullet that completely absolves downstream services from considering idempotency, it handles the vast majority of cases, drastically simplifying service-level logic and improving the overall resilience and predictability of the system. The Go middleware and Lua script presented here provide a production-ready blueprint for senior engineers tasked with building robust, scalable, and fault-tolerant microservice architectures.