Distributed Rate Limiting with Redis Lua Scripts & Sorted Sets

Goh Ling Yong

The Fallacy of Naive Rate Limiting in Distributed Systems

In a distributed environment, building a reliable rate limiter is a canonical problem that separates foundational knowledge from production-grade engineering. A naive INCR-and-EXPIRE counter on a Redis key, tempting in its simplicity, is fundamentally flawed. This fixed-window approach suffers from race conditions at the window's edge, allowing a client to burst up to twice the allowed requests in a short period: once at the end of window N and again at the start of window N+1.
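For contrast, here is a minimal sketch of that naive fixed-window counter, written with the go-redis (v8) client used later in this article; the function name and key scheme are illustrative assumptions, not part of the implementation we build below.

```go
import (
	"context"
	"fmt"
	"time"

	"github.com/go-redis/redis/v8"
)

// naiveAllow is the fixed-window counter this article argues against: one INCR'd
// key per (client, window) pair. Two bursts straddling a window boundary can both pass.
func naiveAllow(ctx context.Context, rdb *redis.Client, key string, limit int64, window time.Duration) (bool, error) {
	// Bucket requests by window number, e.g. "rate_limit:user_id:12345:28934712".
	bucket := fmt.Sprintf("%s:%d", key, time.Now().Unix()/int64(window.Seconds()))

	count, err := rdb.Incr(ctx, bucket).Result()
	if err != nil {
		return false, err
	}
	if count == 1 {
		// First request in this window: start the expiry clock. If the process
		// dies between INCR and EXPIRE, the key leaks, another weakness of this scheme.
		rdb.Expire(ctx, bucket, window)
	}
	return count <= limit, nil
}
```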

Senior engineers recognize this and often reach for more precise algorithms like the token bucket, leaky bucket, or sliding window. Of these, the sliding window log offers an exceptional balance of precision, performance, and relative implementation simplicity when paired with the power of Redis. However, even this approach has a critical flaw if implemented naively: a series of separate Redis commands to manage the window is not atomic. Network latency between commands opens a window for race conditions, corrupting the limiter's state.

This article bypasses introductory concepts and dives directly into the definitive solution: implementing a high-precision sliding window log rate limiter using Redis Sorted Sets, with atomicity guaranteed by a server-side Lua script. We will dissect the algorithm, build a production-ready implementation in Go, analyze its performance characteristics, and address the complex edge cases encountered in real-world, high-traffic systems.


Algorithm Deep Dive: The Sliding Window Log with Sorted Sets

The core idea of the sliding window log is to maintain a timestamped log of all requests for a given key within the current time window. When a new request arrives, we first discard all logs with timestamps older than the window, then check if the remaining log count is below the configured limit.

Redis's Sorted Set is the perfect data structure for this task. Here's how we map the algorithm's components to a Sorted Set:

* Key: A unique identifier for the entity being rate-limited (e.g., rate_limit:user_id:12345 or rate_limit:ip:192.168.1.100).

* Score: The timestamp of each request, ideally in microseconds for high resolution. The score allows Redis to keep the set ordered by time, which is crucial for efficient pruning.

* Member: A unique value for each request to prevent collisions within the same microsecond. A combination of the timestamp and a random nonce is a robust strategy.
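To make the member format concrete, here is one way such a value could be built: a small sketch assuming a microsecond timestamp plus four random bytes as the nonce (the helper name is illustrative and not referenced by the implementation below).

```go
import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// requestMember builds a sorted-set member that stays unique even when several
// requests from the same client share the same microsecond timestamp.
func requestMember(nowUs int64) (string, error) {
	nonce := make([]byte, 4)
	if _, err := rand.Read(nonce); err != nil {
		return "", err
	}
	return fmt.Sprintf("%d-%s", nowUs, hex.EncodeToString(nonce)), nil
}
```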

Let's visualize the process for a limit of 100 requests per 60 seconds:

  • Request Arrives: A request from user 12345 arrives at t_current.
  • Calculate Window Start: The start of the sliding window is t_current - 60_000_000 (in microseconds).
  • Prune Old Entries: We issue a ZREMRANGEBYSCORE command on the key rate_limit:user_id:12345. This command efficiently removes all members whose score (timestamp) is less than the calculated window start. This is the "sliding" part of the algorithm.
  • Count Current Requests: We use ZCARD to get the number of members remaining in the sorted set. This count represents the number of requests within the current 60-second window.
  • Make Decision:
    * If the count is less than 100, the request is allowed: we add its timestamp to the sorted set with ZADD and let the request proceed.
    * If the count is 100 or more, the request is denied (HTTP 429 Too Many Requests).
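Expressed as individual go-redis calls, the walkthrough above looks like the sketch below. It is deliberately not atomic and exists only to make the command sequence concrete before we discuss why it must be fused into a single step; rdb is assumed to be a *redis.Client, ctx a context.Context, and the key, limit, and window values are the ones from the example.

```go
// checkSlidingWindow runs the three steps above as separate commands. It is NOT
// safe under concurrency; the atomic Lua version follows later in the article.
func checkSlidingWindow(ctx context.Context, rdb *redis.Client) (bool, error) {
	now := time.Now().UnixNano() / 1000 // current time in microseconds
	windowStart := now - 60*1_000_000   // 60-second window
	key := "rate_limit:user_id:12345"

	// 1. Prune entries that fell out of the window.
	if err := rdb.ZRemRangeByScore(ctx, key, "-inf", fmt.Sprintf("%d", windowStart)).Err(); err != nil {
		return false, err
	}

	// 2. Count what remains inside the window.
	count, err := rdb.ZCard(ctx, key).Result()
	if err != nil {
		return false, err
	}

	// 3. Decide, logging the request only if it is allowed.
	if count < 100 {
		err := rdb.ZAdd(ctx, key, &redis.Z{Score: float64(now), Member: fmt.Sprintf("%d", now)}).Err()
		return err == nil, err
	}
	return false, nil // caller responds with HTTP 429
}
```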

    The Atomicity Problem

    The sequence of operations—ZREMRANGEBYSCORE, ZCARD, ZADD—is the crux of the problem. If executed as three separate network calls from an application server, a race condition can occur. Imagine two concurrent requests from the same user:

    * Request A executes ZREMRANGEBYSCORE.

    * Request B executes ZREMRANGEBYSCORE.

    * Request A executes ZCARD and sees the count is 99.

    * Request B executes ZCARD and also sees the count is 99.

    * Request A decides to allow the request and executes ZADD.

    * Request B also decides to allow the request and executes ZADD.

    The result: we now have 101 requests in the log, violating our rate limit. The only way to solve this is to execute the entire sequence as a single, atomic operation on the Redis server. This is precisely what Lua scripting is designed for.


    Crafting the Atomic Lua Script

    Redis allows the execution of server-side Lua scripts. A script is sent to the server, which executes it as a single, uninterruptible command. No other Redis command can run while a script is executing, guaranteeing atomicity.

    Here is the complete Lua script to implement our sliding window log rate limiter. We will then break it down line by line.

```lua
    -- KEYS[1]: The key for the sorted set (e.g., 'rate_limit:user123')
    -- ARGV[1]: The maximum number of requests allowed (the limit)
    -- ARGV[2]: The window size in seconds
-- ARGV[3]: The current time in microseconds (supplied by the caller so one consistent timestamp is used throughout the check)
    -- ARGV[4]: A unique identifier for the current request (e.g., timestamp + nonce)
    
    local key = KEYS[1]
    local limit = tonumber(ARGV[1])
    local window_seconds = tonumber(ARGV[2])
    local current_time_us = tonumber(ARGV[3])
    local request_id = ARGV[4]
    
    -- Calculate the start of the window in microseconds
    local window_start_us = current_time_us - (window_seconds * 1000000)
    
    -- 1. Prune the sorted set to remove entries outside the current window.
    -- ZREMRANGEBYSCORE is efficient for this task.
    redis.call('ZREMRANGEBYSCORE', key, '-inf', window_start_us)
    
    -- 2. Get the current count of requests within the window.
    -- ZCARD provides the size of the sorted set in O(1) time.
    local current_count = redis.call('ZCARD', key)
    
    local allowed = 0
    if current_count < limit then
        -- 3. If within limits, add the new request to the sorted set.
        -- The score is the current time, and the member is the unique request ID.
        redis.call('ZADD', key, current_time_us, request_id)
        
        -- 4. Set an expiration on the key to automatically clean it up after it's been idle.
        -- This prevents old, inactive keys from consuming memory forever.
        -- The expiration is set to the window size, ensuring it lives long enough.
        redis.call('EXPIRE', key, window_seconds)
        
        allowed = 1
    end
    
    -- Return a table: {is_allowed, new_count}
return { allowed, current_count + allowed }
```

    Script Breakdown:

    * KEYS and ARGV: The script receives its parameters through these two tables. KEYS are for keys being accessed, which is important for Redis Cluster routing. ARGV is for all other arguments.

    * Parameter Handling: We assign the inputs to local variables and convert numerical arguments from strings to numbers using tonumber().

    * window_start_us: We calculate the cutoff timestamp. Any request logged before this time is expired.

    * redis.call('ZREMRANGEBYSCORE', key, '-inf', window_start_us): This is the core of the pruning step. It removes all members from the sorted set at key whose scores are between negative infinity and window_start_us. This is highly efficient.

* redis.call('ZCARD', key): This gets the cardinality (size) of the set *after* pruning. This is an O(1) operation, making it extremely fast.

    * Conditional Logic: The if current_count < limit then block contains the logic for an allowed request.

    * redis.call('ZADD', key, current_time_us, request_id): If the request is allowed, we add it to the log. The current_time_us is the score, and the request_id is the member.

    * redis.call('EXPIRE', key, window_seconds): This is a crucial memory management optimization. If a user stops making requests, their rate limit key would persist in Redis indefinitely. By setting an EXPIRE equal to the window size, we ensure that if the key is not touched for the duration of the window, Redis will automatically delete it.

* Return Value: The script returns a table (which Redis clients typically convert to an array or list). The first element is a boolean-like integer (1 for allowed, 0 for denied). The second element is the new count *after* this request, which is useful for returning X-RateLimit-Remaining headers.


    Production Implementation in Go

    Now, let's build a robust, reusable rate limiter in Go that utilizes this Lua script. We'll use the popular go-redis/redis library.

    The `RateLimiter` Struct

    First, we'll define a struct to encapsulate the Redis client and the Lua script's SHA1 hash. Using the hash (EVALSHA) is critical for performance, as it avoids re-sending the entire script to Redis for every single request.

```go
    package ratelimiter
    
    import (
    	"context"
    	"crypto/sha1"
    	"encoding/hex"
    	"fmt"
    	"time"
    
    	"github.com/go-redis/redis/v8"
    )
    
    const luaScript = `
        local key = KEYS[1]
        local limit = tonumber(ARGV[1])
        local window_seconds = tonumber(ARGV[2])
        local current_time_us = tonumber(ARGV[3])
        local request_id = ARGV[4]
    
        local window_start_us = current_time_us - (window_seconds * 1000000)
    
        redis.call('ZREMRANGEBYSCORE', key, '-inf', window_start_us)
    
        local current_count = redis.call('ZCARD', key)
    
        local allowed = 0
        if current_count < limit then
            redis.call('ZADD', key, current_time_us, request_id)
            redis.call('EXPIRE', key, window_seconds)
            allowed = 1
        end
    
        return { allowed, current_count + allowed }
    `
    
    // Result holds the outcome of a rate limit check.
    type Result struct {
    	Allowed       bool
    	CurrentCount  int64
    	Limit         int64
    	WindowSeconds int64
    }
    
// RedisRateLimiter implements a distributed rate limiter using Redis.
type RedisRateLimiter struct {
	client    *redis.Client
	scriptSHA string
}
    
    // NewRedisRateLimiter creates and initializes a new rate limiter.
    // It loads the Lua script into Redis and stores its SHA1 hash.
    func NewRedisRateLimiter(client *redis.Client) (*RedisRateLimiter, error) {
    	hasher := sha1.New()
    	hasher.Write([]byte(luaScript))
    	scriptSHA := hex.EncodeToString(hasher.Sum(nil))
    
    	// Use SCRIPT LOAD to cache the script in Redis.
    	// This is idempotent, so it's safe to call on startup.
    	loaded, err := client.ScriptLoad(context.Background(), luaScript).Result()
    	if err != nil {
    		return nil, fmt.Errorf("failed to load redis script: %w", err)
    	}
    
    	if loaded != scriptSHA {
    		return nil, fmt.Errorf("script SHA mismatch: expected %s, got %s", scriptSHA, loaded)
    	}
    
    	return &RedisRateLimiter{
    		client:   client,
    		scriptSHA: scriptSHA,
    	}, nil
}
```

    The `Allow` Method

    This method executes the script and parses the result.

```go
    // Allow checks if a request is permitted for a given key.
    func (r *RedisRateLimiter) Allow(ctx context.Context, key string, limit int64, window time.Duration) (*Result, error) {
	// Generate a unique ID for this specific request.
	// A high-resolution (microsecond) timestamp is a good basis.
	nowUs := time.Now().UnixNano() / 1000
	requestID := fmt.Sprintf("%d", nowUs) // a random nonce can be appended for stronger uniqueness
    
    	windowSeconds := int64(window.Seconds())
    
    	// Use EVALSHA to execute the pre-loaded script.
    	// This is much more efficient than sending the script every time.
	args := []interface{}{limit, windowSeconds, nowUs, requestID}
    	
    	res, err := r.client.EvalSha(ctx, r.scriptSHA, []string{key}, args...).Result()
    	if err != nil {
		// If the script is not cached (e.g., Redis was restarted or SCRIPT FLUSH was run),
		// EVALSHA fails with a NOSCRIPT error. A robust implementation should detect that
		// error and retry with EVAL, or use go-redis's Script helper (redis.NewScript),
		// whose Run method performs this fallback automatically. Here we simply return the error.
    		return nil, fmt.Errorf("failed to execute redis script: %w", err)
    	}
    
    	resultSlice, ok := res.([]interface{})
    	if !ok || len(resultSlice) != 2 {
    		return nil, fmt.Errorf("unexpected result format from redis script: %v", res)
    	}
    
    	allowed, ok1 := resultSlice[0].(int64)
    	newCount, ok2 := resultSlice[1].(int64)
    	if !ok1 || !ok2 {
    		return nil, fmt.Errorf("unexpected type in result slice: %v", resultSlice)
    	}
    
    	return &Result{
    		Allowed:       allowed == 1,
    		CurrentCount:  newCount,
    		Limit:         limit,
    		WindowSeconds: windowSeconds,
    	}, nil
}
```
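As the comment above notes, go-redis also ships a Script helper that manages the SHA for you and falls back to EVAL when the script is not cached. A minimal sketch of that alternative, assuming it lives alongside the code above in the ratelimiter package (the variable and method names are illustrative):

```go
// slidingWindowScript wraps the same Lua source. Run tries EVALSHA first and
// transparently retries with EVAL if Redis no longer has the script cached
// (for example after a restart or SCRIPT FLUSH).
var slidingWindowScript = redis.NewScript(luaScript)

func (r *RedisRateLimiter) allowViaHelper(ctx context.Context, key string, limit, windowSeconds, nowUs int64, requestID string) ([]interface{}, error) {
	res, err := slidingWindowScript.Run(ctx, r.client, []string{key}, limit, windowSeconds, nowUs, requestID).Result()
	if err != nil {
		return nil, err
	}
	out, ok := res.([]interface{})
	if !ok {
		return nil, fmt.Errorf("unexpected script result: %v", res)
	}
	return out, nil
}
```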

    Usage in an HTTP Middleware

    Here’s how to integrate the rate limiter into a standard net/http middleware.

```go
    package main
    
import (
	"context"
	"fmt"
	"log"
	"net"
	"net/http"
    	"time"
    
    	"github.com/go-redis/redis/v8"
    	"your_project/ratelimiter"
    )
    
    func rateLimitMiddleware(limiter *ratelimiter.RedisRateLimiter, next http.Handler) http.Handler {
    	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Example: rate limit by client IP. r.RemoteAddr has the form "host:port",
		// so strip the port; otherwise every connection would get its own counter.
		ip, _, err := net.SplitHostPort(r.RemoteAddr)
		if err != nil {
			ip = r.RemoteAddr // fall back to the raw value if no port is present
		}
		key := fmt.Sprintf("rate_limit:ip:%s", ip)
    		
    		// Limit: 10 requests per minute
    		limit := int64(10)
    		window := 1 * time.Minute
    
    		result, err := limiter.Allow(r.Context(), key, limit, window)
    		if err != nil {
    			// In a production scenario, you might 'fail open' or 'fail closed'
    			// depending on system requirements. Here we log and fail open.
    			log.Printf("Rate limiter error: %v. Allowing request.", err)
    			next.ServeHTTP(w, r)
    			return
    		}
    
    		// Set informative headers
    		w.Header().Set("X-RateLimit-Limit", fmt.Sprintf("%d", result.Limit))
    		w.Header().Set("X-RateLimit-Remaining", fmt.Sprintf("%d", result.Limit - result.CurrentCount))
    
    		if !result.Allowed {
    			w.WriteHeader(http.StatusTooManyRequests)
    			w.Write([]byte("Too Many Requests"))
    			return
    		}
    
    		next.ServeHTTP(w, r)
    	})
    }
    
    func main() {
    	redisClient := redis.NewClient(&redis.Options{
    		Addr: "localhost:6379",
    	})
    
    	if err := redisClient.Ping(context.Background()).Err(); err != nil {
    		log.Fatalf("Could not connect to Redis: %v", err)
    	}
    
    	l, err := ratelimiter.NewRedisRateLimiter(redisClient)
    	if err != nil {
    		log.Fatalf("Could not create rate limiter: %v", err)
    	}
    
    	finalHandler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
    		w.Write([]byte("Hello, World!"))
    	})
    
    	http.Handle("/", rateLimitMiddleware(l, finalHandler))
    
    	log.Println("Server starting on :8080")
    	if err := http.ListenAndServe(":8080", nil); err != nil {
    		log.Fatal(err)
    	}
}
```

    Performance Analysis and Optimization

    Understanding the performance characteristics of this implementation is non-negotiable for production deployment.

    `EVAL` vs. `EVALSHA`

As the Go implementation above already does, always prefer EVALSHA over sending the raw script on every call.

    * EVAL: The client sends the entire Lua script string to Redis with every call. For a script of a few hundred bytes, this adds network overhead and requires Redis to re-parse the script each time.

    * SCRIPT LOAD + EVALSHA: On application startup, the script is sent once with SCRIPT LOAD. Redis parses it, caches it, and returns its SHA1 hash. Subsequent calls use EVALSHA with just the hash. This significantly reduces network bandwidth and CPU load on the Redis server.

    Benchmark: In a typical cloud environment (e.g., us-east-1), the latency difference can be stark. A local test might show a small difference, but over a real network, sending 40 bytes (SHA1 hash) vs. 500+ bytes (script) for every single API call adds up. At 10,000 requests per second, you save nearly 5MB/s of network traffic just on the script payload.

    Time Complexity

    Let's analyze the Redis commands inside the script:

    * ZREMRANGEBYSCORE: O(log(N) + M), where N is the number of elements in the sorted set and M is the number of elements being removed. In our case, N is capped by the rate limit, and M is the number of expired requests. For a stable workload, M is small, making this very fast.

    * ZCARD: O(1). Constant time, extremely fast.

    * ZADD: O(log(N)). Efficiently adds the new element.

    * EXPIRE: O(1).

    The overall complexity is dominated by ZREMRANGEBYSCORE. Since N is effectively bounded by our rate limit, the performance is excellent and does not degrade as more requests are processed over time. For a limit of 1000 requests/minute, N will never exceed 1000, and log(1000) is a very small number.

    Memory Usage

    The primary drawback of the sliding window log is its memory footprint. Each request stores a member and a score in the sorted set. Assuming a 64-bit timestamp for the score and a 64-bit unique ID for the member:

* Memory per request: roughly 8 bytes for the score, around 16 bytes for the member string (a microsecond timestamp, more with a nonce), plus per-element sorted set overhead, which comes to roughly 40-50 bytes per request.

For a limit of 1000 requests, the key will hold at most 1000 entries: 1,000 × 50 bytes ≈ 50 KB.

If you are rate-limiting 1 million unique users, the potential memory usage is 1,000,000 × 50 KB = 50 GB, which is substantial.

    The EXPIRE command is our primary defense against unbounded memory growth from inactive users. For systems with a very large number of distinct rate-limited entities, consider:

    • Using shorter windows where feasible.
    • Applying this high-precision limiter only to critical, authenticated endpoints.
    • Using a less memory-intensive algorithm (like the fixed window counter) for less critical public endpoints.

    Handling Advanced Edge Cases

    Production systems are defined by how they handle failure and edge cases.

    Clock Skew

    If multiple application servers are calling the rate limiter, their system clocks may not be perfectly synchronized. If each server generated its own current_time, one server's clock being slightly ahead could cause it to prematurely expire requests logged by another server.

Solution: Our script mitigates this. The application server generates the timestamp (time.Now().UnixNano() / 1000) and passes it as an argument to the Lua script, and Redis uses that single value for every operation inside the atomic execution (the window_start_us calculation and the ZADD score), so each check is internally consistent. Some skew between application servers can still stretch or shrink the effective window slightly; where that matters, one option is to have the script obtain the time from Redis itself via redis.call('TIME'), making the single Redis node's clock authoritative (this relies on effect-based script replication, the default in modern Redis versions).

    Redis Cluster

    This pattern is fully compatible with Redis Cluster. Redis Cluster shards data based on key hashes. All operations in a Lua script must operate on keys that hash to the same slot. Since our script only ever operates on a single key (KEYS[1]), this requirement is met by design. The client library, when in cluster mode, will correctly route the EVALSHA command to the node that owns the hash slot for the given rate_limit:user_id:12345 key.
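As a sketch, the same script can be executed through a cluster client; this assumes ctx, scriptSHA, and the imports from the earlier listings are in scope, and the node addresses and argument values are illustrative.

```go
// Because the script declares its single key via KEYS[1], the cluster client can
// route EVALSHA to whichever node owns that key's hash slot.
clusterClient := redis.NewClusterClient(&redis.ClusterOptions{
	Addrs: []string{"10.0.0.1:6379", "10.0.0.2:6379", "10.0.0.3:6379"},
})

res, err := clusterClient.EvalSha(ctx, scriptSHA,
	[]string{"rate_limit:user_id:12345"},
	100, 60, time.Now().UnixNano()/1000, "req-1",
).Result()
if err != nil {
	log.Printf("cluster rate limit check failed: %v", err)
}
_ = res // {allowed, count}, parsed the same way as in Allow above
```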

    Graceful Degradation (Redis Unavailability)

    What happens if Redis is down or experiencing high latency?

    * Fail-Closed: The most secure option. If the rate limiter returns an error, deny the request (HTTP 500 or 429). This protects your upstream services from being overloaded but can impact availability.

    * Fail-Open: If a brief period of exceeding limits is acceptable, you can allow the request to proceed. This prioritizes availability over strict enforcement. Our middleware example does this by logging the error and calling next.ServeHTTP.

    A hybrid approach is often best: implement a secondary, in-memory rate limiter on each application node. If the Redis call fails, fall back to the local in-memory limiter. This provides a degree of protection even during a Redis outage, though it's no longer globally consistent.
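A minimal sketch of that hybrid approach, using golang.org/x/time/rate as the process-local fallback; the type, its field names, and the per-node limit shown in the comment are illustrative assumptions, and the local bucket only approximates the global limit during an outage.

```go
import (
	"context"
	"time"

	"golang.org/x/time/rate"
)

// hybridLimiter consults Redis first and degrades to a local token bucket when
// Redis is unreachable. The local bucket is per process, not global.
type hybridLimiter struct {
	redis *RedisRateLimiter
	local *rate.Limiter // e.g. rate.NewLimiter(rate.Every(6*time.Second), 2) for ~10/min per node
}

func (h *hybridLimiter) Allow(ctx context.Context, key string, limit int64, window time.Duration) bool {
	res, err := h.redis.Allow(ctx, key, limit, window)
	if err != nil {
		// Redis is down or slow: fail over to the local limiter instead of
		// failing fully open or fully closed.
		return h.local.Allow()
	}
	return res.Allowed
}
```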


    Conclusion

    The combination of Redis Sorted Sets and Lua scripting provides a powerful, precise, and high-performance foundation for a distributed rate limiter. By moving the core logic onto the Redis server, we achieve the critical atomicity that naive implementations lack, eliminating race conditions. The Go implementation demonstrates how to encapsulate this logic into a reusable component, optimize for performance with EVALSHA, and integrate it cleanly into a web service.

    While the memory footprint requires careful consideration in systems with millions of unique limited entities, the algorithm's performance characteristics and correctness make it a superior choice for protecting critical APIs in any modern, distributed architecture.
