Distributed Rate Limiting with Redis and Lua: A Sliding Window Log Deep Dive

Goh Ling Yong

The Fallacy of Simple Rate Limiters in Distributed Systems

In a monolithic application, implementing a rate limiter can be a deceptively simple task. An in-memory concurrent dictionary or a leaky bucket implementation often suffices. However, in a modern microservices architecture, where multiple stateless instances of a service run behind a load balancer, these naive approaches fail catastrophically. State must be externalized to be shared across all instances, and this is where many implementations introduce subtle but critical flaws.

A common first attempt at a distributed rate limiter involves a simple Redis key with INCR and EXPIRE.

python
# DO NOT USE THIS IN PRODUCTION - FLAWED FIXED-WINDOW APPROACH
import redis
import time

def is_rate_limited_flawed(r: redis.Redis, user_id: str, limit: int, window: int):
    key = f"rate_limit:{user_id}"
    current_count = r.get(key)

    if current_count and int(current_count) >= limit:
        return True

    # Potential race condition here!
    p = r.pipeline()
    p.incr(key)
    p.expire(key, window)
    p.execute()

    return False

This "Fixed Window Counter" approach has two major problems:

  • Race Conditions: The GET and INCR/EXPIRE operations are not atomic. Two concurrent requests from the same user could both read a count of N-1, both increment, and both be allowed, exceeding the rate limit of N.
  • Window Edge Bursts: The primary algorithmic flaw is that it allows request bursts at the boundary of time windows. If the window is 60 seconds and a user makes N requests at second 59, they can immediately make another N requests at second 61 of the next window. This results in 2N requests in just two seconds, defeating the purpose of a smooth rate limit. Even making the counter atomic does not fix this, as the sketch after this list shows.
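
For reference, the race condition alone can be removed without changing the algorithm: increment first and trust INCR's return value, setting the TTL only when the key is first created. The sketch below is ours, not from the original article; it assumes Redis 7.0+ (for EXPIRE's NX option) and a recent redis-py. Even this atomic variant still allows the boundary burst.

python
# Fixed-window counter with the read/increment race removed.
# Assumes Redis >= 7.0 (EXPIRE ... NX) and a redis-py version that exposes the nx flag.
import redis

def is_rate_limited_fixed_window(r: redis.Redis, user_id: str, limit: int, window: int) -> bool:
    key = f"rate_limit:{user_id}"
    p = r.pipeline()
    p.incr(key)                      # atomic increment; its return value is the authoritative count
    p.expire(key, window, nx=True)   # set the TTL only when the key is newly created
    count, _ = p.execute()
    return int(count) > limit        # still bursty at window edges: up to 2N requests in ~2 seconds
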
To build a robust, production-grade system, we need an algorithm that offers high precision and an implementation that guarantees atomicity. This is where the Sliding Window Log algorithm, implemented with Redis Sorted Sets and executed via a Lua script, becomes the superior engineering choice.

    Algorithm Deep Dive: The Sliding Window Log with Sorted Sets

    The Sliding Window Log algorithm is conceptually simple: for each user, store a timestamp for every request made within the current time window. To check if a new request is allowed, you first discard all timestamps older than the window, then count the remaining timestamps. If the count is less than the limit, the new request is allowed, and its timestamp is added to the log.
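
    The idea is easiest to see in a single-process sketch before any Redis is involved. The following is an illustration only (the function and variable names are ours), and it is of course useless across multiple service instances:

    python
    # Single-process illustration of the sliding window log (not distributed).
    import time
    from collections import defaultdict, deque

    request_log = defaultdict(deque)  # resource_id -> timestamps of recent requests

    def is_allowed_local(resource_id: str, limit: int, window: float) -> bool:
        now = time.monotonic()
        log = request_log[resource_id]
        # Discard timestamps that have slid out of the window.
        while log and log[0] <= now - window:
            log.popleft()
        if len(log) < limit:
            log.append(now)
            return True
        return False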

    A naive implementation might use a Redis List, but this would require fetching the entire list, filtering, and counting, which is inefficient (O(N) where N is the number of requests in the window).

    A far more efficient and elegant solution uses Redis's Sorted Sets. A sorted set stores members with associated scores. We can use the request timestamp (as a high-precision Unix timestamp) for both the member and the score. This structure is ideal because Redis provides highly optimized commands for operating on scores (timestamps).
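
    To make that data model concrete, here is roughly what the individual sorted-set operations look like from redis-py. This is a non-atomic illustration only; the key name and window are placeholders, and the Lua script below performs the same steps atomically using Redis's own clock:

    python
    # Non-atomic illustration of the sorted-set operations (for intuition only).
    import time
    import redis

    r = redis.Redis()
    key, window = "rate_limit:user123", 60
    now_us = int(time.time() * 1_000_000)

    r.zremrangebyscore(key, "-inf", now_us - window * 1_000_000)  # prune entries older than the window
    count = r.zcard(key)                                          # requests currently in the window
    r.zadd(key, {now_us: now_us})                                 # member and score are both the timestamp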

    Our logic, to be executed atomically, will be:

  1. Identify the key: rate_limit:{user_id}.
  2. Get current time: Use Redis's internal clock for a single source of truth, avoiding clock skew issues between application servers. The TIME command returns [unix_seconds, microseconds]. We'll combine them for a high-precision timestamp.
  3. Define the window boundary: current_time - window_size.
  4. Prune the log: Remove all members from the sorted set with a score (timestamp) older than the window boundary. This is the "sliding" action. ZREMRANGEBYSCORE does this efficiently.
  5. Count remaining requests: Get the cardinality of the sorted set after pruning. ZCARD provides this in O(1) time.
  6. Make the decision: If the count is less than the limit, allow the request.
  7. Log the new request: If allowed, add the current timestamp as a new member to the sorted set. ZADD is O(log(N)).
  8. Set key expiry: To prevent memory leaks from inactive users, set an EXPIRE on the sorted set key itself, slightly longer than the window size.
    Crucially, these steps must be performed as a single, atomic operation to prevent race conditions. A MULTI/EXEC transaction is insufficient because the logic in steps 6 and 7 depends on the result of step 5. This is the perfect use case for a server-side Lua script.

    The Core Implementation: An Atomic Lua Script

    Executing logic on the Redis server via Lua scripting is the key to both atomicity and performance. It eliminates network round-trip latency between dependent commands and ensures that no other command can interfere mid-execution.

    Here is the complete, production-ready Lua script.

    lua
    -- KEYS[1]: The key for the sorted set (e.g., "rate_limit:user123")
    -- ARGV[1]: The rate limit (e.g., 100)
    -- ARGV[2]: The window size in seconds (e.g., 60)
    
    -- Get the current time from Redis itself. TIME returns a table: {seconds, microseconds}
    -- Note: a script that calls TIME and then writes relies on effect-based script
    -- replication, which is the default since Redis 5; on older versions, call
    -- redis.replicate_commands() before the first write command.
    local now = redis.call('TIME')
    local current_timestamp_us = (tonumber(now[1]) * 1000000) + tonumber(now[2])
    
    -- The start of our sliding window
    local window_start_us = current_timestamp_us - (tonumber(ARGV[2]) * 1000000)
    
    -- 1. Prune old entries from the sorted set.
    -- ZREMRANGEBYSCORE is efficient, removing all members with scores between -inf and window_start_us.
    redis.call('ZREMRANGEBYSCORE', KEYS[1], '-inf', window_start_us)
    
    -- 2. Get the current count of requests in the window.
    -- ZCARD is O(1) and gives the number of elements in the sorted set.
    local current_count = redis.call('ZCARD', KEYS[1])
    
    local limit = tonumber(ARGV[1])
    local allowed = 0
    
    if current_count < limit then
        -- 3. If under the limit, add the new request timestamp to the log.
        -- We use the same value for both member and score for simplicity.
        redis.call('ZADD', KEYS[1], current_timestamp_us, current_timestamp_us)
        allowed = 1
    end
    
    -- 4. Set an expiry on the key itself to auto-clean up inactive users.
    -- Set it slightly longer than the window to avoid premature expiry.
    redis.call('EXPIRE', KEYS[1], ARGV[2] + 5)
    
    -- Return a table: {is_allowed (1 or 0), current_count_after_operation}
    -- This gives the client useful information for logging or headers (e.g., X-RateLimit-Remaining).
    return { allowed, current_count + (allowed == 1 and 1 or 0) }
    

    Dissecting the Script's Key Features:

    * redis.call('TIME'): By using the Redis server's time, we create a single source of truth. This completely mitigates clock skew problems between distributed application servers, a common and difficult-to-debug issue.

    * Microsecond Precision: We convert everything to microseconds, which keeps very short windows and high-throughput workloads accurate. Note that because the timestamp is used as both member and score, two requests landing in the same microsecond would collapse into a single sorted-set entry (a slight undercount); if that matters, append a unique suffix (e.g., a random number) to the member.

    * ZREMRANGEBYSCORE: This is the workhorse of the sliding window. Its time complexity is O(log(N) + M) where N is the total number of elements and M is the number of elements being removed. For our use case, M is typically small, making this extremely fast.

    * ZCARD: An O(1) operation to get the size of the set. This is far superior to fetching and counting elements on the client.

    * Atomic ZADD: The decision and the addition of the new request happen within the same script, eliminating any possibility of a race condition.

    * Informative Return Value: Returning both the allowed status and the updated count lets the client populate X-RateLimit-Limit and X-RateLimit-Remaining headers without a second call to Redis; X-RateLimit-Reset can be approximated from the window size (or computed exactly if the script is extended to return the oldest entry's score), as sketched below.
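
    As an illustration of that last point, the two returned values can be mapped onto response headers with a small helper. The helper below is our own sketch, not part of the original implementation, and it approximates X-RateLimit-Reset as "now plus one window" because the script does not return the oldest entry's timestamp:

    python
    # Hypothetical helper mapping the script's return value to rate-limit headers.
    import time

    def rate_limit_headers(allowed: bool, count: int, limit: int, window: int) -> dict:
        return {
            "X-RateLimit-Limit": str(limit),
            "X-RateLimit-Remaining": str(max(limit - count, 0)),
            # Approximation: without the oldest entry's timestamp, assume the
            # window resets one full window from now.
            "X-RateLimit-Reset": str(int(time.time()) + window),
        }

    # allowed, count = limiter.is_allowed("user123", limit=100, window=60)
    # response.headers.update(rate_limit_headers(allowed, count, 100, 60))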

    Production-Grade Client Implementation (Python)

    Calling this Lua script from a client application requires a specific pattern for optimal performance. Instead of sending the entire script with every EVAL command, we load it into Redis once using SCRIPT LOAD, which returns a SHA1 hash. Subsequent calls can then use the much lighter EVALSHA command.

    Here is a robust Python class that encapsulates this logic.

    python
    import redis
    import time
    from typing import Tuple
    
    class RedisRateLimiter:
        """
        A production-grade distributed rate limiter using a sliding window log
        algorithm implemented with a Redis Lua script for atomicity.
        """
        LUA_SCRIPT = """
        local now = redis.call('TIME')
        local current_timestamp_us = (tonumber(now[1]) * 1000000) + tonumber(now[2])
        local window_start_us = current_timestamp_us - (tonumber(ARGV[2]) * 1000000)
        redis.call('ZREMRANGEBYSCORE', KEYS[1], '-inf', window_start_us)
        local current_count = redis.call('ZCARD', KEYS[1])
        local limit = tonumber(ARGV[1])
        local allowed = 0
        if current_count < limit then
            redis.call('ZADD', KEYS[1], current_timestamp_us, current_timestamp_us)
            allowed = 1
        end
        redis.call('EXPIRE', KEYS[1], ARGV[2] + 5)
        return { allowed, current_count + (allowed == 1 and 1 or 0) }
        """
    
        def __init__(self, redis_client: redis.Redis):
            self.redis = redis_client
            try:
                # Use a lock to prevent multiple threads/processes from loading the script simultaneously
                with self.redis.lock("script_load_lock", timeout=5):
                    # Check if the script SHA is already cached to avoid redundant loads
                    cached_sha = self.redis.get("rate_limiter_script_sha")
                    if cached_sha:
                        self.script_sha = self._as_str(cached_sha)
                    else:
                        sha = self._as_str(self.redis.script_load(self.LUA_SCRIPT))
                        self.redis.set("rate_limiter_script_sha", sha)
                        self.script_sha = sha
            except redis.exceptions.LockError:
                # If the lock fails, another process is likely loading it. Wait and fetch.
                time.sleep(0.1)
                self.script_sha = self._as_str(self.redis.get("rate_limiter_script_sha"))

        @staticmethod
        def _as_str(value) -> str:
            # redis-py returns bytes unless the client uses decode_responses=True; normalize to str.
            return value.decode("utf-8") if isinstance(value, bytes) else value
    
        def is_allowed(self, resource_id: str, limit: int, window: int) -> Tuple[bool, int]:
            """
            Checks if a request for a given resource is allowed.
    
            Args:
                resource_id: A unique identifier for the resource being rate-limited (e.g., user_id, ip_address).
                limit: The maximum number of requests allowed in the window.
                window: The time window in seconds.
    
            Returns:
                A tuple of (bool: allowed, int: current_count).
            """
            key = f"rate_limit:{resource_id}"
            try:
                result = self.redis.evalsha(self.script_sha, 1, key, limit, window)
                allowed, current_count = bool(result[0]), int(result[1])
                return allowed, current_count
            except redis.exceptions.NoScriptError:
                # The script was flushed from Redis (e.g., after SCRIPT FLUSH or restart).
                # Reload it and retry the command once.
                print("NoScriptError detected. Reloading script and retrying...")
                self.script_sha = self._as_str(self.redis.script_load(self.LUA_SCRIPT))
                self.redis.set("rate_limiter_script_sha", self.script_sha)
                result = self.redis.evalsha(self.script_sha, 1, key, limit, window)
                allowed, current_count = bool(result[0]), int(result[1])
                return allowed, current_count
    
    # --- Example Usage ---
    def simulate_requests(limiter, user_id, limit, window):
        print(f"--- Simulating for {user_id} (Limit: {limit} req/{window}s) ---")
        for i in range(limit + 5):
            allowed, count = limiter.is_allowed(user_id, limit, window)
            status = 'ALLOWED' if allowed else 'DENIED'
            print(f"Request {i+1}: Status={status}, Current Count={count}")
            time.sleep(0.1)
    
    if __name__ == '__main__':
        r = redis.Redis(decode_responses=True)
        r.flushdb() # Clear DB for clean test
        
        limiter = RedisRateLimiter(r)
    
        # Scenario 1: Basic rate limiting
        simulate_requests(limiter, "user_A", 10, 5)
    
        # Scenario 2: Test sliding window effect
        print("\n--- Testing sliding window precision ---")
        user_b_limit = 5
        user_b_window = 2
        # Make 3 requests
        for _ in range(3):
            limiter.is_allowed("user_B", user_b_limit, user_b_window)
            time.sleep(0.5)
        
        print(f"Made 3 requests. Waiting {user_b_window + 0.1} seconds for window to slide...")
        time.sleep(user_b_window + 0.1)
    
        # The window has now fully slid past the initial requests. We should have a full quota again.
        simulate_requests(limiter, "user_B", user_b_limit, user_b_window)
    

    This client implementation includes several production-hardening features:

    * Script Hashing (EVALSHA): Minimizes network traffic and Redis CPU by sending only the SHA1 hash after the initial load.

    * NoScriptError Handling: Redis may flush scripts from its cache, especially after a restart. The try...except block gracefully handles this by reloading the script and retrying the request, making the system self-healing. (redis-py's register_script helper, sketched after this list, automates the same fallback.)

    * Initialization Locking: In a multi-threaded or multi-process environment (like a Gunicorn server), you want to prevent a thundering herd of workers all trying to load the same script on startup. Using a Redis distributed lock ensures only one instance performs the initial SCRIPT LOAD.
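
    For comparison, redis-py also ships a built-in Script helper (register_script) that computes the SHA1 locally and reloads the script when Redis replies with NOSCRIPT, which removes most of the manual bookkeeping above. A minimal sketch, assuming the LUA_SCRIPT constant from the class:

    python
    # Alternative sketch using redis-py's register_script helper.
    import redis

    r = redis.Redis(decode_responses=True)
    sliding_window_limiter = r.register_script(RedisRateLimiter.LUA_SCRIPT)

    def is_allowed_via_script(resource_id: str, limit: int, window: int):
        result = sliding_window_limiter(keys=[f"rate_limit:{resource_id}"], args=[limit, window])
        return bool(result[0]), int(result[1])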

    Performance, Memory, and Scalability Considerations

    Time Complexity: The dominant operations in our script are ZREMRANGEBYSCORE and ZADD, both of which are O(log(N) + M). Since N is capped by our rate limit (e.g., 1000 requests per minute), the logarithmic factor is negligible. The performance is effectively constant for any sane rate limit, making this solution incredibly fast and scalable.

    Memory Usage: The memory footprint is a critical factor for capacity planning. Each entry in the sorted set (a request timestamp) consumes memory. A 64-bit integer timestamp takes 8 bytes. With Redis's internal data structure overhead, we can estimate roughly 16-24 bytes per request in the window.

    Calculation: (number of users) × (rate limit) × (bytes per entry)

    * Example: 1 million active users, each with a rate limit of 100 requests/minute.

    1,000,000 × 100 × 20 bytes = 2,000,000,000 bytes ≈ 2 GB
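
    As a quick capacity-planning aid, the same estimate as a tiny helper; the 20-byte figure is the rough per-entry estimate from above, not a measured value:

    python
    def estimate_rate_limiter_memory_bytes(active_users: int, limit: int, bytes_per_entry: int = 20) -> int:
        # Worst case: every active user has a full window of logged requests.
        return active_users * limit * bytes_per_entry

    print(estimate_rate_limiter_memory_bytes(1_000_000, 100))  # 2_000_000_000 bytes, roughly 2 GB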

    This is a manageable memory footprint, but it highlights the importance of setting sane EXPIRE values on the keys to garbage collect data for inactive users.

    Redis Topology: For high-availability, this pattern works seamlessly with a Redis Sentinel or Redis Cluster setup. The client library handles routing commands to the primary node. Since this is a write-heavy operation, reads from replicas are not suitable. Latency is paramount; ensure your application servers and Redis cluster are co-located in the same cloud region and availability zone.
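
    For reference, a Sentinel-backed client that always writes to the current primary might be wired up as follows; the host names and the "mymaster" service name are placeholders:

    python
    # Hypothetical Sentinel setup; hosts, port, and master name are placeholders.
    from redis.sentinel import Sentinel

    sentinel = Sentinel([("sentinel-1", 26379), ("sentinel-2", 26379)], socket_timeout=0.5)
    r = sentinel.master_for("mymaster", socket_timeout=0.5, decode_responses=True)
    limiter = RedisRateLimiter(r)  # all EVALSHA writes go to the primary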

    Advanced Edge Cases and Mitigation Strategies

    1. What happens if Redis is down?

    This is a critical architectural decision. Your rate limiter is a single point of failure.

    * Fail-Open: If the Redis client raises a ConnectionError, you can choose to allow the request to proceed. This prioritizes availability over strict enforcement. It's often acceptable for non-critical APIs, but dangerous for APIs that protect costly resources.

    * Fail-Closed: The alternative is to deny the request. This prioritizes protection and system stability over availability. It's the safer default.

    Implementation sketch (basic failure handling first, then a simple circuit breaker):

    python
    # In the RedisRateLimiter class
    from redis.exceptions import RedisError

    def is_allowed(self, resource_id: str, limit: int, window: int) -> Tuple[bool, int]:
        key = f"rate_limit:{resource_id}"
        try:
            # Existing EVALSHA call (the NoScriptError retry shown earlier is omitted for brevity).
            result = self.redis.evalsha(self.script_sha, 1, key, limit, window)
            return bool(result[0]), int(result[1])
        except RedisError as e:
            # Log the error for monitoring!
            print(f"CRITICAL: Redis connection error: {e}")
            # Implement a fail-open or fail-closed strategy
            return False, -1  # Fail-closed
            # return True, -1  # Fail-open
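
    Building on that, a fuller circuit breaker stops calling Redis entirely after repeated failures and applies the fallback policy for a cool-down period. The sketch below is illustrative: the thresholds and names are arbitrary, and it assumes it wraps a limiter that lets RedisError propagate (like the main class above) rather than one that already swallows it:

    python
    # Minimal circuit breaker around the limiter; thresholds are illustrative.
    import time
    from redis.exceptions import RedisError

    class RateLimiterCircuitBreaker:
        def __init__(self, limiter, failure_threshold: int = 5, cooldown_s: float = 30.0, fail_open: bool = False):
            self.limiter = limiter
            self.failure_threshold = failure_threshold
            self.cooldown_s = cooldown_s
            self.fail_open = fail_open
            self.failures = 0
            self.opened_at = 0.0

        def is_allowed(self, resource_id: str, limit: int, window: int):
            if self.failures >= self.failure_threshold:
                if time.monotonic() - self.opened_at < self.cooldown_s:
                    return self.fail_open, -1   # circuit open: apply the fallback policy without touching Redis
                self.failures = 0               # cool-down elapsed: try Redis again
            try:
                result = self.limiter.is_allowed(resource_id, limit, window)
                self.failures = 0
                return result
            except RedisError:
                self.failures += 1
                if self.failures >= self.failure_threshold:
                    self.opened_at = time.monotonic()
                return self.fail_open, -1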

    2. Throttling vs. Rate Limiting

    Our implementation currently rejects requests (429 Too Many Requests) once the limit is hit. An alternative is throttling, where requests are queued and processed with a delay. This is a much more complex system, often requiring a dedicated message queue (like RabbitMQ or SQS) and is outside the scope of a simple Redis-based limiter. For most API use cases, immediate rejection is the expected and correct behavior.

    3. Large Cardinality Keys & Hotspots

    If you are rate-limiting by a very high cardinality key (e.g., millions of unique user IDs), you won't have a single hot key in Redis. However, if you rate-limit by a low-cardinality key (e.g., a tenant ID in a multi-tenant system), a single tenant could generate a Redis hotspot. In such scenarios, if a single key is causing performance degradation, you may need to introduce application-level sharding. For example, instead of one key rate_limit:{tenant_id}, you could shard across multiple keys: rate_limit:{tenant_id}:1, rate_limit:{tenant_id}:2, etc., and deterministically hash the incoming request to one of the shards. This adds complexity but can distribute the load.
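
    As a concrete illustration of that sharding approach (the shard count and helper name are our own, and note that the tenant's total limit has to be divided across the shards, which makes enforcement slightly approximate):

    python
    # Illustrative key sharding for a hot tenant; the shard count is an assumption.
    import zlib

    NUM_SHARDS = 8

    def sharded_is_allowed(limiter, tenant_id: str, request_id: str, tenant_limit: int, window: int):
        # Deterministically map the request to one of the tenant's shards.
        shard = zlib.crc32(request_id.encode()) % NUM_SHARDS
        key = f"{tenant_id}:{shard}"                   # becomes "rate_limit:{tenant_id}:{shard}" inside the limiter
        per_shard_limit = tenant_limit // NUM_SHARDS   # each shard enforces its share of the total
        return limiter.is_allowed(key, per_shard_limit, window)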

    Conclusion: A Blueprint for Resilient Control

    The distributed rate limiter is a foundational component of any scalable microservices architecture. By moving beyond naive fixed-window counters and embracing the precision of the Sliding Window Log algorithm, we can prevent bursty traffic and enforce limits accurately. The combination of Redis Sorted Sets for efficient data management and Lua scripting for unshakeable atomicity provides a powerful, performant, and resilient pattern. This implementation, complete with client-side hardening and a clear understanding of its performance characteristics and failure modes, serves as a blueprint for building a critical piece of infrastructure that is both robust and scalable.
