Design a distributed rate limiting system that can enforce per-account quotas across multiple API gateway instances globally.
Click "Show Solution" to reveal the approach. Try designing it yourself first!
Recommended Approach: Redis with Sliding Window
Architecture
┌──────────┐ ┌──────────┐ ┌──────────┐
│API GW 1 │ │API GW 2 │ │API GW 3 │
└────┬─────┘ └────┬─────┘ └────┬─────┘
│ │ │
└─────────────────┼─────────────────┘
│
┌──────▼──────┐
│ Redis │
│ Cluster │
│ (Shared) │
└─────────────┘
Each gateway instance checks Redis before allowing request
Redis Data Structure: Sorted Set
For each account, store timestamps of recent requests:
- Key:
rate_limit:{account_id}:{window}
- Value: Sorted set with score = timestamp
- Example:
rate_limit:AC123:minute → {1700000001, 1700000002, ...}
Algorithm: Sliding Window Counter
function checkRateLimit(account_id, limit, window_seconds):
current_time = now()
window_start = current_time - window_seconds
# 1. Remove entries older than window
ZREMRANGEBYSCORE rate_limit:{account_id} -inf window_start
# 2. Count remaining entries
count = ZCARD rate_limit:{account_id}
# 3. Check if under limit
if count < limit:
# 4. Add current request
ZADD rate_limit:{account_id} current_time current_time
# 5. Set TTL to auto-expire old data
EXPIRE rate_limit:{account_id} window_seconds
return ALLOWED, remaining = (limit - count - 1)
else:
return DENIED, remaining = 0
Making it Atomic: Lua Script
Run all Redis commands in a single Lua script for atomicity:
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local current_time = tonumber(ARGV[3])
local window_start = current_time - window
redis.call('ZREMRANGEBYSCORE', key, '-inf', window_start)
local count = redis.call('ZCARD', key)
if count < limit then
redis.call('ZADD', key, current_time, current_time)
redis.call('EXPIRE', key, window)
return {1, limit - count - 1} -- allowed, remaining
else
return {0, 0} -- denied, remaining
end
Response Headers
Include in every API response:
X-RateLimit-Limit: 1000 - Total limit
X-RateLimit-Remaining: 742 - Requests remaining
X-RateLimit-Reset: 1700000060 - When window resets
Handling Multiple Time Windows
Support multiple limits simultaneously:
- 1000 requests per minute
- 10,000 requests per hour
- 100,000 requests per day
Solution: Check each limit independently, deny if ANY limit exceeded
Scaling Redis
- Redis Cluster: Partition by account_id (consistent hashing)
- Replication: Redis replica for read availability
- Fallback: If Redis is down, fail open or use local in-memory limit (less accurate)
Alternative Approaches
| Approach |
Pros |
Cons |
| Fixed Window Counter |
Simple, low memory |
Burst at window edges (2x limit possible) |
| Token Bucket |
Allows bursts, smooth |
More complex state |
| Sliding Window (chosen) |
Accurate, no edge bursts |
Higher memory (stores timestamps) |
| Leaky Bucket |
Enforces strict rate |
No bursting, complex |
Trade-offs Discussion
Accuracy vs Performance
- Sliding window is most accurate but requires sorted set (more memory)
- Fixed window uses single counter (less memory) but allows edge bursts
- Decision: Accuracy matters for billing - use sliding window
Centralized (Redis) vs Distributed (local)
- Redis: Accurate but adds dependency and latency
- Local: Fast but inaccurate in distributed system (each instance tracks separately)
- Decision: Use Redis with aggressive caching
Fail Open vs Fail Closed
- If Redis is down:
- Fail open: Allow all requests (better availability, risk of abuse)
- Fail closed: Deny all requests (worse availability, safe)
- Decision: Fail open with local rate limiting as backup
Interview Talking Points
- "Rate limiting is about fairness and protection" - Prevent one customer from consuming all capacity
- "Choose algorithm based on requirements" - If bursting is okay, use token bucket. If strict limit needed, use leaky bucket or sliding window
- "Graceful degradation" - If centralized rate limiter fails, fall back to local limits
- "Observability" - Track rate limit denials as a metric - might indicate customer needs to upgrade tier