Twilio System Design Scenarios

Practice system design questions specifically tailored to Twilio's domain, each followed by a worked approach.

How to Approach System Design Interviews

Interview Framework (URDAD)

  1. Understand & Clarify Requirements
    • Ask about scale (requests/sec, data size)
    • Clarify functional requirements
    • Understand non-functional requirements (latency, availability, consistency)
    • Identify constraints (budget, technology, timeline)
  2. Requirements → API Design
    • Define APIs (REST, GraphQL, gRPC)
    • Show request/response formats
  3. Data Model
    • What data to store
    • Storage technology choices
    • Schema design
  4. Architecture & Components
    • High-level architecture diagram
    • Component responsibilities
    • Data flow
  5. Deep Dive
    • Focus on interesting/complex parts
    • Discuss trade-offs
    • Address failure scenarios

Scenario 1: SMS Delivery Pipeline

Medium
Design Twilio's SMS delivery system that can handle 1 million messages per second globally with 99.95% availability.
Functional Requirements
  • Accept SMS messages via REST API
  • Validate phone numbers and message content
  • Route messages to appropriate carriers
  • Track delivery status (queued, sent, delivered, failed)
  • Provide webhook callbacks for status updates
  • Support message retry on failures
Non-Functional Requirements
  • Scale: 1M messages/second peak, 500K sustained
  • Latency: < 500ms API response time p99
  • Availability: 99.95% (4.4 hours downtime/year)
  • Durability: Once accepted, message must not be lost
  • Consistency: Eventual consistency acceptable for status
High-Level Architecture
Customer ──POST──> API Gateway (Regional) ──> Kafka (Events)
        <─response─┘                              │
                                   ┌──────────────┴─────┐
                                   ▼                    ▼
                          DynamoDB (Metadata)     Workers (Fleet)
                                                        │
                                                        ▼
                                                Carriers (AT&T, etc.)

Component Design

1. API Gateway Layer

  • Technology: AWS API Gateway + Lambda or ECS/Fargate
  • Responsibilities:
    • Authentication (API key validation)
    • Rate limiting (per account)
    • Request validation
    • Idempotency key handling
    • Generate unique message SID
  • Scaling: Multi-region deployment, auto-scaling based on request rate
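The idempotency-key handling above can be sketched with an in-memory store standing in for DynamoDB (the `accept_message` name and the response shape are illustrative, not Twilio's actual API):

```python
import uuid

# In-memory stand-in for a persistent idempotency store (DynamoDB in the design).
_idempotency_store = {}

def accept_message(idempotency_key, to, body):
    """Return the original response for a repeated idempotency key,
    otherwise mint a new message SID and record the response."""
    if idempotency_key in _idempotency_store:
        return _idempotency_store[idempotency_key]  # replay: no double-send
    response = {"sid": "SM" + uuid.uuid4().hex, "to": to, "body": body, "status": "queued"}
    _idempotency_store[idempotency_key] = response
    return response
```

A client that retries a timed-out POST with the same key gets the original SID back instead of creating a second message.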

2. Message Queue (Kafka)

  • Why Kafka: High throughput, durability, ordering per partition
  • Partitioning: By message_sid to spread load evenly; Kafka only orders within a partition, so partition by account_id or destination number instead if ordering across messages matters
  • Topics:
    • messages.incoming - Newly accepted messages
    • messages.carrier-delivery - Ready to send to carrier
    • messages.status-updates - Delivery status changes
  • Retention: 7 days for replay capability
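Key-based partition assignment is what makes per-partition ordering hold: the same key always lands on the same partition. A sketch of the idea, using MD5 here rather than Kafka's actual murmur2 partitioner:

```python
import hashlib

NUM_PARTITIONS = 64  # illustrative partition count

def partition_for(message_sid, num_partitions=NUM_PARTITIONS):
    """Stable key -> partition mapping: the same SID always maps to the
    same partition, so all its events share one ordered log."""
    digest = hashlib.md5(message_sid.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```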

3. Worker Fleet

  • Technology: ECS/Kubernetes pods
  • Consumer groups: Parallel processing, rebalancing on failures
  • Responsibilities:
    • Consume from Kafka
    • Route to appropriate carrier based on destination
    • Handle carrier-specific protocols
    • Retry logic with exponential backoff
    • Publish status updates to Kafka
  • Circuit breakers: Per carrier to handle failures
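A per-carrier circuit breaker can be sketched as a small state machine (the threshold and reset window here are illustrative defaults):

```python
import time

class CircuitBreaker:
    """Minimal per-carrier breaker: open after `threshold` consecutive
    failures, allow a probe request after `reset_after` seconds."""

    def __init__(self, threshold=5, reset_after=30.0):
        self.threshold = threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self):
        if self.opened_at is None:
            return True
        if time.time() - self.opened_at >= self.reset_after:
            return True  # half-open: let one probe through
        return False

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = time.time()
```

Workers hold one breaker per carrier, so a failing carrier is skipped without slowing delivery to the others.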

4. Storage (DynamoDB)

  • Table: Messages
    • Partition key: account_id
    • Sort key: message_sid
    • Attributes: to, from, body, status, timestamps
    • GSI on message_sid for lookups
  • Consistency: Eventually consistent reads for status checks
  • Auto-scaling: On-demand capacity mode

5. Webhook Delivery

  • Separate worker fleet consuming from messages.status-updates
  • POST to customer webhook URL
  • Retry with exponential backoff: 1min, 5min, 30min, 1hr, 6hr
  • Circuit breaker per webhook URL
  • Dead letter queue for failed webhooks
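The retry schedule above maps the attempt number to a delay; a minimal sketch:

```python
# Retry schedule from the design: 1 min, 5 min, 30 min, 1 hr, 6 hr, then DLQ.
RETRY_DELAYS_SECONDS = [60, 300, 1800, 3600, 21600]

def next_retry_delay(attempt):
    """Delay in seconds before the given retry attempt (1-based), or None
    once the schedule is exhausted and the webhook goes to the dead letter queue."""
    if 1 <= attempt <= len(RETRY_DELAYS_SECONDS):
        return RETRY_DELAYS_SECONDS[attempt - 1]
    return None  # exhausted -> dead letter queue
```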
Key Design Decisions
  • Kafka over SQS: higher throughput, ordering guarantees, and replay capability. Alternative considered: SQS is simpler but lacks ordering and replay.
  • DynamoDB over RDS: auto-scaling, low latency, handles a write-heavy workload. Alternative considered: RDS would need sharding at this scale.
  • Async processing: decouples the API response from carrier delivery for reliability. Alternative considered: synchronous delivery would time out frequently.
  • Regional deployment: low latency for customers in each region. Alternative considered: a single region means higher latency globally.
What to Emphasize in Interview
  • At-least-once delivery with idempotency: "We design for at-least-once semantics with idempotency keys rather than expensive exactly-once"
  • Fault isolation: "Circuit breakers per carrier prevent cascading failures"
  • Durability: "Once the API returns 201, the message is persisted in Kafka - it cannot be lost, and delivery is retried until it succeeds or is dead-lettered"
  • Observability: "Every state transition is an event - full audit trail"
  • Scalability: "Kafka partitions + worker auto-scaling handle traffic spikes"

Scenario 2: Multi-Region Active-Active Messaging

Hard
Design Twilio's messaging platform to run active-active in 3 regions (US, EU, APAC) where customers can send messages from any region and experience consistent behavior.
Functional Requirements
  • Accept messages in any region
  • Customers see consistent account state globally
  • Message history accessible from any region
  • Account balance/quota enforced globally
  • Survive full region failure
Constraints
  • Cross-region latency: 100-200ms
  • Each region should work independently during partition
  • No global locks or synchronous cross-region coordination
  • Data residency: EU customer data stays in EU
High-Level Architecture
               GLOBAL ROUTING LAYER
         (DNS-based or Anycast IP routing)
         │               │               │
         ▼               ▼               ▼
     US Region       EU Region      APAC Region
      API GW          API GW          API GW
        │               │               │
      Kafka ◄────────► Kafka ◄───────► Kafka
        │               │               │
      Workers         Workers         Workers
        │               │               │
     DynamoDB ◄─────► DynamoDB ◄────► DynamoDB
          (Global Tables, cross-region replication)

Design Approach

1. Data Classification

Different data has different consistency needs:

  • Message events: regional consistency, eventual globally. Storage: regional Kafka with async replication.
  • Account metadata: global eventual consistency. Storage: DynamoDB Global Tables.
  • Account balance/quota: strong consistency needed (avoid where possible). Storage: regional balances with periodic reconciliation.
  • Message history: regional, asynchronously replicated. Storage: regional DynamoDB with cross-region replication.

2. Routing Strategy

  • Home region per account: Each account has a primary region (set at signup or based on location)
  • Regional affinity: Route customer's API calls to their home region when possible
  • Graceful degradation: If home region is down, route to nearest healthy region
  • Data residency: EU accounts MUST have home region in EU (GDPR)

3. Handling Cross-Region Writes

When an EU customer's request lands in the US region:

  1. Option A (Proxy): US region proxies request to EU region, waits for response
    • Pro: Strong consistency
    • Con: Higher latency (200ms+ penalty)
  2. Option B (Local write + async reconcile): Accept in US, async replicate to EU
    • Pro: Low latency
    • Con: Potential conflicts, violates data residency
  3. Recommended: Option A for control plane, local writes for data plane

4. Quota/Balance Enforcement

Challenge: Can't do synchronous cross-region checks (too slow)

Solution: Regional quota allocation
  • Account has global limit: 10,000 messages/day
  • Allocate quota to each region: US=5000, EU=5000, APAC=0 (customer doesn't use APAC)
  • Each region enforces its local quota independently
  • Nightly reconciliation job redistributes unused quota
  • If region runs out, can request more from global coordinator (slower path)
Trade-off: Might reject messages even if global quota available (CAP theorem - choosing availability over perfect consistency)
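The nightly reconciliation step can be sketched as reallocating tomorrow's regional quotas in proportion to today's usage; the 1% per-region floor is an illustrative assumption, not part of the design above:

```python
def reconcile_quota(global_limit, usage):
    """Reallocate tomorrow's regional quotas in proportion to today's usage,
    guaranteeing every region a small floor so it can serve stray traffic."""
    floor = max(1, global_limit // 100)  # assumed 1% floor per region
    total_used = sum(usage.values())
    allocatable = global_limit - floor * len(usage)
    if total_used == 0:
        share = allocatable // len(usage)
        return {region: floor + share for region in usage}
    return {
        region: floor + (allocatable * used) // total_used
        for region, used in usage.items()
    }
```

With a 10K/day account that used US=4000, EU=1000, APAC=0, the heavy-use region gets most of the next day's budget while APAC keeps only the floor.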

5. Message History Queries

  • Messages primarily queried in the region they were sent
  • Each region maintains local message history
  • Async replication to other regions (DynamoDB Global Tables or custom)
  • If customer queries in non-home region: eventual consistency acceptable (might miss very recent messages)

6. Failure Scenarios

Scenario: EU region goes down
  • DNS/routing layer detects failure, stops routing to EU
  • EU customers routed to US region (nearest)
  • US region acts as backup for EU accounts
  • When EU recovers, catch up on replicated data, resume normal routing
Scenario: Network partition between regions
  • Each region continues operating independently (AP in CAP)
  • Quota enforcement becomes per-region (might over-deliver globally)
  • When partition heals, reconcile quota usage
  • If over-delivered, customer gets charged (better than losing messages)
Key Trade-offs Discussion

Why not single global database?

Would require synchronous cross-region writes → 200ms+ latency → unacceptable for messaging API.

Why not multi-master everywhere?

Conflict resolution is complex for messaging. "Last write wins" doesn't work for quota enforcement.

Chosen approach: Hybrid

  • Regional processing (fast, isolated)
  • Async replication (eventual global view)
  • Partitioned quota (local enforcement with global budget)
  • Graceful degradation (AP during partitions, reconcile later)
Interview Talking Points
  • "Active-active doesn't mean all data is global" - Partition data by access patterns
  • "Choose consistency model per data type" - Messages can be eventual, auth cannot
  • "Regional independence for resilience" - Each region should survive alone
  • "Cross-region latency is the enemy" - Design to minimize synchronous cross-region calls
  • "Reconcile, don't prevent" - Allow local decisions, fix conflicts async

Scenario 3: Cell-Based Architecture for Fault Isolation

Hard
Design a cell-based architecture for Twilio's messaging platform that provides fault isolation between customers while efficiently using resources.
Requirements
  • Enterprise customers should be isolated from each other
  • Small customers can share infrastructure
  • One customer's traffic spike shouldn't affect others
  • Cell failure should have bounded blast radius
  • Support gradual rollout of changes to minimize risk

This is YOUR wheelhouse! Align with your PayPal experience.

Cell Architecture
                    CONTROL PLANE (Global)
                    - Account → Cell mapping
                    - Cell health monitoring
                    - Routing configuration
          │              │              │              │
          ▼              ▼              ▼              ▼
     Cell-ACME      Cell-Nike     Cell-Shared-1  Cell-Shared-2
     (Dedicated)    (Dedicated)   (100 SMBs)     (100 SMBs)

Each cell runs its own full stack: API GW, Kafka, Workers, DynamoDB, VPC.
Blast radius: if Cell-ACME fails, only ACME is affected.

Design Principles

1. Cell Definition

A cell is:

  • Fully isolated infrastructure stack (compute, storage, network)
  • Separate VPC or namespace
  • Independent deployment unit
  • Handles subset of total traffic
  • Fails independently without cascading

2. Cell Sizing Strategy

  • Dedicated Large: 1 enterprise tenant, 100K msgs/sec. Use case: top 10 customers.
  • Dedicated Medium: 1 enterprise tenant, 10K msgs/sec. Use case: top 100 customers.
  • Shared Large: 100-500 SMB tenants, 50K msgs/sec. Use case: paid customers.
  • Shared Small: 1000+ free-tier tenants, 10K msgs/sec. Use case: free/trial users.

3. Routing Layer

  • Mapping Service: Maps account_id → cell_id
  • Storage: DynamoDB Global Table (highly available)
  • Caching: Cached in API gateway for low latency
  • Updates: Account growth triggers cell migration

4. Within Shared Cells: Additional Isolation

Even within shared cells, prevent noisy neighbors:

  • Rate limiting: Per account
  • Thread pool bulkheads: Per customer or per priority class
  • CPU/Memory limits: Kubernetes resource quotas per customer namespace
  • Connection pool limits: Per account to prevent DB exhaustion

5. Cell Migration

When customer outgrows shared cell:

  1. Provision new dedicated cell
  2. Dual-write messages to both old cell and new cell (shadowing)
  3. Validate new cell working correctly
  4. Update routing: new messages go to new cell only
  5. Backfill historical data async if needed
  6. Decommission old cell's resources for this customer

6. Operational Benefits

  • Gradual rollout: Deploy change to Cell-Canary (synthetic traffic), then 1 shared cell, then all
  • Feature flags per cell: Test new features on specific cells
  • Blast radius: Bug in new deployment only affects one cell
  • Capacity planning: Add cells when overall capacity reaches threshold
Key Trade-offs

Resource Efficiency vs Isolation

  • Shared cells: 70-80% resource utilization, but noisy neighbor risk
  • Dedicated cells: 40-60% utilization, but perfect isolation
  • Decision: Use dedicated for top customers (who pay for it), shared for long tail

Operational Complexity

  • Cost: More cells = more infrastructure to manage
  • Mitigation: Heavy automation, infrastructure-as-code, cell templates

Data Locality

  • Pro: Each cell has full data for its tenants - no cross-cell queries
  • Con: Global analytics requires aggregating across cells
  • Solution: Event streaming to central data warehouse
What to Emphasize (This is your expertise!)
  • "Fault isolation is non-negotiable" - At PayPal, we prioritize blast radius reduction over resource efficiency
  • "Cells are a socio-technical pattern" - Not just infrastructure, also ownership boundaries. Each cell can have a dedicated team.
  • "Start with fewer, larger cells" - Don't over-optimize for isolation on day 1. Add more cells as you scale.
  • "Automated cell provisioning" - If spinning up a new cell takes manual work, you won't do it. Make it push-button.
  • "Cells enable organizational scaling" - Conway's Law - cell architecture allows teams to own end-to-end infrastructure

Scenario 4: Identity & Authentication Service

Medium
Design Twilio's identity service that handles authentication for developers, service-to-service auth, and customer (end-user) verification.
Requirements
  • Developer authentication (console login, API keys)
  • Service-to-service authentication (inter-cell communication)
  • Account hierarchy (parent accounts, sub-accounts)
  • Integrate with verification (2FA, phone verification)
  • Support for SSO (SAML, OAuth) for enterprise
Constraints
  • 100K API auth checks per second
  • Sub-10ms latency for API key validation
  • 99.99% availability (auth failure = total outage)
  • Audit trail for compliance

Identity Architecture

                    IDENTITY & AUTH SERVICE

   Developer Auth       Service Auth          End-User Verify
   - Console            - mTLS                - 2FA
   - API Keys           - JWTs                - Phone Verify
   - SSO/SAML           - Service Accounts
         │                    │                     │
         └────────────────────┼─────────────────────┘
                              ▼
                          Auth Core
                          - Account Store
                          - Token Service
                          - Audit

1. Developer Authentication

API Key Validation (Hot Path)
  • Storage: DynamoDB
    • PK: api_key_sid
    • Attributes: account_id, permissions, created_at, last_used
  • Caching: Redis/ElastiCache
    • Cache API key → account mapping for 5 minutes
    • 99% cache hit rate → sub-1ms auth check
    • Cache miss → query DynamoDB (5-10ms)
  • Key rotation: Support multiple active keys, graceful deprecation
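A sketch of the cache-aside hot path, including forced invalidation on explicit revocation (in-memory dicts stand in for DynamoDB and Redis; the record shape is illustrative):

```python
import time

CACHE_TTL = 300  # 5 minutes, per the design
_key_table = {"SK123": {"account_id": "AC1", "permissions": ["sms:send"]}}  # DynamoDB stand-in
_key_cache = {}  # api_key_sid -> (record, cached_at); Redis stand-in

def validate_key(api_key_sid):
    """Serve from cache when fresh; on a miss, read the key table and cache it."""
    hit = _key_cache.get(api_key_sid)
    if hit and time.time() - hit[1] < CACHE_TTL:
        return hit[0]
    record = _key_table.get(api_key_sid)  # cache miss -> DynamoDB
    if record is not None:
        _key_cache[api_key_sid] = (record, time.time())
    return record

def revoke_key(api_key_sid):
    """Explicit revocation invalidates the cache immediately rather than
    waiting out the TTL."""
    _key_table.pop(api_key_sid, None)
    _key_cache.pop(api_key_sid, None)
```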
Console Login (OAuth2 / OIDC)
  • Use Auth0, Cognito, or custom OAuth2 provider
  • Support username/password + MFA
  • Issue JWT tokens for session management
  • Short-lived access tokens (15 min), long-lived refresh tokens
Enterprise SSO
  • SAML 2.0 integration for enterprise customers
  • Customer configures their IdP (Okta, Azure AD)
  • Twilio acts as service provider (SP)
  • JIT (Just-In-Time) provisioning of accounts

2. Service-to-Service Authentication

Why it matters

When API Gateway calls Billing Service, how do we authenticate?

Approach: Service Accounts + JWTs
  • Each service has a service account with credentials
  • Services request short-lived JWT tokens from auth service
  • JWT includes: service_id, permissions, expiry
  • Target service validates JWT (using public key or shared secret)
  • Token expires in 1 hour, auto-renewed
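A minimal HS256-style issue/validate sketch using only the standard library (the shared secret and claim names are illustrative; a production system would use a JWT library and proper key management):

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"shared-secret-for-illustration"  # assumed; real systems use a key service

def _b64(data):
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def issue_token(service_id, permissions, ttl_seconds=3600):
    """Sign {service_id, permissions, exp} as a compact header.payload.sig token."""
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64(json.dumps({
        "service_id": service_id,
        "permissions": permissions,
        "exp": int(time.time()) + ttl_seconds,
    }).encode())
    sig = _b64(hmac.new(SECRET, f"{header}.{payload}".encode(), hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

def validate_token(token):
    """Return the claims if the signature verifies and the token is unexpired."""
    header, payload, sig = token.split(".")
    expected = _b64(hmac.new(SECRET, f"{header}.{payload}".encode(), hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        return None
    claims = json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))
    if claims["exp"] < time.time():
        return None
    return claims
```

The target service only needs the verification key, so validation stays local with no call back to the auth service.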
Alternative: mTLS (Mutual TLS)
  • Each service has X.509 certificate
  • TLS handshake validates both client and server
  • More secure but more operational overhead
  • Good for sensitive services (billing, identity)

3. Account Hierarchy

Twilio supports parent accounts with sub-accounts:

Parent Account (Enterprise Corp)
  ├── Sub-Account (APAC Division)
  └── Sub-Account (EU Division)

  • Each sub-account has its own API keys
  • Parent can view all sub-account usage
  • Billing rolls up to the parent
Data Model
  • Accounts table:
    • account_sid (PK), parent_account_sid, name, status
  • API Keys table:
    • api_key_sid (PK), account_sid, permissions_scope
  • Query pattern: Given API key, lookup account, check if account is active, check permissions

4. Audit Logging

  • Every auth event logged: API key used, token issued, login attempt
  • Log to Kafka → S3 for compliance
  • Elasticsearch for a searchable audit trail
  • Include: timestamp, account_id, action, ip_address, result

5. Rate Limiting & Abuse Prevention

  • Rate limit failed login attempts per IP: 10 per minute
  • Rate limit API key validation per key: 10K per second
  • CAPTCHA after 3 failed logins
  • Temporary account lockout after 10 failed attempts
Key Design Decisions

API Key Storage: DynamoDB vs RDS

  • Choice: DynamoDB
  • Reason: Simple key-value lookup, auto-scaling, low latency
  • Alternative: RDS would work but requires more capacity planning

Caching: How aggressive?

  • Choice: 5-minute cache TTL
  • Trade-off: API key revocation takes up to 5 min to propagate
  • Mitigation: Force cache invalidation on explicit revocation

Service Auth: JWT vs mTLS

  • Choice: JWT for most services, mTLS for sensitive
  • JWT easier operationally (no cert management)
  • mTLS for billing, identity (defense in depth)
Interview Talking Points
  • "Auth is critical path - optimize for p99 latency" - Every API call goes through auth
  • "Defense in depth" - Multiple layers: API key + rate limiting + network isolation
  • "Graceful degradation" - If Redis cache is down, fall back to DynamoDB (slower but works)
  • "Audit everything" - Auth events are critical for security and compliance

Scenario 5: Rate Limiting at Scale

Medium
Design a distributed rate limiting system that can enforce per-account quotas across multiple API gateway instances globally.
Requirements
  • Enforce rate limits per account (e.g., 1000 messages/minute)
  • Work across multiple API gateway instances (no single point)
  • Support different tiers (free, pro, enterprise)
  • Real-time enforcement (not best-effort)
  • Provide API clients with rate limit status in response headers
Constraints
  • 10K auth checks per second per gateway instance
  • Sub-5ms overhead for rate limit check
  • Minimize false positives (incorrectly blocking)

Try designing it yourself before reading the approach below!

Recommended Approach: Redis with Sliding Window

Architecture

   API GW 1        API GW 2        API GW 3
       │               │               │
       └───────────────┼───────────────┘
                       ▼
             Redis Cluster (Shared)

Each gateway instance checks Redis before allowing a request.

Redis Data Structure: Sorted Set

For each account, store timestamps of recent requests:

  • Key: rate_limit:{account_id}:{window}
  • Value: Sorted set with score = timestamp
  • Example: rate_limit:AC123:minute → {1700000001, 1700000002, ...}

Algorithm: Sliding Window Counter

In Python with the redis-py client (these steps are not atomic on their own; the Lua script below addresses that):

import time
import uuid

import redis

r = redis.Redis()

def check_rate_limit(account_id, limit, window_seconds):
    key = f"rate_limit:{account_id}:{window_seconds}"
    current_time = time.time()
    window_start = current_time - window_seconds

    # 1. Remove entries older than the window
    r.zremrangebyscore(key, "-inf", window_start)

    # 2. Count the remaining entries
    count = r.zcard(key)

    # 3. Check if under the limit
    if count < limit:
        # 4. Record this request; the member must be unique, otherwise two
        #    requests at the same timestamp collapse into one entry
        r.zadd(key, {f"{current_time}:{uuid.uuid4().hex}": current_time})

        # 5. TTL auto-expires idle keys
        r.expire(key, window_seconds)
        return "ALLOWED", limit - count - 1

    return "DENIED", 0

Making it Atomic: Lua Script

Run all Redis commands in a single Lua script for atomicity:

local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local current_time = tonumber(ARGV[3])
local member = ARGV[4]  -- unique per request so simultaneous requests don't collapse
local window_start = current_time - window

redis.call('ZREMRANGEBYSCORE', key, '-inf', window_start)
local count = redis.call('ZCARD', key)

if count < limit then
    redis.call('ZADD', key, current_time, member)
    redis.call('EXPIRE', key, window)
    return {1, limit - count - 1}  -- allowed, remaining
else
    return {0, 0}  -- denied, remaining
end

Response Headers

Include in every API response:

  • X-RateLimit-Limit: 1000 - Total limit
  • X-RateLimit-Remaining: 742 - Requests remaining
  • X-RateLimit-Reset: 1700000060 - When window resets

Handling Multiple Time Windows

Support multiple limits simultaneously:

  • 1000 requests per minute
  • 10,000 requests per hour
  • 100,000 requests per day

Solution: Check each limit independently, deny if ANY limit exceeded
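Checking several windows is then a conjunction; a trivial sketch assuming the per-window counts are already available:

```python
# Limits from the example above: window_seconds -> max_requests.
LIMITS = [(60, 1000), (3600, 10_000), (86_400, 100_000)]

def allowed(counts_by_window):
    """Deny if ANY window's current count has reached its limit."""
    return all(counts_by_window.get(window, 0) < limit for window, limit in LIMITS)
```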

Scaling Redis

  • Redis Cluster: Partition by account_id (consistent hashing)
  • Replication: Redis replica for read availability
  • Fallback: If Redis is down, fail open or use local in-memory limit (less accurate)

Alternative Approaches

  • Fixed Window Counter: simple, low memory. Con: bursts at window edges (up to 2x the limit).
  • Token Bucket: allows bursts, smooth. Con: more complex state.
  • Sliding Window (chosen): accurate, no edge bursts. Con: higher memory (stores timestamps).
  • Leaky Bucket: enforces a strict rate. Con: no bursting, more complex.
Trade-offs Discussion

Accuracy vs Performance

  • Sliding window is most accurate but requires sorted set (more memory)
  • Fixed window uses single counter (less memory) but allows edge bursts
  • Decision: Accuracy matters for billing - use sliding window

Centralized (Redis) vs Distributed (local)

  • Redis: Accurate but adds dependency and latency
  • Local: Fast but inaccurate in distributed system (each instance tracks separately)
  • Decision: Use Redis with aggressive caching

Fail Open vs Fail Closed

  • If Redis is down:
    • Fail open: Allow all requests (better availability, risk of abuse)
    • Fail closed: Deny all requests (worse availability, safe)
  • Decision: Fail open with local rate limiting as backup
Interview Talking Points
  • "Rate limiting is about fairness and protection" - Prevent one customer from consuming all capacity
  • "Choose algorithm based on requirements" - If bursting is okay, use token bucket. If strict limit needed, use leaky bucket or sliding window
  • "Graceful degradation" - If centralized rate limiter fails, fall back to local limits
  • "Observability" - Track rate limit denials as a metric - might indicate customer needs to upgrade tier