System Design Scenarios - Common Interview Questions
Interview Approach: For each scenario, follow this framework:
1. Requirements: Clarify functional & non-functional requirements
2. Estimations: Calculate scale (users, storage, bandwidth)
3. API Design: Define key endpoints
4. Data Model: Schema design
5. High-Level Design: Components & data flow
6. Deep Dives: Address bottlenecks, trade-offs
7. Wrap-up: Monitoring, scaling, edge cases
1. Proximity Service (Uber, Yelp, "Find Nearby")
Problem Statement
Design a service to find nearby drivers/restaurants/places within a given radius from a user's location.
Functional Requirements:
- Find all entities within X miles/km of user location
- Update entity locations in real-time (for moving entities like drivers)
- Support adding/removing entities
Non-Functional Requirements:
- Low latency (< 100ms for queries)
- High availability
- Eventual consistency acceptable for location updates
- 100M users, 1M drivers/restaurants
Scale Estimations
Assumptions:
- 100M users (daily active)
- 1M drivers/restaurants
- Each user searches 5 times/day
- Driver location updates every 4 seconds
Calculations:
QPS (Queries):
100M users × 5 searches/day / 86400 seconds ≈ 5,800 QPS
Peak: 5,800 × 3 = 17,400 QPS
Location Updates:
1M drivers × (3600/4) updates/hour = 900M updates/hour
= 250,000 updates/second
Storage:
1M entities × 100 bytes (id, lat, lng, metadata) = 100 MB (negligible)
Approach 1: Geohashing
Geohash: Encode lat/lng into short string
Earth divided into grid:
┌────────┬────────┬────────┬────────┐
│ 9d │ 9e │ 9s │ 9t │
├────────┼────────┼────────┼────────┤
│ 9c │ 9f │ 9u │ 9v │
├────────┼────────┼────────┼────────┤
│ 96 │ 9g │ 9y │ 9z │
├────────┼────────┼────────┼────────┤
│ 93 │ 9b │ 9n │ 9p │
└────────┴────────┴────────┴────────┘
Each grid recursively subdivided:
9f → [9f0, 9f1, 9f2, ..., 9fz]
Geohash precision:
- 4 chars: ~39km × 19.5km
- 5 chars: ~4.9km × 4.9km
- 6 chars: ~1.2km × 0.61km
- 7 chars: ~153m × 153m
(cells alternate between wide and square as precision grows)
// Geohash implementation
class Geohash {
private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";
public static String encode(double lat, double lng, int precision) {
double[] latRange = {-90, 90};
double[] lngRange = {-180, 180};
StringBuilder geohash = new StringBuilder();
int bits = 0;
int bit = 0;
boolean isEven = true;
while (geohash.length() < precision) {
if (isEven) {
// Longitude
double mid = (lngRange[0] + lngRange[1]) / 2;
if (lng > mid) {
bit |= (1 << (4 - bits));
lngRange[0] = mid;
} else {
lngRange[1] = mid;
}
} else {
// Latitude
double mid = (latRange[0] + latRange[1]) / 2;
if (lat > mid) {
bit |= (1 << (4 - bits));
latRange[0] = mid;
} else {
latRange[1] = mid;
}
}
isEven = !isEven;
bits++;
if (bits == 5) {
geohash.append(BASE32.charAt(bit));
bits = 0;
bit = 0;
}
}
return geohash.toString();
}
// Find neighbors for edge cases
public static List getNeighbors(String geohash) {
// Return 8 neighboring geohashes (N, NE, E, SE, S, SW, W, NW)
// Implementation: adjust last character based on direction
}
}
// Usage
String userHash = Geohash.encode(37.7749, -122.4194, 6); // "9q8yyk"
// Query: Find all drivers with same prefix
SELECT * FROM drivers WHERE geohash LIKE '9q8yyk%';
// Edge case: User near boundary needs to check neighbors
List<String> prefixes = new ArrayList<>(List.of(userHash)); // Arrays.asList is fixed-size
prefixes.addAll(Geohash.getNeighbors(userHash));
// Query multiple geohashes
SELECT * FROM drivers WHERE geohash LIKE '9q8yyk%'
OR geohash LIKE '9q8yym%' -- north neighbor
OR geohash LIKE '9q8yys%' -- east neighbor
...
Approach 2: Quadtree
Quadtree: Recursively divide 2D space into 4 quadrants
Level 0: World
┌─────────┐
│ │
│ • │
│ │
└─────────┘
Level 1: Split into 4
┌────┬────┐
│ NW │ NE │
├────┼────┤
│ SW │ SE │
└────┴────┘
Level 2: Split dense quadrants
┌──┬──┬────┐
│ │ │ NE │
├──┼──┤ │
│ │ ├────┤
├──┴──┤ SE │
│ SW │ │
└─────┴────┘
Keep splitting until:
- Node has ≤ 100 entities
- Reached max depth (e.g., 12 levels)
class QuadTree {
static final int CAPACITY = 100; // Max entities per node
static final int MAX_DEPTH = 12;
class Node {
Rectangle bounds;
List entities;
Node[] children; // [NW, NE, SW, SE]
boolean isLeaf;
Node(Rectangle bounds) {
this.bounds = bounds;
this.entities = new ArrayList<>();
this.isLeaf = true;
}
void insert(Entity entity) {
if (!bounds.contains(entity.location)) return;
if (isLeaf) {
entities.add(entity);
if (entities.size() > CAPACITY && bounds.depth < MAX_DEPTH) {
split();
}
} else {
// Insert into appropriate child
for (Node child : children) {
child.insert(entity);
}
}
}
void split() {
isLeaf = false;
double midLat = (bounds.minLat + bounds.maxLat) / 2;
double midLng = (bounds.minLng + bounds.maxLng) / 2;
children = new Node[4];
children[0] = new Node(new Rectangle(midLat, bounds.maxLat,
bounds.minLng, midLng)); // NW
children[1] = new Node(new Rectangle(midLat, bounds.maxLat,
midLng, bounds.maxLng)); // NE
children[2] = new Node(new Rectangle(bounds.minLat, midLat,
bounds.minLng, midLng)); // SW
children[3] = new Node(new Rectangle(bounds.minLat, midLat,
midLng, bounds.maxLng)); // SE
// Redistribute entities
for (Entity entity : entities) {
for (Node child : children) {
child.insert(entity);
}
}
entities.clear();
}
List search(Point center, double radius) {
List result = new ArrayList<>();
Circle searchArea = new Circle(center, radius);
if (!bounds.intersects(searchArea)) return result;
if (isLeaf) {
for (Entity entity : entities) {
if (distance(center, entity.location) <= radius) {
result.add(entity);
}
}
} else {
for (Node child : children) {
result.addAll(child.search(center, radius));
}
}
return result;
}
}
}
// Usage
QuadTree tree = new QuadTree(new Rectangle(-90, 90, -180, 180));
// Insert drivers
tree.insert(new Entity(driverId, new Point(37.7749, -122.4194)));
// Search
List nearby = tree.search(new Point(37.7749, -122.4194), 5.0); // 5 km
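The `search` method above filters candidates with a `distance` helper that was left undefined. A haversine sketch of that helper (standard great-circle formula; the class name `GeoMath` is mine):

```java
// Haversine great-circle distance between two lat/lng points, in km.
// Used by the quadtree's radius filter above; Earth radius is the usual mean value.
class GeoMath {
    static final double EARTH_RADIUS_KM = 6371.0;

    static double distanceKm(double lat1, double lng1, double lat2, double lng2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLng = Math.toRadians(lng2 - lng1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLng / 2) * Math.sin(dLng / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }
}
```

Euclidean distance on raw lat/lng is a common interview shortcut, but it distorts badly away from the equator; haversine stays accurate enough for radius queries.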
Approach 3: Redis Geo
// Redis has built-in geospatial support using sorted sets
// Add locations
GEOADD drivers -122.4194 37.7749 driver1
GEOADD drivers -122.4084 37.7849 driver2
GEOADD drivers -122.3964 37.7949 driver3
// Find within radius
GEORADIUS drivers -122.4194 37.7749 5 km WITHDIST WITHCOORD ASC
// (GEORADIUS is deprecated since Redis 6.2; GEOSEARCH is the modern replacement)
// Output:
1) 1) "driver1"
   2) "0.0002"
   3) 1) "-122.41940021514892578"
      2) "37.77490009675423117"
2) 1) "driver2"
   2) "1.2345"
   3) 1) "-122.40839958190917969"
      2) "37.78490123456789012"
// Update location (idempotent)
GEOADD drivers -122.4000 37.7800 driver1 // Updates existing
// Remove
ZREM drivers driver1
High-Level Architecture
┌─────────┐
│ User │
│ App │
└────┬────┘
│ 1. GET /nearby?lat=37.77&lng=-122.41&radius=5km
↓
┌─────────────┐
│ API Gateway │
└──────┬──────┘
│
↓
┌──────────────────┐ ┌──────────────┐
│ Location Service │───────→│ Redis Geo │
│ (Stateless) │ │ (or Quadtree │
└──────────────────┘ │ in memory) │
└──────────────┘
Driver location updates:
┌─────────┐
│ Driver │
│ App │
└────┬────┘
│ 2. POST /location {lat, lng}
↓
┌─────────────┐
│ API Gateway │
└──────┬──────┘
│
↓
┌──────────────────┐ ┌──────────────┐
│ Location Update │───────→│ Kafka │
│ Service │ └──────┬───────┘
└──────────────────┘ │
↓
┌──────────────────┐
│ Location Indexer │
│ (Consumer) │
└────────┬─────────┘
│
↓
┌──────────────────┐
│ Redis Geo / │
│ QuadTree │
└──────────────────┘
Geohash vs Quadtree vs Redis Geo
| Feature |
Geohash (SQL) |
Quadtree (In-Memory) |
Redis Geo |
| Query Time |
O(log n) with index |
O(log n) |
O(log n) |
| Insert Time |
O(log n) |
O(log n) |
O(log n) |
| Memory |
Disk-based |
High (entire tree) |
In-memory (efficient) |
| Boundary Issues |
Yes (need neighbors) |
No (radius search) |
No (radius search) |
| Scalability |
Easy (database sharding) |
Hard (tree rebalancing) |
Medium (Redis cluster) |
| Best For |
Static data, millions of entities |
Dynamic, memory available |
Real-time, high throughput |
Recommendation: For Uber-like systems with frequent updates, use Redis Geo. For Yelp-like systems with mostly static data, use Geohash with PostgreSQL.
2. URL Shortener (bit.ly, TinyURL)
Problem Statement
Design a URL shortening service that creates short aliases for long URLs.
Functional Requirements:
- Generate unique short URL for long URL
- Redirect short URL to original long URL
- Custom short URLs (optional)
- Expiration (optional)
- Analytics: track clicks, location, etc.
Non-Functional Requirements:
- High availability (redirect always works)
- Low latency for redirects (< 10ms)
- 100:1 read:write ratio (more redirects than creations)
Scale Estimations
Assumptions:
- 100M URLs created per month
- 100:1 read:write ratio → 10B redirects/month
- URL stored for 10 years
- Average URL: 100 bytes
Calculations:
Write QPS:
100M / (30 days × 86400 sec) ≈ 40 writes/sec
Peak: 40 × 2 = 80 writes/sec
Read QPS:
10B / (30 × 86400) ≈ 4,000 reads/sec
Peak: 4,000 × 2 = 8,000 reads/sec
Storage (10 years):
100M × 12 months × 10 years × 100 bytes = 1.2 TB
Short URL Length:
Using base62 (a-z, A-Z, 0-9): 62 characters
6 characters: 62^6 = 56 billion combinations (enough!)
7 characters: 62^7 = 3.5 trillion (future-proof)
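The combination counts above are easy to sanity-check in code; a throwaway helper (class name is illustrative):

```java
// Sanity check of the base62 capacity figures quoted above.
class Base62Capacity {
    // Number of distinct codes of the given length over a 62-character alphabet
    static long combinations(int length) {
        long total = 1;
        for (int i = 0; i < length; i++) total *= 62;
        return total;
    }
}
```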
API Design
POST /api/v1/shorten
Request:
{
"long_url": "https://example.com/very/long/url/with/parameters?a=1&b=2",
"custom_alias": "mylink", // Optional
"expiration": "2025-12-31T23:59:59Z" // Optional
}
Response:
{
"short_url": "https://short.ly/abc123",
"long_url": "https://example.com/very/long/url/with/parameters?a=1&b=2",
"created_at": "2024-01-01T00:00:00Z",
"expiration": "2025-12-31T23:59:59Z"
}
---
GET /:short_code
Example: GET /abc123
Response: HTTP 302 Redirect
Location: https://example.com/very/long/url/with/parameters?a=1&b=2
---
GET /api/v1/analytics/:short_code
Response:
{
"short_code": "abc123",
"total_clicks": 15234,
"clicks_by_country": {...},
"clicks_over_time": [...]
}
Approach 1: Hash + Collision Resolution
// Use hash function (MD5, SHA256) + take first 6 characters
String shortenURL(String longURL) {
String hash = MD5(longURL);
String shortCode = hash.substring(0, 6); // first 6 hex chars (optionally re-encode in base62)
// Check collision
while (database.exists(shortCode)) {
// Append salt and rehash
longURL = longURL + randomSalt();
hash = MD5(longURL);
shortCode = hash.substring(0, 6);
}
database.save(shortCode, longURL);
return shortCode;
}
Pros:
- Simple implementation
- Same long URL always gets same short code (idempotent)
Cons:
- Collision handling adds latency
- Not guaranteed to be unique without extra work
Approach 2: Counter + Base62 Encoding (Recommended)
// Use auto-incrementing counter, encode in base62
class Base62 {
private static final String ALPHABET =
"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
private static final int BASE = 62;
public static String encode(long num) {
if (num == 0) return String.valueOf(ALPHABET.charAt(0));
StringBuilder sb = new StringBuilder();
while (num > 0) {
sb.append(ALPHABET.charAt((int)(num % BASE)));
num /= BASE;
}
return sb.reverse().toString();
}
public static long decode(String str) {
long num = 0;
for (char c : str.toCharArray()) {
num = num * BASE + ALPHABET.indexOf(c);
}
return num;
}
}
// Usage
long id = database.getNextId(); // Auto-increment: 12345
String shortCode = Base62.encode(id); // "3d7"
// Store mapping
database.save(shortCode, longURL, id);
// Redirect
String longURL = database.get(shortCode);
redirect(longURL);
Pros:
- Guaranteed unique (no collisions!)
- Predictable length
- Simple and fast
Cons:
- Potential security issue: sequential IDs are guessable
- Need distributed ID generation for multiple servers
Distributed ID Generation
// Option 1: Snowflake-like ID (Twitter Snowflake)
64-bit ID structure:
┌────────────────────────────────────────┬──────┬──────┬────────────┐
│ Timestamp (41 bits) │ DC │Worker│ Sequence │
│ Milliseconds since epoch │(5bit)│(5bit)│ (12 bits) │
└────────────────────────────────────────┴──────┴──────┴────────────┘
Allows:
- 2^41 milliseconds (69 years)
- 32 datacenters
- 32 workers per datacenter
- 4096 IDs per millisecond per worker
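The bit layout above can be sketched as a minimal single-process generator. This is a simplified illustration, not Twitter's implementation; the custom epoch and class name are my assumptions:

```java
// Minimal Snowflake-style ID generator matching the 41/5/5/12-bit layout above.
class SnowflakeId {
    private static final long EPOCH = 1704067200000L; // custom epoch (2024-01-01), an assumption
    private static final long DC_BITS = 5, WORKER_BITS = 5, SEQ_BITS = 12;
    private final long dcId, workerId;
    private long lastTs = -1L, sequence = 0L;

    SnowflakeId(long dcId, long workerId) {
        this.dcId = dcId;
        this.workerId = workerId;
    }

    synchronized long nextId() {
        long ts = System.currentTimeMillis();
        if (ts == lastTs) {
            sequence = (sequence + 1) & ((1L << SEQ_BITS) - 1);
            if (sequence == 0) {
                // Sequence exhausted for this millisecond: spin until the clock advances
                while (ts <= lastTs) ts = System.currentTimeMillis();
            }
        } else {
            sequence = 0;
        }
        lastTs = ts;
        return ((ts - EPOCH) << (DC_BITS + WORKER_BITS + SEQ_BITS))
             | (dcId << (WORKER_BITS + SEQ_BITS))
             | (workerId << SEQ_BITS)
             | sequence;
    }
}
```

Because the timestamp occupies the high bits, IDs are roughly time-ordered across workers, which is often a useful side effect (e.g. for pagination).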
// Option 2: Redis INCR
INCR url_counter
// Returns: 12345 (atomic, distributed)
// Option 3: Database auto-increment with ranges
Server 1: handles IDs 1-1,000,000
Server 2: handles IDs 1,000,001-2,000,000
Data Model
// NoSQL (Cassandra/DynamoDB) - Recommended for read-heavy workload
urls table:
{
"short_code": "abc123", // Partition key
"long_url": "https://...",
"created_at": "2024-01-01",
"expiration": "2025-12-31",
"user_id": "user123",
"clicks": 0
}
// Index on user_id for "get my URLs"
CREATE INDEX ON urls (user_id);
// Analytics (separate table for write scalability)
analytics table:
{
"short_code": "abc123", // Partition key
"timestamp": "2024-01-01T12:00", // Sort key
"ip": "1.2.3.4",
"country": "US",
"user_agent": "..."
}
High-Level Architecture
┌──────────┐
│ Client │
└────┬─────┘
│
↓
┌────────────────┐
│ Load Balancer │
└────┬───────────┘
│
├─────────────┐
↓ ↓
┌─────────┐ ┌─────────┐
│ Web │ │ Web │
│ Server │ │ Server │
└────┬────┘ └────┬────┘
│ │
↓ ↓
┌──────────────────────┐ ┌──────────────┐
│ Application Cache │←─────│ Redis/Memcached
│ (Popular URLs) │ │ (LRU eviction)
└──────────────────────┘ └──────────────┘
│
↓
┌──────────────────────┐ ┌──────────────┐
│ Database │──────│ Replicas │
│ (Cassandra/Dynamo) │ │ (Read) │
└──────────────────────┘ └──────────────┘
│
↓
┌──────────────────────┐
│ Analytics Stream │
│ (Kafka) │
└──────────────────────┘
│
↓
┌──────────────────────┐
│ Analytics Service │
│ (Flink/Spark) │
└──────────────────────┘
Caching Strategy
// Cache hot URLs (80/20 rule: 20% URLs get 80% traffic)
// Cache-aside (lazy loading) pattern
String redirect(String shortCode) {
// 1. Check cache
String longURL = cache.get(shortCode);
if (longURL != null) {
return longURL; // Cache hit
}
// 2. Cache miss - query database
longURL = database.get(shortCode);
if (longURL == null) {
throw new NotFoundException();
}
// 3. Update cache
cache.set(shortCode, longURL, TTL=3600); // 1 hour TTL
return longURL;
}
// Cache size estimation:
// Top 20% of 1.2TB = 240GB (fits in memory with Redis cluster)
Advanced Features
// 1. Custom aliases
if (customAlias != null) {
if (database.exists(customAlias)) {
throw new ConflictException("Alias already taken");
}
shortCode = customAlias;
} else {
shortCode = generateShortCode();
}
// 2. Expiration
void cleanupExpiredURLs() {
// Scheduled job runs daily
database.query("DELETE FROM urls WHERE expiration < NOW()");
}
// Or use TTL in DynamoDB/Cassandra (automatic cleanup)
// 3. Rate limiting
@RateLimit(100, per="hour", by="user_id")
String createShortURL(String longURL, String userId) {
// Prevents abuse
}
// 4. Analytics (async)
void trackClick(String shortCode, HttpRequest request) {
AnalyticsEvent event = new AnalyticsEvent(
shortCode,
request.getIP(),
request.getUserAgent(),
request.getReferer(),
timestamp
);
kafka.send("analytics", event); // Async, non-blocking
}
Common Pitfalls:
- Not handling collisions in hash approach
- Using sequential IDs (security risk)
- Not caching (redirects are 100× more frequent than creation)
- Synchronous analytics (slows down redirects)
- Not planning for scale (1.2TB data!)
3. News Feed (Facebook, Twitter Timeline)
Problem Statement
Design a scalable news feed system where users can post updates and see posts from people they follow.
Functional Requirements:
- Users can publish posts (text, images, videos)
- Users can follow other users
- Users see a feed of posts from people they follow (chronological or ranked)
- Feed should load quickly and paginate
Non-Functional Requirements:
- Low latency for feed generation (< 200ms)
- High availability
- Eventual consistency acceptable
- 1B users, 200M DAU
- Average 50 follows per user
- Average 5 posts per day per user
Scale Estimations
Assumptions:
- 200M DAU
- Each user follows 50 people
- Each user posts 5 times/day
- Feed fetched 10 times/day per user
Posts:
200M users × 5 posts/day = 1B posts/day
= 11,500 posts/second
Feed Fetches:
200M users × 10 fetches/day = 2B fetches/day
= 23,000 fetches/second
Peak: 70,000 fetches/second
Storage (1 year):
1B posts/day × 365 days × 1KB per post = 365 TB
Fan-out Approaches
Approach 1: Fan-out on Write (Push Model)
User A posts:
┌────────┐
│ User A │ ─── posts ──→ [Post] "Hello!"
└────────┘
│
↓
┌────────────────────────┐
│ Fan-out Service │
│ │
│ 1. Find followers │───→ SELECT * FROM followers WHERE followee = A
│ 2. Write to each │
│ follower's feed │
└────────────────────────┘
│
├──────┬──────┬──────┬──────┐
↓ ↓ ↓ ↓ ↓
[Feed B] [Feed C] [Feed D] ... [Feed Z]
Reading is FAST:
User B opens app → SELECT * FROM feed_B ORDER BY time LIMIT 20
// Fan-out on Write Implementation
void publishPost(User author, Post post) {
// 1. Save post
database.save(post);
// 2. Get followers
List<User> followers = getFollowers(author.id);
// 3. Write to each follower's feed (in parallel)
ExecutorService executor = Executors.newFixedThreadPool(100);
for (User follower : followers) {
executor.submit(() -> {
// Write to feed table/cache
feedCache.addToFeed(follower.id, post.id);
});
}
executor.shutdown();
}
// Feed retrieval (VERY FAST - O(1) lookup)
List<Post> getFeed(long userId, int limit) {
// Feed already pre-computed!
List<Long> postIds = feedCache.getFeed(userId, limit);
return database.getPostsByIds(postIds);
}
Pros:
✅ Reading is VERY fast (pre-computed)
✅ Good for read-heavy workloads
Cons:
❌ Posting is slow (celebrity with 100M followers = 100M writes!)
❌ Wastes resources for inactive users
❌ High fan-out for celebrities
Approach 2: Fan-out on Read (Pull Model)
User A posts:
┌────────┐
│ User A │ ─── posts ──→ [Post] "Hello!" (stored)
└────────┘
User B reads feed:
┌────────┐
│ User B │ ─── get feed ──→ [Merge Service]
└────────┘ │
↓
1. Find who B follows: [A, C, D]
2. Get recent posts from each
3. Merge and sort by time
4. Return top 20
Reading:
SELECT * FROM posts WHERE author_id IN (followees) ORDER BY time LIMIT 20
// Fan-out on Read Implementation
void publishPost(User author, Post post) {
// Just save the post
database.save(post); // FAST!
}
// Feed retrieval (SLOWER - need to query and merge)
List<Post> getFeed(long userId, int limit) {
// 1. Get followees
List<Long> followees = getFollowees(userId); // [1, 5, 23, 42, ...]
// 2. Get recent posts from each followee
List<Post> posts = new ArrayList<>();
for (Long followeeId : followees) {
List<Post> userPosts = getRecentPosts(followeeId, limit);
posts.addAll(userPosts);
}
// 3. Merge and sort
posts.sort((a, b) -> b.timestamp.compareTo(a.timestamp));
// 4. Return top N
return posts.subList(0, Math.min(limit, posts.size()));
}
Pros:
✅ Posting is VERY fast
✅ No wasted work for inactive users
✅ Works well for celebrities
Cons:
❌ Reading is slow (query + merge)
❌ High load on database for popular users
Approach 3: Hybrid (Best)
// Hybrid: Fan-out for normal users, Pull for celebrities
void publishPost(User author, Post post) {
database.save(post);
if (author.followerCount < 10000) {
// Normal user: Fan-out on write
fanOutOnWrite(author, post);
} else {
// Celebrity: Skip fan-out (too expensive)
// Followers will pull when needed
}
}
List<Post> getFeed(long userId, int limit) {
// 1. Get pre-computed feed (from fan-out)
List<Post> feed = feedCache.getFeed(userId, limit);
// 2. Get celebrity followees
List<Long> celebs = getCelebrityFollowees(userId);
// 3. Pull recent posts from celebrities
for (Long celeb : celebs) {
List<Post> celebrityPosts = getRecentPosts(celeb, 10);
feed.addAll(celebrityPosts);
}
// 4. Merge and sort
feed.sort((a, b) -> b.timestamp.compareTo(a.timestamp));
return feed.subList(0, Math.min(limit, feed.size()));
}
Pros:
✅ Best of both worlds
✅ Scalable for all user types
Cons:
⚠️ More complex
Data Model
// Posts table (Cassandra/DynamoDB)
posts {
post_id: UUID, // Primary key
author_id: long,
content: text,
media_urls: list,
timestamp: timestamp,
likes: int,
comments: int
}
// Follows table (graph database or SQL)
follows {
follower_id: long, // Partition key
followee_id: long, // Sort key
timestamp: timestamp
}
CREATE INDEX ON follows (followee_id); // For getting followers
// Feed cache (Redis)
// Key: user:{user_id}:feed
// Value: Sorted Set (score = timestamp)
ZADD user:123:feed 1704067200 post_456
ZADD user:123:feed 1704067100 post_789
// Get feed
ZREVRANGE user:123:feed 0 19 WITHSCORES // Top 20, newest first
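As a rough in-memory stand-in for the sorted-set commands above (handy for unit tests; a real deployment would call Redis `ZADD`/`ZREVRANGE`), keyed by timestamp. Note that equal timestamps collide in this sketch, whereas Redis keys the set by member:

```java
import java.util.*;

// Toy feed cache mirroring the Redis sorted-set layout: score = timestamp, member = post id.
class FeedCache {
    private final Map<Long, TreeMap<Long, String>> feeds = new HashMap<>();

    void addToFeed(long userId, long timestamp, String postId) {
        feeds.computeIfAbsent(userId, k -> new TreeMap<>()).put(timestamp, postId);
    }

    // Newest-first, like ZREVRANGE 0 (limit-1)
    List<String> getFeed(long userId, int limit) {
        TreeMap<Long, String> feed = feeds.get(userId);
        List<String> result = new ArrayList<>();
        if (feed == null) return result;
        for (Map.Entry<Long, String> e : feed.descendingMap().entrySet()) {
            result.add(e.getValue());
            if (result.size() == limit) break;
        }
        return result;
    }
}
```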
High-Level Architecture
┌──────────────────────────────────────────────────┐
│ Client Apps │
└────────────┬─────────────────────────────────────┘
│
↓
┌────────────────────────────────────────────────┐
│ API Gateway / Load Balancer │
└──────┬─────────────────────────────────────────┘
│
├──────────────┬──────────────┐
↓ ↓ ↓
┌─────────────┐ ┌──────────┐ ┌──────────────┐
│ Post │ │ Feed │ │ Follow │
│ Service │ │ Service │ │ Service │
└──────┬──────┘ └────┬─────┘ └──────┬───────┘
│ │ │
↓ ↓ ↓
┌──────────────────────────────────────────────┐
│ Message Queue (Kafka) │
└──────┬───────────────────────────────────────┘
│
↓
┌──────────────────┐ ┌────────────────────┐
│ Fan-out │─────→│ Feed Cache │
│ Service │ │ (Redis) │
│ (Workers) │ └────────────────────┘
└──────────────────┘
│
↓
┌──────────────────┐ ┌────────────────────┐
│ Posts DB │ │ Follows DB │
│ (Cassandra) │ │ (MySQL/Postgres) │
└──────────────────┘ └────────────────────┘
Ranking & Personalization
// Simple ranking: recency + engagement
double calculateScore(Post post, User viewer) {
double recencyScore = 1.0 / (hoursSince(post.timestamp) + 1);
double engagementScore = post.likes * 0.1 + post.comments * 0.5;
double connectionScore = isCloseConnection(viewer, post.author) ? 2.0 : 1.0;
return recencyScore * engagementScore * connectionScore;
}
// Machine learning approach:
// Features: recency, engagement, author popularity, viewer-author interaction history
// Model: Logistic regression or deep learning
// Predicted: P(user will engage with post)
List<Post> getRankedFeed(long userId, int limit) {
User viewer = getUser(userId);
List<Post> candidates = getCandidatePosts(userId, limit * 10); // Over-fetch candidates
// Score and rank
candidates.sort((a, b) -> {
double scoreA = calculateScore(a, viewer);
double scoreB = calculateScore(b, viewer);
return Double.compare(scoreB, scoreA);
});
return candidates.subList(0, limit);
}
Optimization Tips:
- Cache everything: Feed cache (Redis), post cache, user profile cache
- Async processing: Fan-out workers process in background
- Pagination: Cursor-based (better than offset) for infinite scroll
- CDN: Serve media (images/videos) from CDN
- Sharding: Shard by user ID for feeds, by post ID for posts
4. Chat System (WhatsApp, Slack)
Problem Statement
Design a real-time messaging system supporting 1-on-1 and group chat with message history and delivery guarantees.
Functional Requirements:
- Send/receive messages in real-time (1-on-1 and group)
- Message history and search
- Online/offline status (presence)
- Read receipts and typing indicators
- Media sharing (images, files)
- Push notifications for offline users
Non-Functional Requirements:
- Low latency (< 100ms message delivery)
- High availability
- Message delivery guarantee (at least once)
- 1B users, 100M concurrent connections
Scale Estimations
Assumptions:
- 1B users, 100M DAU
- 50 messages/day per user
- Average message: 100 bytes
- 20% messages include media (avg 1MB)
Messages:
100M users × 50 messages/day = 5B messages/day
= 58,000 messages/second
Peak: 150,000 messages/second
Storage (1 year):
Text: 5B × 365 days × 100 bytes = 182 TB
Media: 5B × 365 × 0.2 × 1MB = 365 PB (use CDN)
Concurrent Connections:
100M users online simultaneously
Each WebSocket connection ~64KB memory
= 6.4 TB memory for connections alone!
WebSocket vs HTTP
HTTP (Request-Response):
Client Server
│────── GET /messages ──→│
│←────── 200 OK ──────────│
│ │
│──── POST /send ────────→│
│←────── 200 OK ──────────│
│ │
│ (Poll every 5 seconds) │
│────── GET /messages ──→│
│←────── 200 OK ──────────│
❌ Inefficient: Constant polling, high latency
WebSocket (Bi-directional):
Client Server
│─── Upgrade: websocket ─→│
│←── 101 Switching ────────│
│ │
├═══════════════════════════┤ Persistent connection
│ │
│←──── {message} ──────────│ Server push!
│──── {message} ──────────→│
│←──── {message} ──────────│
✅ Real-time, low latency, efficient
High-Level Architecture
┌──────────┐
│ Client │
│ App │
└────┬─────┘
│ WebSocket
↓
┌────────────────┐
│ Load Balancer │ (Sticky sessions for WebSocket)
└────┬───────────┘
│
├────────┬────────┬────────┐
↓ ↓ ↓ ↓
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Chat │ │ Chat │ │ Chat │
│ Server │ │ Server │ │ Server │
│ #1 │ │ #2 │ │ #3 │
└────┬────┘ └────┬────┘ └────┬────┘
│ │ │
└───────┬───┴───────────┘
↓
┌──────────────────┐
│ Message Queue │ (Kafka)
└────────┬─────────┘
│
┌────────┴─────────┬──────────────┐
↓ ↓ ↓
┌───────────┐ ┌──────────────┐ ┌──────────┐
│ Message │ │ Notification │ │ Push │
│ Store │ │ Service │ │ Service │
│(Cassandra)│ └──────────────┘ └──────────┘
└───────────┘
Message Flow (1-on-1 Chat)
// User A sends message to User B
// 1. Client sends message via WebSocket
{
"type": "send_message",
"to": "user_b",
"content": "Hello!",
"timestamp": 1704067200000
}
// 2. Chat Server receives message
void handleSendMessage(WebSocket conn, Message msg) {
// Generate message ID
String messageId = generateId();
msg.messageId = messageId;
msg.from = getUserId(conn);
msg.status = "sent";
// Save to database (async)
messageStore.save(msg);
// Check if recipient is online
String recipientServer = connectionRegistry.getServer(msg.to);
if (recipientServer != null) {
// Recipient online - send directly
if (recipientServer.equals(thisServer)) {
// Same server
sendToUser(msg.to, msg);
} else {
// Different server - use message queue
kafka.send("messages", msg);
}
} else {
// Recipient offline - queue for push notification
kafka.send("notifications", msg);
}
// Send ACK to sender
sendAck(conn, messageId);
}
// 3. Message delivered to recipient
void deliverMessage(String userId, Message msg) {
WebSocket conn = getConnection(userId);
if (conn != null && conn.isOpen()) {
conn.send(JSON.stringify(msg));
// Update status to "delivered"
msg.status = "delivered";
messageStore.updateStatus(msg.messageId, "delivered");
}
}
// 4. Read receipt
void handleReadReceipt(String userId, String messageId) {
messageStore.updateStatus(messageId, "read");
// Notify sender
Message original = messageStore.get(messageId);
sendToUser(original.from, {
type: "read_receipt",
messageId: messageId,
readBy: userId,
timestamp: now()
});
}
Group Chat
// Group chat: Message fan-out to all members
void handleGroupMessage(Message msg) {
// Save message
messageStore.save(msg);
// Get group members
List members = groupStore.getMembers(msg.groupId);
// Fan-out to all members (parallel)
ExecutorService executor = Executors.newFixedThreadPool(100);
for (String memberId : members) {
if (memberId.equals(msg.from)) continue; // Skip sender
executor.submit(() -> {
deliverMessage(memberId, msg);
});
}
executor.shutdown();
}
// Optimization: For large groups (>100 members), use Pub/Sub
void handleLargeGroupMessage(Message msg) {
// Publish to topic
redis.publish("group:" + msg.groupId, msg);
// Each chat server subscribes to group topics
// and delivers to local connections
}
Message Persistence
// Cassandra schema for message storage
CREATE TABLE messages (
user_id text, -- Partition key component (who the message is for)
conversation_id text, -- Partition key component (1-on-1 or group ID)
message_id timeuuid, -- Clustering key (time-based UUID)
from_user text,
content text,
media_url text,
status text, -- sent, delivered, read
timestamp timestamp,
PRIMARY KEY ((user_id, conversation_id), message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);
// Query: Get recent messages in conversation
SELECT * FROM messages
WHERE user_id = 'user_a'
AND conversation_id = 'user_b'
ORDER BY message_id DESC
LIMIT 50;
// Partitioning strategy: Shard by user_id
// Each user's messages stored together for fast retrieval
Presence System (Online/Offline Status)
// Heartbeat approach
class PresenceService {
// Redis: user_id -> last_seen_timestamp
private RedisClient redis;
void onConnect(String userId) {
redis.set("presence:" + userId, System.currentTimeMillis());
redis.expire("presence:" + userId, 60); // 1 minute TTL
// Broadcast to friends
broadcastStatus(userId, "online");
}
void onHeartbeat(String userId) {
// Client sends heartbeat every 30 seconds
redis.set("presence:" + userId, System.currentTimeMillis());
redis.expire("presence:" + userId, 60);
}
void onDisconnect(String userId) {
redis.del("presence:" + userId);
broadcastStatus(userId, "offline");
}
String getStatus(String userId) {
Long lastSeen = redis.get("presence:" + userId);
if (lastSeen == null) return "offline";
long age = System.currentTimeMillis() - lastSeen;
if (age < 60000) return "online";
if (age < 300000) return "away"; // < 5 minutes
return "offline";
}
}
Message Delivery Guarantees
// At-least-once delivery using ACKs and retries
class MessageDelivery {
void sendMessage(Message msg) {
// 1. Save to pending queue
pendingQueue.add(msg.messageId, msg);
// 2. Send message
deliverMessage(msg);
// 3. Wait for ACK (with timeout)
CompletableFuture ack = waitForAck(msg.messageId);
ack.orTimeout(5, TimeUnit.SECONDS)
.exceptionally(ex -> {
// No ACK received - retry
retryMessage(msg);
return null;
});
}
void handleAck(String messageId) {
// Remove from pending queue
pendingQueue.remove(messageId);
}
void retryMessage(Message msg) {
int attempts = msg.retryCount++;
if (attempts < 3) {
// Exponential backoff: 1s, 2s, 4s
scheduler.schedule(() -> sendMessage(msg),
(long) Math.pow(2, attempts), TimeUnit.SECONDS);
} else {
// Give up - message lost (log and alert)
logger.error("Failed to deliver message: " + msg.messageId);
}
}
}
// Idempotency: Recipient deduplicates by message_id
Set<String> processedMessages = new HashSet<>();
void receiveMessage(Message msg) {
if (processedMessages.contains(msg.messageId)) {
// Duplicate - send ACK but don't process
sendAck(msg.messageId);
return;
}
processedMessages.add(msg.messageId);
displayMessage(msg);
sendAck(msg.messageId);
}
Typing Indicator
// Lightweight ephemeral events (don't persist)
void handleTyping(String userId, String conversationId) {
// Broadcast to conversation members
Message event = {
type: "typing",
from: userId,
conversationId: conversationId
};
// Don't save - just broadcast
List members = getConversationMembers(conversationId);
for (String member : members) {
if (!member.equals(userId)) {
sendToUser(member, event);
}
}
}
// Client: Throttle typing events (send at most once per 3 seconds)
let lastTypingSent = 0;
let typingTimeout;
textbox.on('input', () => {
const now = Date.now();
if (now - lastTypingSent > 3000) {
socket.send({type: 'typing', conversationId: currentChat});
lastTypingSent = now;
}
clearTimeout(typingTimeout);
typingTimeout = setTimeout(() => {
socket.send({type: 'stop_typing', conversationId: currentChat});
}, 3000);
});
Scalability Challenges:
- 100M concurrent WebSocket connections: Shard across many servers (~10K connections per server = 10,000 servers!)
- Connection registry: Use Redis to track which server each user is connected to
- Message ordering: Use Lamport timestamps or vector clocks for distributed ordering
- Media files: Upload to S3, store URLs in messages, serve via CDN
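The connection-registry bullet can be sketched as a plain map from user to chat-server id; in production this table would live in Redis so every chat server sees it. Class and method names are illustrative:

```java
import java.util.concurrent.ConcurrentHashMap;

// Tracks which chat server holds each user's WebSocket connection.
// A production version would back this with Redis so the mapping is shared.
class ConnectionRegistry {
    private final ConcurrentHashMap<String, String> userToServer = new ConcurrentHashMap<>();

    void register(String userId, String serverId) { userToServer.put(userId, serverId); }

    void unregister(String userId) { userToServer.remove(userId); }

    // null means the user has no live connection (offline)
    String getServer(String userId) { return userToServer.get(userId); }
}
```

This is the lookup that `handleSendMessage` above performs to decide between local delivery, cross-server routing via Kafka, and the offline push path.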
5. Search Autocomplete (Google, Amazon)
Problem Statement
Design a system that suggests search queries as the user types.
Functional Requirements:
- Suggest top 10 queries matching user's prefix
- Rank suggestions by popularity/relevance
- Update suggestions in real-time as user types
- Support millions of queries
Non-Functional Requirements:
- Low latency (< 50ms for suggestions)
- High availability
- 10M DAU, each user types 10 queries/day
Approach: Trie (Prefix Tree)
Trie for queries: "cat", "car", "card", "care", "dog"

            root
           /    \
          c      d
          |      |
          a      o
         / \     |
        r   t    g [$ dog]
        |   |
[$ car:1000] [$ cat]
       / \
      d   e
      |   |
[$ card:500] [$ care:300]

$ = end of query
Numbers = popularity score
class TrieNode {
Map<Character, TrieNode> children = new HashMap<>();
boolean isEndOfWord = false;
long frequency = 0; // How popular this query is
String query = null;
// Top suggestions cached at each node
List<Suggestion> topSuggestions = new ArrayList<>();
}
class AutocompleteSystem {
TrieNode root = new TrieNode();
// Build trie from query logs
void addQuery(String query, long frequency) {
TrieNode node = root;
for (char c : query.toLowerCase().toCharArray()) {
node.children.putIfAbsent(c, new TrieNode());
node = node.children.get(c);
}
node.isEndOfWord = true;
node.query = query;
node.frequency = frequency;
// Update top suggestions for all prefix nodes
updateTopSuggestions(query, frequency);
}
// Get suggestions for prefix
List<Suggestion> getSuggestions(String prefix) {
TrieNode node = root;
// Navigate to prefix
for (char c : prefix.toLowerCase().toCharArray()) {
if (!node.children.containsKey(c)) {
return Collections.emptyList(); // No matches
}
node = node.children.get(c);
}
// Return pre-computed top suggestions
return node.topSuggestions;
}
// Pre-compute top K suggestions at each node
void updateTopSuggestions(String query, long frequency) {
TrieNode node = root;
for (char c : query.toLowerCase().toCharArray()) {
node = node.children.get(c);
// Collect all queries in subtree
PriorityQueue<Suggestion> pq = new PriorityQueue<>(
(a, b) -> Long.compare(b.frequency, a.frequency)
);
collectQueries(node, pq);
// Keep top 10: poll in heap order (streaming a PriorityQueue
// does NOT yield sorted order)
List<Suggestion> top = new ArrayList<>();
for (int i = 0; i < 10 && !pq.isEmpty(); i++) {
top.add(pq.poll());
}
node.topSuggestions = top;
}
}
void collectQueries(TrieNode node, PriorityQueue<Suggestion> pq) {
if (node.isEndOfWord) {
pq.offer(new Suggestion(node.query, node.frequency));
}
for (TrieNode child : node.children.values()) {
collectQueries(child, pq);
}
}
}
// Usage
AutocompleteSystem ac = new AutocompleteSystem();
// Add popular queries (from logs)
ac.addQuery("cat", 1000000);
ac.addQuery("car", 500000);
ac.addQuery("card", 200000);
ac.addQuery("care", 150000);
// User types "ca"
List<Suggestion> suggestions = ac.getSuggestions("ca");
// Returns: ["cat", "car", "card", "care", ...]
Ranking Strategies
// 1. Simple frequency
score = frequency
// 2. Time-decayed frequency (recent queries ranked higher)
score = frequency * Math.exp(-decayRate * daysSince(query))
// 3. Personalized (user's search history)
score = globalFrequency * 0.7 + userFrequency * 0.3
// 4. Machine learning
features = [
frequency,
recency,
clickThroughRate,
userHistoryMatch,
queryLength,
trending
]
score = model.predict(features)
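The time-decayed formula above can be made runnable; the 0.05/day decay rate is an assumed tuning value:

```java
class DecayScorer {
    // score = frequency * e^(-decayRate * daysSinceLastSeen)
    // A fresh, moderately popular query can outrank a stale, very popular one.
    static double score(long frequency, double daysSince, double decayRate) {
        return frequency * Math.exp(-decayRate * daysSince);
    }
}
```

With decayRate = 0.05, a query unseen for 30 days keeps only about 22% of its raw frequency, so an 800-frequency query seen today outranks a 1000-frequency query from a month ago.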
Data Collection Pipeline
User searches → Logs → Processing → Trie Update
┌──────────┐
│ User │ "cat videos"
│ searches │
└────┬─────┘
│
↓
┌──────────────┐
│ Logs │ (Kafka)
│ {query, ts} │
└──────┬───────┘
│
↓
┌──────────────┐
│ Aggregator │ (Spark/Flink)
│ │ Count frequency per query
│ Every hour: │
│ cat → 1M │
│ dog → 800K │
└──────┬───────┘
│
↓
┌──────────────┐
│ Trie │ Updated periodically
│ Builder │
└──────┬───────┘
│
↓
┌──────────────┐
│ Autocomplete │ Serves requests
│ Service │
└──────────────┘
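The hourly aggregation step reduces to a frequency count over one window of logged queries; Spark/Flink would run this per partition, and the in-memory version below (illustrative class name) just shows the shape:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class QueryAggregator {
    // Count frequency per normalized query over one window of log entries.
    static Map<String, Long> aggregate(List<String> loggedQueries) {
        Map<String, Long> counts = new HashMap<>();
        for (String q : loggedQueries) {
            counts.merge(q.toLowerCase().trim(), 1L, Long::sum);
        }
        return counts;
    }
}
```

The resulting counts feed the Trie Builder, which calls addQuery(query, frequency) for each entry.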
Optimization: Cache Trie in Memory
// Trie stored in memory across multiple servers
class DistributedAutocomplete {
// Each server loads full trie into memory
private TrieNode trie;
// Hot cache for popular prefixes
private LoadingCache<String, List<Suggestion>> cache;
public DistributedAutocomplete() {
// Load trie from storage on startup
this.trie = loadTrieFromDB();
// LRU cache for popular prefixes
this.cache = CacheBuilder.newBuilder()
.maximumSize(100_000)
.expireAfterWrite(10, TimeUnit.MINUTES)
.build(new CacheLoader<String, List<Suggestion>>() {
@Override
public List<Suggestion> load(String prefix) {
return trie.getSuggestions(prefix);
}
});
}
List<Suggestion> autocomplete(String prefix) {
// getUnchecked avoids Guava's checked ExecutionException
return cache.getUnchecked(prefix); // O(1) cache hit, O(k) cache miss
}
// Periodic trie refresh (every hour)
@Scheduled(fixedDelay = 3600000)
void refreshTrie() {
TrieNode newTrie = loadTrieFromDB();
this.trie = newTrie; // Atomic swap
this.cache.invalidateAll();
}
}
// Memory estimation:
// 10M unique queries × 50 bytes per query = 500 MB
// Trie structure overhead: ~2× = 1 GB total
// Easily fits in memory!
Client-Side Optimization
// Debounce to reduce API calls
let debounceTimer;
const DEBOUNCE_MS = 300;
searchBox.addEventListener('input', (e) => {
clearTimeout(debounceTimer);
const query = e.target.value;
// Only autocomplete for 2+ characters
if (query.length < 2) return;
debounceTimer = setTimeout(() => {
fetchSuggestions(query);
}, DEBOUNCE_MS);
});
async function fetchSuggestions(prefix) {
const response = await fetch(`/autocomplete?q=${encodeURIComponent(prefix)}`);
const suggestions = await response.json();
displaySuggestions(suggestions);
}
// Result: Reduces API calls by ~80%
// User typing "restaurant" makes 1 call instead of 10
6. Notification System
Problem Statement
Design a system to send notifications via multiple channels (push, email, SMS) with delivery guarantees.
Functional Requirements:
- Send notifications via push, email, SMS
- Support different notification types (transactional, marketing)
- User preferences (opt-in/opt-out per channel)
- Delivery tracking and retry logic
- Rate limiting per user
Non-Functional Requirements:
- High throughput (millions of notifications/day)
- Reliable delivery (at-least-once)
- Low latency for critical notifications
High-Level Architecture
┌─────────────┐
│ Service │ (e.g., Order Service)
│ Triggers │
└──────┬──────┘
│ 1. Publish event
↓
┌──────────────────┐
│ Event Bus │ (Kafka/SNS)
│ │
└────────┬─────────┘
│
↓
┌──────────────────┐ ┌──────────────────┐
│ Notification │──────→│ User Prefs │
│ Service │ │ Database │
└────────┬─────────┘ └──────────────────┘
│
├────────┬────────┬────────┐
↓ ↓ ↓ ↓
┌────────┐ ┌──────┐ ┌──────┐ ┌──────────┐
│ Push │ │Email │ │ SMS │ │ In-App │
│Worker │ │Worker│ │Worker│ │ Worker │
└────┬───┘ └───┬──┘ └───┬──┘ └────┬─────┘
│ │ │ │
↓ ↓ ↓ ↓
┌────────┐ ┌──────┐ ┌──────┐ ┌──────────┐
│ FCM/ │ │SMTP/ │ │Twilio│ │ WebSocket│
│ APNS │ │SES │ │SNS │ │ Server │
└────────┘ └──────┘ └──────┘ └──────────┘
Notification Service Implementation
class NotificationService {
void sendNotification(NotificationRequest request) {
// 1. Load user preferences
UserPreferences prefs = userPrefsRepo.get(request.userId);
// 2. Determine channels
List<Channel> channels = determineChannels(request.type, prefs);
// 3. Create notification tasks
for (Channel channel : channels) {
NotificationTask task = new NotificationTask(
request.userId,
channel,
request.title,
request.body,
request.data,
request.priority
);
// 4. Enqueue for workers
if (request.priority == Priority.HIGH) {
// Critical notifications (OTP, alerts)
highPriorityQueue.send(task);
} else {
// Normal notifications (marketing, updates)
normalQueue.send(task);
}
}
// 5. Store notification history
notificationRepo.save(request);
}
List<Channel> determineChannels(NotificationType type, UserPreferences prefs) {
List<Channel> channels = new ArrayList<>();
if (type == NotificationType.TRANSACTIONAL) {
// Transactional: always send (regardless of preferences)
channels.add(Channel.PUSH);
channels.add(Channel.EMAIL);
} else if (type == NotificationType.MARKETING) {
// Marketing: respect user preferences
if (prefs.pushEnabled) channels.add(Channel.PUSH);
if (prefs.emailEnabled) channels.add(Channel.EMAIL);
}
return channels;
}
}
Push Notification Worker
class PushNotificationWorker {
void processTask(NotificationTask task) {
try {
// 1. Get device tokens
List<String> tokens = deviceRepo.getTokens(task.userId);
if (tokens.isEmpty()) {
logger.warn("No device tokens for user: " + task.userId);
return;
}
// 2. Send to FCM/APNS
for (String token : tokens) {
PushMessage message = new PushMessage(
token,
task.title,
task.body,
task.data
);
PushResult result = fcmClient.send(message);
// 3. Handle result
if (result.isSuccess()) {
notificationRepo.markDelivered(task.id, token);
} else if (result.isInvalidToken()) {
// Token expired - remove from database
deviceRepo.removeToken(token);
} else {
// Temporary failure - retry
retryQueue.send(task);
}
}
} catch (Exception e) {
logger.error("Failed to send push: " + task.id, e);
retryQueue.send(task);
}
}
}
Email Worker with Template
class EmailWorker {
void processTask(NotificationTask task) {
try {
// 1. Load template
EmailTemplate template = templateRepo.get(task.templateId);
// 2. Render with data
String html = template.render(task.data);
// 3. Send via SES/SendGrid
EmailMessage email = EmailMessage.builder()
.to(task.userEmail)
.from("notifications@example.com")
.subject(task.title)
.html(html)
.build();
EmailResult result = sesClient.send(email);
// 4. Track delivery
if (result.isSuccess()) {
notificationRepo.markDelivered(task.id, task.userEmail);
} else {
retryQueue.send(task);
}
// 5. Rate limiting (prevent spam)
rateLimiter.increment(task.userId, "email");
} catch (Exception e) {
logger.error("Failed to send email: " + task.id, e);
retryQueue.send(task);
}
}
}
// Example template (Handlebars/Mustache)
String template = """
<h1>{{title}}</h1>
<p>Hi {{userName}},</p>
<p>{{message}}</p>
<a href="{{actionUrl}}">{{actionText}}</a>
""";
Map<String, String> data = Map.of(
"title", "Order Confirmed",
"userName", "John",
"message", "Your order #12345 has been confirmed",
"actionUrl", "https://example.com/orders/12345",
"actionText", "View Order"
);
Retry Logic with Exponential Backoff
class RetryHandler {
private static final int MAX_RETRIES = 3;
private static final int[] BACKOFF_DELAYS = {60, 300, 900}; // seconds
void handleFailure(NotificationTask task) {
int attempt = task.retryCount++;
if (attempt >= MAX_RETRIES) {
// Give up - send to dead letter queue
deadLetterQueue.send(task);
alerting.sendAlert("Notification failed after " + MAX_RETRIES + " retries: " + task.id);
return;
}
// Schedule retry with exponential backoff
int delay = BACKOFF_DELAYS[attempt];
scheduler.schedule(() -> {
retryQueue.send(task);
}, delay, TimeUnit.SECONDS);
logger.info("Scheduled retry #{} for task {} in {} seconds",
attempt + 1, task.id, delay);
}
}
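A common refinement of the fixed delay table above is computed exponential backoff with jitter, which spreads retries so failed tasks don't hit the provider in lockstep; the base and cap values here are assumptions:

```java
import java.util.concurrent.ThreadLocalRandom;

class Backoff {
    // Exponential backoff: base * 2^attempt, capped at capSec.
    // "Full jitter" picks a random delay in [0, exp] so retries decorrelate.
    static long delaySeconds(int attempt, long baseSec, long capSec) {
        long exp = Math.min(capSec, baseSec * (1L << attempt));
        return ThreadLocalRandom.current().nextLong(exp + 1);
    }
}
```

With base 60s and cap 900s, attempt 0 retries within a minute, while later attempts spread over up to 15 minutes.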
Rate Limiting
// Prevent notification spam
class NotificationRateLimiter {
// Redis-based rate limiter
private RedisClient redis;
boolean isAllowed(String userId, Channel channel) {
String key = "ratelimit:" + userId + ":" + channel;
// Fixed-window counter: max 10 notifications per hour
long count = redis.incr(key);
if (count == 1) {
redis.expire(key, 3600); // 1 hour window
}
return count <= 10;
}
}
// Usage
if (!rateLimiter.isAllowed(task.userId, Channel.PUSH)) {
logger.warn("Rate limit exceeded for user: " + task.userId);
return; // Skip sending
}
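The Redis counter above is a fixed-window limiter; the same logic in-process (single node, illustrative class name) looks like:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class FixedWindowLimiter {
    private final int maxPerWindow;
    private final long windowMs;
    // Per-key state: {windowStartMs, countInWindow}
    private final Map<String, long[]> state = new ConcurrentHashMap<>();

    FixedWindowLimiter(int maxPerWindow, long windowMs) {
        this.maxPerWindow = maxPerWindow;
        this.windowMs = windowMs;
    }

    synchronized boolean isAllowed(String key, long nowMs) {
        long[] s = state.computeIfAbsent(key, k -> new long[]{nowMs, 0});
        if (nowMs - s[0] >= windowMs) { // window expired: start a new one
            s[0] = nowMs;
            s[1] = 0;
        }
        return ++s[1] <= maxPerWindow;
    }
}
```

The fixed window allows bursts at window boundaries (up to 2x the limit across two adjacent windows); a sliding window or token bucket smooths that out at the cost of more state.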
User Preferences
// Database schema
CREATE TABLE notification_preferences (
user_id BIGINT PRIMARY KEY,
push_enabled BOOLEAN DEFAULT TRUE,
email_enabled BOOLEAN DEFAULT TRUE,
sms_enabled BOOLEAN DEFAULT FALSE,
marketing_push BOOLEAN DEFAULT TRUE,
marketing_email BOOLEAN DEFAULT TRUE,
quiet_hours_start TIME, -- e.g., "22:00"
quiet_hours_end TIME, -- e.g., "08:00"
timezone VARCHAR(50)
);
// Check quiet hours (handles overnight windows like 22:00-08:00)
boolean isInQuietHours(long userId) {
UserPreferences prefs = userPrefsRepo.get(userId);
LocalTime now = ZonedDateTime.now(ZoneId.of(prefs.timezone)).toLocalTime();
LocalTime start = prefs.quietHoursStart;
LocalTime end = prefs.quietHoursEnd;
if (start.isBefore(end)) {
// Same-day window, e.g., 13:00-15:00
return now.isAfter(start) && now.isBefore(end);
}
// Overnight window, e.g., 22:00-08:00
return now.isAfter(start) || now.isBefore(end);
}
// Don't send non-critical notifications during quiet hours
if (isInQuietHours(task.userId) && task.priority != Priority.HIGH) {
UserPreferences prefs = userPrefsRepo.get(task.userId);
delayQueue.send(task, prefs.quietHoursEnd); // deliver after quiet hours end
return;
}
7. Web Crawler (Google, Bing)
Problem Statement
Design a scalable web crawler that discovers and downloads web pages for indexing.
Functional Requirements:
- Crawl billions of web pages
- Extract links and content
- Respect robots.txt and politeness policy
- Handle duplicates
- Prioritize important pages
Non-Functional Requirements:
- Distributed and scalable
- Fault tolerant
- 1 billion pages/month
- Average 100KB per page
High-Level Architecture
┌─────────────┐
│ Seed │ Initial URLs
│ URLs │
└──────┬──────┘
│
↓
┌─────────────────────┐
│ URL Frontier │ Priority queue of URLs to crawl
│ (Kafka/Redis) │
└──────┬──────────────┘
│
├──────────┬──────────┬──────────┐
↓ ↓ ↓ ↓
┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐
│Crawler │ │Crawler │ │Crawler │ │Crawler │
│Worker │ │Worker │ │Worker │ │Worker │
└────┬───┘ └────┬───┘ └────┬───┘ └────┬───┘
│ │ │ │
↓ ↓ ↓ ↓
┌──────────────────────────────────────────┐
│ HTML Store (S3/HDFS) │
└──────────────────────────────────────────┘
│
↓
┌──────────────────────────────────────────┐
│ URL Extractor & Deduplicator │
└──────────────────────────────────────────┘
│
↓ (New URLs)
┌──────────────────────────────────────────┐
│ URL Frontier (loop) │
└──────────────────────────────────────────┘
URL Frontier (Priority Queue)
// Prioritize URLs by importance
class URLFrontier {
// Multiple priority queues
Queue<URL>[] queues; // one queue per level: [high, medium, low priority]
// Host-based politeness queues
Map<String, Queue<URL>> hostQueues;
URL getNextURL() {
// 1. Select priority level (80% high, 15% medium, 5% low)
int priority = selectPriority();
// 2. Select host (round-robin to distribute load)
String host = selectHost();
// 3. Get URL from host queue
Queue<URL> hostQueue = hostQueues.get(host);
if (hostQueue == null || hostQueue.isEmpty()) {
return null;
}
return hostQueue.poll();
}
void addURL(URL url, int priority) {
// Add to priority queue
queues[priority].offer(url);
// Add to host queue for politeness
String host = url.getHost();
hostQueues.computeIfAbsent(host, k -> new LinkedList<>())
.offer(url);
}
int calculatePriority(URL url) {
// Factors:
// - PageRank (from previous crawl)
// - Update frequency (news sites change often)
// - Depth from seed (prefer shallow URLs)
int priority = 0;
if (hasHighPageRank(url)) priority += 50;
if (isNewsSource(url)) priority += 30;
if (getDepth(url) < 3) priority += 20;
return priority;
}
}
Politeness Policy
// Respect website resources - don't overload servers
class PolitenessManager {
// Minimum delay between requests to same host
private static final long MIN_DELAY_MS = 1000; // 1 second
// Track last access time per host
private Map<String, Long> lastAccessTime = new ConcurrentHashMap<>();
boolean canCrawl(String host) {
Long lastAccess = lastAccessTime.get(host);
if (lastAccess == null) {
return true;
}
long timeSinceAccess = System.currentTimeMillis() - lastAccess;
return timeSinceAccess >= MIN_DELAY_MS;
}
void recordAccess(String host) {
lastAccessTime.put(host, System.currentTimeMillis());
}
}
// Usage in crawler
void crawl(URL url) throws InterruptedException {
String host = url.getHost();
// Wait if necessary
while (!politeness.canCrawl(host)) {
Thread.sleep(100);
}
// Fetch page
Page page = httpClient.get(url);
// Record access
politeness.recordAccess(host);
// Process page
processPage(page);
}
robots.txt Compliance
// Check robots.txt before crawling
class RobotsManager {
private Map<String, RobotRules> cache = new ConcurrentHashMap<>();
boolean isAllowed(URL url) {
String host = url.getHost();
// Get or fetch robots.txt
RobotRules rules = cache.computeIfAbsent(host, h -> {
return fetchRobotsTxt(h);
});
// Check if URL is allowed
return rules.isAllowed(url.getPath());
}
RobotRules fetchRobotsTxt(String host) {
try {
String robotsUrl = "https://" + host + "/robots.txt";
String content = httpClient.get(robotsUrl).getContent();
return RobotRulesParser.parse(content);
} catch (NotFoundException e) {
// No robots.txt - allow all
return RobotRules.allowAll();
}
}
}
// Example robots.txt
"""
User-agent: *
Disallow: /admin/
Disallow: /private/
Crawl-delay: 10
User-agent: Googlebot
Allow: /admin/public/
Crawl-delay: 2
"""
Deduplication
// Avoid crawling same URL multiple times
class URLDeduplicator {
// Bloom filter for fast membership test
private BloomFilter<String> bloomFilter;
// Persistent store for exact dedup
private Database urlStore;
boolean isDuplicate(URL url) {
String normalized = normalize(url);
// 1. Quick check with Bloom filter
if (!bloomFilter.mightContain(normalized)) {
return false; // Definitely not seen
}
// 2. Exact check in database
return urlStore.exists(normalized);
}
void markSeen(URL url) {
String normalized = normalize(url);
bloomFilter.put(normalized); // Guava's BloomFilter uses put(), not add()
urlStore.save(normalized);
}
String normalize(URL url) {
// Remove fragments, sort query params, lowercase
return url.getProtocol() + "://" +
url.getHost().toLowerCase() +
url.getPath() +
normalizeQueryParams(url.getQuery());
}
// Example normalization:
// http://example.com/page?b=2&a=1#section
// → http://example.com/page?a=1&b=2
}
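The query-parameter sorting shown in the normalization comment can be sketched as:

```java
import java.util.Arrays;

class UrlNormalizer {
    // Sort query params so ?b=2&a=1 and ?a=1&b=2 dedupe to the same key.
    static String normalizeQuery(String query) {
        if (query == null || query.isEmpty()) return "";
        String[] params = query.split("&");
        Arrays.sort(params);
        return "?" + String.join("&", params);
    }
}
```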
Content Deduplication
// Detect duplicate content (mirrors, copies)
class ContentDeduplicator {
boolean isDuplicateContent(String content) {
// Calculate fingerprint (SimHash)
long fingerprint = simHash(content);
// Check if similar fingerprint exists
return fingerprintStore.existsSimilar(fingerprint, /* hammingThreshold */ 3);
}
long simHash(String content) {
// 1. Tokenize and hash
Map<String, Integer> features = extractFeatures(content);
// 2. Create 64-bit fingerprint
int[] v = new int[64];
for (Map.Entry<String, Integer> entry : features.entrySet()) {
long hash = hash64(entry.getKey());
int weight = entry.getValue();
for (int i = 0; i < 64; i++) {
if (((hash >> i) & 1) == 1) {
v[i] += weight;
} else {
v[i] -= weight;
}
}
}
// 3. Convert to binary
long fingerprint = 0;
for (int i = 0; i < 64; i++) {
if (v[i] > 0) {
fingerprint |= (1L << i);
}
}
return fingerprint;
}
// Hamming distance < 3 → similar content
int hammingDistance(long fp1, long fp2) {
return Long.bitCount(fp1 ^ fp2);
}
}
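Filling in the undefined helpers above (extractFeatures, hash64) with simple stand-ins — whitespace tokenization with term-frequency weights and FNV-1a hashing — gives a runnable end-to-end SimHash sketch:

```java
import java.util.HashMap;
import java.util.Map;

class SimHashLite {
    static long simHash(String content) {
        // Stand-in for extractFeatures: term frequency per token
        Map<String, Integer> features = new HashMap<>();
        for (String token : content.toLowerCase().split("\\s+")) {
            features.merge(token, 1, Integer::sum);
        }
        int[] v = new int[64];
        for (Map.Entry<String, Integer> e : features.entrySet()) {
            long hash = hash64(e.getKey());
            for (int i = 0; i < 64; i++) {
                v[i] += ((hash >> i) & 1) == 1 ? e.getValue() : -e.getValue();
            }
        }
        long fingerprint = 0;
        for (int i = 0; i < 64; i++) {
            if (v[i] > 0) fingerprint |= (1L << i);
        }
        return fingerprint;
    }

    // FNV-1a: a simple stand-in for the hash64 used in the text
    static long hash64(String s) {
        long h = 0xcbf29ce484222325L;
        for (int i = 0; i < s.length(); i++) {
            h ^= s.charAt(i);
            h *= 0x100000001b3L;
        }
        return h;
    }

    static int hammingDistance(long fp1, long fp2) {
        return Long.bitCount(fp1 ^ fp2);
    }
}
```

Identical documents hash to the same fingerprint (distance 0); near-duplicates typically land within a small Hamming distance.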
Crawler Worker
class CrawlerWorker {
void run() {
while (true) {
// 1. Get next URL
URL url = frontier.getNextURL();
if (url == null) {
Thread.sleep(1000);
continue;
}
try {
// 2. Check robots.txt
if (!robotsManager.isAllowed(url)) {
logger.info("Disallowed by robots.txt: " + url);
continue;
}
// 3. Check if already crawled
if (deduplicator.isDuplicate(url)) {
continue;
}
// 4. Fetch page
HttpResponse response = httpClient.get(url);
if (response.getStatusCode() != 200) {
continue;
}
// 5. Save HTML
storage.save(url, response.getContent());
// 6. Extract links
List<URL> links = linkExtractor.extract(response.getContent(), url);
// 7. Add to frontier
for (URL link : links) {
if (!deduplicator.isDuplicate(link)) {
frontier.addURL(link, calculatePriority(link));
}
}
// 8. Mark as crawled
deduplicator.markSeen(url);
} catch (Exception e) {
logger.error("Failed to crawl: " + url, e);
}
}
}
}
Scalability:
- Distributed workers: 1000s of workers crawling in parallel
- URL frontier: Shard by host for politeness + parallelism
- Storage: Distributed file system (HDFS) or S3
- Deduplication: Bloom filter (in-memory) + distributed hash table
8. Distributed Cache (Redis, Memcached)
Problem Statement
Design a distributed caching system for reducing database load and improving latency.
Functional Requirements:
- GET/SET/DELETE operations
- TTL (time-to-live) support
- LRU eviction when full
- High availability
Non-Functional Requirements:
- Low latency (< 1ms for cache hit)
- High throughput (millions of ops/sec)
- Scalable (100s of GB cached data)
Architecture
┌──────────┐
│ Client │
└────┬─────┘
│
↓
┌────────────────┐
│ Cache Client │ (Consistent hashing)
│ Library │
└────┬───────────┘
│
├─────────┬─────────┬─────────┐
↓ ↓ ↓ ↓
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│ Cache │ │ Cache │ │ Cache │ │ Cache │
│ Node 1 │ │ Node 2 │ │ Node 3 │ │ Node 4 │
└─────────┘ └─────────┘ └─────────┘ └─────────┘
(Shard 1) (Shard 2) (Shard 3) (Shard 4)
Each node:
- In-memory hash map
- LRU eviction
- Optional persistence
Consistent Hashing (Covered Earlier)
See "Proximity Service" section for consistent hashing details. Used here to distribute keys across cache nodes with minimal reshuffling when nodes are added/removed.
LRU Eviction
// LRU Cache with HashMap + Doubly Linked List
class LRUCache<K, V> {
class Node {
K key;
V value;
Node prev, next;
long expiryTime;
Node(K key, V value, long ttl) {
this.key = key;
this.value = value;
this.expiryTime = (ttl > 0) ?
System.currentTimeMillis() + ttl : Long.MAX_VALUE;
}
boolean isExpired() {
return System.currentTimeMillis() > expiryTime;
}
}
private final int capacity;
private final Map<K, Node> cache = new HashMap<>();
private final Node head, tail;
public LRUCache(int capacity) {
this.capacity = capacity;
head = new Node(null, null, 0);
tail = new Node(null, null, 0);
head.next = tail;
tail.prev = head;
}
public V get(K key) {
Node node = cache.get(key);
if (node == null) {
return null; // Cache miss
}
if (node.isExpired()) {
// Remove expired entry
remove(key);
return null;
}
// Move to front (most recently used)
moveToFront(node);
return node.value;
}
public void put(K key, V value, long ttlMs) {
Node node = cache.get(key);
if (node != null) {
// Update existing
node.value = value;
node.expiryTime = (ttlMs > 0) ?
System.currentTimeMillis() + ttlMs : Long.MAX_VALUE;
moveToFront(node);
} else {
// Add new
node = new Node(key, value, ttlMs);
cache.put(key, node);
addToFront(node);
// Evict if over capacity
if (cache.size() > capacity) {
Node lru = removeLRU();
cache.remove(lru.key);
}
}
}
public void remove(K key) {
Node node = cache.get(key);
if (node != null) {
removeNode(node);
cache.remove(key);
}
}
private void moveToFront(Node node) {
removeNode(node);
addToFront(node);
}
private void addToFront(Node node) {
node.next = head.next;
node.prev = head;
head.next.prev = node;
head.next = node;
}
private void removeNode(Node node) {
node.prev.next = node.next;
node.next.prev = node.prev;
}
private Node removeLRU() {
Node lru = tail.prev;
removeNode(lru);
return lru;
}
}
// Time: O(1) for get, put, remove
// Space: O(capacity)
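For comparison, Java's LinkedHashMap with accessOrder=true already implements the same HashMap + doubly linked list combination, giving a compact single-node LRU (without the TTL support shown above):

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SimpleLru<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    SimpleLru(int capacity) {
        super(16, 0.75f, true); // accessOrder=true: get() moves entry to the back
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict least recently used on insert
    }
}
```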
Cache-Aside Pattern
// Application manages cache
V getData(K key) {
// 1. Try cache first
V value = cache.get(key);
if (value != null) {
return value; // Cache hit
}
// 2. Cache miss - query database
value = database.get(key);
if (value != null) {
// 3. Populate cache
cache.put(key, value, TTL_SECONDS);
}
return value;
}
void updateData(K key, V value) {
// 1. Update database
database.put(key, value);
// 2. Invalidate or update cache
cache.remove(key); // Option 1: Invalidate
// cache.put(key, value, TTL_SECONDS); // Option 2: Update
}
Cache Warming
// Preload cache with hot data on startup
class CacheWarmer {
void warmCache() {
logger.info("Starting cache warming...");
// 1. Get top 10K most accessed keys from logs/analytics
List<String> hotKeys = analyticsService.getTopKeys(10000);
// 2. Load from database in parallel
ExecutorService executor = Executors.newFixedThreadPool(100);
for (String key : hotKeys) {
executor.submit(() -> {
Object value = database.get(key);
if (value != null) {
cache.put(key, value, DEFAULT_TTL);
}
});
}
executor.shutdown();
executor.awaitTermination(5, TimeUnit.MINUTES);
logger.info("Cache warming complete");
}
}
// Run on application startup
@PostConstruct
void init() {
cacheWarmer.warmCache();
}
Handling Cache Failures
// Graceful degradation when cache is down
V getData(K key) {
try {
V value = cache.get(key);
if (value != null) return value;
} catch (CacheException e) {
logger.error("Cache error, falling back to DB", e);
// Continue to database
}
// Fallback to database
return database.get(key);
}
// Circuit breaker pattern
class CacheClient {
private CircuitBreaker circuitBreaker = new CircuitBreaker(
5,     // failure threshold
60000  // stay open for 1 minute before retrying
);
V get(K key) {
if (circuitBreaker.isOpen()) {
// Cache unhealthy - skip
return null;
}
try {
return cache.get(key);
} catch (Exception e) {
circuitBreaker.recordFailure();
throw e;
}
}
}
Replication for High Availability
// Redis master-replica setup
Master (write)
├─→ Replica 1 (read)
├─→ Replica 2 (read)
└─→ Replica 3 (read)
// Write to master
cache.set("user:123", userData);
// Read from replicas (eventually consistent)
userData = cache.get("user:123");
// Failover: If master dies, promote replica
// Redis Sentinel handles automatic failover
Best Practices:
- Set appropriate TTLs: Avoid stale data (hours for static, minutes for dynamic)
- Cache negative results: Prevent repeated DB queries for non-existent keys (short TTL)
- Monitor hit rate: Aim for >80% hit rate; tune capacity/TTL if lower
- Cache stampede (thundering herd): When a hot key expires, use locking so only one thread reloads it
- Cache keys naming: Use consistent prefix format: "entity:id" (e.g., "user:123")
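The stampede bullet above can be sketched with computeIfAbsent acting as the lock, so concurrent misses on the same key trigger one load instead of N; the types and names here are illustrative:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class SingleFlightLoader<K, V> {
    // In-flight loads, keyed so concurrent callers share one future
    private final ConcurrentHashMap<K, CompletableFuture<V>> inFlight =
        new ConcurrentHashMap<>();

    V load(K key, Function<K, V> loader) {
        // computeIfAbsent is atomic: only the first caller starts the load,
        // everyone else waits on the same future (one DB hit, not N)
        CompletableFuture<V> f = inFlight.computeIfAbsent(
            key, k -> CompletableFuture.supplyAsync(() -> loader.apply(k)));
        try {
            return f.join();
        } finally {
            inFlight.remove(key, f); // allow a fresh load after this one finishes
        }
    }
}
```

On a cache miss the application calls load(key, db::get), populates the cache with the result, and returns it; later expirations repeat the cycle with only one loader running per key.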