Twilio Architecture Deep Dive

Product Overviews, Kafka Infrastructure, and System Architecture Diagrams

Twilio Segment: Customer Data Platform (CDP)

What is Segment?

One-liner: Segment collects customer data from every touchpoint (web, mobile, server), unifies it into complete customer profiles, and activates it across 700+ downstream tools in real time.

The Problem It Solves: Without Segment, companies have fragmented customer data across dozens of tools (analytics, email, ads, CRM). Each tool has incomplete data. Marketing sends irrelevant emails. Support doesn't know the customer's history. Ads target people who already converted.

How Segment Works: The Data Flow

SEGMENT CDP ARCHITECTURE (diagram summary)

  • SOURCES: Web (Analytics.js), Mobile (iOS/Android SDKs), Server (Node.js, Python), Cloud apps (Stripe, Salesforce), Warehouses (Snowflake, BigQuery)
  • TRACKING API (TAPI): Go servers, 800K RPS, 30ms latency, 99.9999% availability. Sharded: each shard runs NLB → ALB → NSQD (local buffer)
  • KAFKA BACKBONE: primary cluster ↔ secondary cluster (multi-tier failover); partitioning by messageId so the same ID always goes to the same consumer
  • PROCESSING:
    - DEDUPLICATION: embedded RocksDB, 60B keys, 1.5TB disk, 4-week window, Bloom filters
    - IDENTITY RESOLUTION: DynamoDB Global Tables for millisecond resolution, graph-based merge
    - TRANSFORMATION: custom zero-allocation JSON parser
  • UNIFIED PROFILES, e.g.: user_id: "john@acme.com"; anonymous_ids: ["abc123", "xyz789"]; traits: { name: "John", plan: "enterprise", ltv: 50000 }; events: [page_viewed, product_added, checkout_started, ...]; computed_traits: { predicted_churn_risk: 0.12, ... }
  • CENTRIFUGE: reliable delivery to 700+ destinations via Directors (80-300 at peak); MySQL/RDS as the job queue (immutable, append-only jobs table plus job_state_transitions table); 4-hour retry window with exponential backoff; S3 archival for undelivered messages
  • DESTINATIONS: Analytics (Google, Mixpanel), Warehouses (Snowflake, BigQuery), Ads (Meta Ads, Google), Email (Braze, Iterable), CRM (Salesforce, HubSpot), A/B (Amplitude, Optimizely)
    - Cloud Mode: Segment servers translate and send (smaller page load)
    - Device Mode: direct API calls from the client (for tools that require it)

Why Segment Matters to Twilio

1. Data for Personalization

Segment provides the customer context that makes Twilio communications relevant: "Welcome back, John!" instead of "Dear Customer."

2. Channel Orchestration

Segment's Journeys can trigger Twilio SMS/Email/WhatsApp based on real-time customer behavior (cart abandonment → instant SMS).

3. AI Foundation

AI needs data. Segment's unified profiles feed ML models for predictive traits, churn scoring, and audience targeting.

Scale Metrics

  • 400K events/second
  • 25K+ companies using Segment
  • 700+ integrations
  • 6T rows/month to warehouses
  • 99.9999% API availability
  • #1 in IDC CDP market share (4 years running)

Interview Talking Point

"Segment solves the fragmented customer data problem. Companies have dozens of tools—analytics, email, ads, CRM—each with incomplete customer data. Segment sits at the center: it collects events from every touchpoint, resolves identity across devices and sessions, builds unified profiles, and then activates that data to 700+ downstream tools in real-time.

For Twilio, Segment is the data layer that makes communications intelligent. Without it, you're sending 'Dear Customer' emails. With it, you know the customer's name, their purchase history, their predicted churn risk, and you can trigger an SMS the moment they abandon their cart.

The technical architecture is impressive—400K events per second through a shard-based Tracking API, Kafka backbone for durability, RocksDB for 60-billion-key deduplication, and Centrifuge for reliable delivery to destinations with 4-hour retry windows."

AI Identity Thesis: The Stytch Acquisition VERIFIED

Is the AI Identity Thesis Valid?

Yes, definitively. The research confirms that AI agent authentication is a real, unsolved problem that traditional identity systems (OAuth, SAML) weren't designed for. Multiple sources validate this:

  • Cloud Security Alliance (CSA) has published a framework specifically for "Agentic AI Identity & Access Management"
  • AWS launched "Amazon Bedrock AgentCore Identity" in 2025 for the same purpose
  • Anthropic's MCP protocol mandates OAuth 2.1 for agent authorization
  • Academic research (arXiv papers) proposes zero-trust frameworks for AI agents
  • Orca Security reports non-human identities outnumber humans 50:1 (projected 80:1 by 2027)

Why Traditional Auth Fails for AI Agents

| Aspect | Human Users | AI Agents | Problem |
|---|---|---|---|
| Session duration | Persistent (hours/days) | Ephemeral (seconds/minutes) | OAuth assumes persistent sessions |
| Consent | Interactive approval | Delegated, autonomous | No human in the loop for consent |
| Access scope | Coarse-grained roles | Task-specific, granular | Agents need fine-grained, ephemeral tokens |
| Multi-agent | Single user | Chains of agents calling agents | Delegation chains, trust propagation |
| Revocation | Manual by user | Real-time, risk-based | Need instant revocation on suspicious behavior |
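The contrast in the table can be made concrete with a sketch of ephemeral, scoped agent credentials. This is a minimal illustration, not Stytch's actual API: the function names, token layout, and TTL are all invented here.

```python
import time
import secrets

# Hypothetical sketch (not Stytch's API): an ephemeral, task-scoped
# agent token, in contrast to a long-lived human session.
def issue_agent_token(agent_id, scopes, ttl_seconds=60):
    """Mint a short-lived credential restricted to one task's scopes."""
    return {
        "sub": agent_id,
        "scopes": set(scopes),          # e.g. {"read:customer"} only
        "expires_at": time.time() + ttl_seconds,
        "token": secrets.token_urlsafe(16),
    }

def authorize(token, required_scope, now=None):
    """Reject expired tokens or requests outside the granted scopes."""
    now = now if now is not None else time.time()
    if now >= token["expires_at"]:
        return False                    # ephemeral: dies in seconds/minutes
    return required_scope in token["scopes"]

tok = issue_agent_token("agent-42", ["read:customer"], ttl_seconds=60)
assert authorize(tok, "read:customer")       # granted scope: allowed
assert not authorize(tok, "create:ticket")   # outside scope: denied
assert not authorize(tok, "read:customer", now=time.time() + 3600)  # expired
```

Real systems layer delegation chains and risk-based revocation on top of this basic issue/validate/revoke lifecycle.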

What Stytch Brings

STYTCH: AI AGENT AUTHENTICATION (diagram summary)

  • MODEL CONTEXT PROTOCOL (MCP): Anthropic's open standard for AI tools. Claude, ChatGPT, or a custom LLM ↔ MCP Client (your app) ↔ MCP Server
  • STYTCH CONNECTED APPS:
    - OAuth 2.1 + OIDC for agents: dynamic client registration; scoped access tokens (read:customer, create:ticket, etc.); human-in-the-loop step-up authentication; token lifecycle management (issue, validate, revoke)
    - Agent-specific features: ephemeral, task-scoped credentials; delegation chains for multi-agent workflows; real-time revocation tied to risk signals; audit trails for agent actions
  • INTEGRATION WITH TWILIO: Twilio Verify (human 2FA), Twilio Channels (Voice, SMS, Email), and Twilio Segment (user profiles) feed a unified identity layer spanning humans, AI agents, and bot detection: "distinguish humans, trusted agents, and rogue agents"

Stytch's Current Capabilities

MCP Integration

  • Native support for Claude, ChatGPT connectors
  • OAuth 2.1 as mandated by MCP spec
  • Dynamic client registration for new agents
  • Drop-in consent UI for user approval
  • Org-level admin policy controls

Agent-Ready Auth Primitives

  • Scoped tokens (read:customers, write:tickets, etc.)
  • Human-in-the-loop step-up for sensitive operations
  • Token lifecycle: issue, validate, revoke
  • Free for first 10K active users and agents

Strategic Rationale

Why Twilio + Stytch Makes Sense:

  1. Channel-Aware Identity: Twilio's channels (Voice, SMS, Email) + Stytch's auth = verification, step-up, and recovery that works worldwide across channels.
  2. Pre-Market Positioning: AI agents are nascent. Owning the identity layer before the market matures creates lock-in.
  3. Developer Trust: Twilio's 10M developers already trust the platform. Stytch extends that trust to the agent ecosystem.
  4. Competitive Moat: Auth0 (Okta) doesn't have communications. Twilio now has communications + identity.

Interview Talking Point

"The Stytch acquisition validates a real market need. Traditional OAuth was designed for humans—persistent sessions, interactive consent, coarse-grained roles. AI agents are fundamentally different: ephemeral, autonomous, operating in multi-agent chains where one agent calls another.

The research confirms this isn't hype. AWS launched Bedrock AgentCore Identity. The Cloud Security Alliance published an Agentic AI IAM framework. Anthropic's MCP protocol mandates OAuth 2.1 for agent authorization. Non-human identities already outnumber humans 50:1 in enterprise environments.

Stytch brings MCP-ready OAuth, scoped tokens, human-in-the-loop step-up, and the primitives for agent-to-agent delegation. Combined with Twilio's channels for out-of-band verification and Segment's user profiles, Twilio can offer a unified identity layer that distinguishes humans from trusted agents from rogue agents. That's a defensible moat as AI agents proliferate."

Kafka as Backbone & Centrifuge System Deep Dive

How Kafka is Used

Kafka is the durable, ordered message backbone for Twilio Segment. It serves three critical functions:

  1. Durability: Data persisted across availability zones—never lost, even if consumers crash
  2. Ordering: Messages partitioned by ID ensure same-ID messages go to same consumer
  3. Replay: Consumers can replay from any offset for recovery or reprocessing
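The ordering guarantee in point 2 comes from hashing the partition key. A minimal sketch, using CRC32 for illustration (Kafka's default partitioner actually uses murmur2; the scheme here is an assumption for demonstration only):

```python
import zlib

# Sketch of Kafka-style key partitioning: a stable hash of the
# messageId picks the partition, so the same ID always lands on the
# same partition, and therefore on the same consumer.
def partition_for(message_id: str, num_partitions: int) -> int:
    return zlib.crc32(message_id.encode()) % num_partitions

# The same messageId maps to the same partition on every call...
assert partition_for("msg-abc123", 16) == partition_for("msg-abc123", 16)

# ...so a retried duplicate reaches the consumer that already saw it,
# which is what makes purely local (per-partition) dedup sufficient.
ids = [f"msg-{i}" for i in range(1000)]
assignments = {m: partition_for(m, 16) for m in ids}
assert all(0 <= p < 16 for p in assignments.values())
```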

Kafka Architecture in Segment

KAFKA AT TWILIO SEGMENT (diagram summary)

  • MULTI-TIER KAFKA FAILOVER: each TAPI shard has a primary cluster (same AZ) and a secondary cluster (different AZ). Failover logic (in the Replicated service): monitor broker health; switch to the secondary if the primary degrades beyond a threshold; auto-revert when the primary recovers; dynamic routing via Kubernetes ConfigMaps
  • DEDUPLICATION ARCHITECTURE: Kafka input topic → partition by messageId (same ID → same consumer) → dedup worker (Go):
    1. Read a message from the input topic
    2. Query RocksDB: "Have I seen this messageId?" (embedded on local EBS: 60 billion keys, 1.5 TB on disk, 4-week dedup window, Bloom filters for a "definitely not seen" fast path, LSM tree with append-only writes and efficient compaction)
    3. If NEW: add to RocksDB, publish to the output topic
    4. If DUPLICATE: discard, update the offset only
  • Kafka output topic → downstream processing (transformations, destinations)
  • KEY INSIGHT: the Kafka output topic is the SOURCE OF TRUTH. On restart, the worker reads the output topic first to rebuild RocksDB state.

Deep Dive: RocksDB as a High-Performance Cache

The Key Question: If Kafka is the source of truth, why do we need RocksDB?

Answer: Kafka is source of truth for correctness/recovery. RocksDB is the performance index for real-time lookups.

To deduplicate, you need to answer "Have I seen this messageId before?" for every incoming message. Consider the alternatives:

| Approach | Verdict |
|---|---|
| Scan the Kafka output topic | Far too slow: billions of messages, sequential scan |
| Keep all IDs in memory | 60 billion keys × ~40 bytes ≈ 2.4TB of RAM |
| Query a remote DB (DynamoDB) | Network latency on every message kills throughput |
| RocksDB (embedded) | Works: local disk, no network hop, Bloom filters for a fast "not seen" path |

RocksDB vs Traditional Cache

| Aspect | Traditional Cache (Redis/Memcached) | RocksDB Here |
|---|---|---|
| Location | Remote (network hop) | Embedded (local disk) |
| Durability | Ephemeral (lost on restart) | Persistent (EBS-backed) |
| Cache miss | Query the source of truth | Never: it holds ALL data for its partition |
| Rebuild | Cold start, gradual population | Read the Kafka output topic once on startup |
| Failover | Needs a replica cluster | Just reattach the EBS volume |

Why Kafka is Still "Source of Truth"

For crash recovery and consistency:

Normal operation:
1. Read message from Kafka INPUT topic
2. Query RocksDB: "seen this ID?"
3. If new: write to RocksDB, publish to Kafka OUTPUT topic

Crash scenario:
- Worker writes to RocksDB, crashes before publishing to Kafka OUTPUT
- OR: Worker publishes to Kafka OUTPUT, crashes before writing to RocksDB

Recovery:
- On restart, worker reads Kafka OUTPUT topic first
- Rebuilds RocksDB state from what's actually been published
- Now RocksDB and Kafka OUTPUT are consistent

Kafka OUTPUT is the commit point. If it's in Kafka OUTPUT, it's been processed. RocksDB is just a fast cache that can be rebuilt.
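The commit-point rule above can be sketched with plain Python data structures, lists standing in for Kafka topics and a set standing in for RocksDB. If the worker dies after writing to local state but before publishing, the rebuild simply forgets that entry and the message is processed again, so the two stores converge:

```python
# Minimal sketch of "the output topic is the commit point."
def rebuild_seen(output_topic):
    """On restart, rebuild dedup state from what was actually published."""
    return {msg["messageId"] for msg in output_topic}

def process(input_topic, output_topic, seen):
    for msg in input_topic:
        if msg["messageId"] in seen:
            continue                      # duplicate: drop, advance offset
        seen.add(msg["messageId"])
        output_topic.append(msg)          # publish == commit

output = [{"messageId": "a"}]             # "a" was committed before a crash
seen = rebuild_seen(output)               # local state rebuilt from output
process([{"messageId": "a"}, {"messageId": "b"}], output, seen)
assert [m["messageId"] for m in output] == ["a", "b"]   # "a" not re-emitted
```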

Why Deduplication is Needed

Mobile clients retry on network failure, creating duplicates. At scale, even a 0.6% duplicate rate means billions of duplicate events reaching downstream tools.

Why They Switched from Memcached

"We no longer have a set of large Memcached instances which require failover replicas. Instead we use embedded RocksDB databases which require zero coordination."

  • No network hop: Memcached ~1ms, RocksDB local ~0.01ms
  • No coordination: No distributed cache invalidation
  • No failover replicas: Cheaper, simpler ops
  • Bloom filters: For new messages, often zero disk reads
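The Bloom-filter fast path is worth seeing in miniature. This is a toy filter with arbitrary parameters (not Segment's sizing): a "no" answer is definitive and needs zero disk reads; only a "maybe" falls through to the real RocksDB lookup.

```python
import hashlib

# Toy Bloom filter (illustrative parameters only).
class BloomFilter:
    def __init__(self, size_bits=1 << 20, num_hashes=4):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key: str):
        # Derive k independent bit positions from the key.
        for i in range(self.k):
            h = hashlib.blake2b(f"{i}:{key}".encode(), digest_size=8)
            yield int.from_bytes(h.digest(), "big") % self.size

    def add(self, key: str):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: str) -> bool:
        # False means definitely absent; True means "check the real store".
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(key))

bf = BloomFilter()
bf.add("msg-123")
assert bf.might_contain("msg-123")   # seen before: fall through to RocksDB
# A genuinely new ID almost always short-circuits here, no disk read:
assert not bf.might_contain("msg-definitely-new-999")
```

RocksDB builds Bloom filters into its SST files, which is why the "definitely not seen" answer for fresh messageIds often costs zero disk reads.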
SCALE NUMBERS

  • Nearly 1 million messages/second through Kafka
  • 0.6% of events are duplicates (from mobile retries)
  • 200 billion messages processed in the first 3 months
  • 100x improvement over the previous Memcached-based approach
  • Zero coordination required (embedded RocksDB vs. Memcached replicas)

Centrifuge: Reliable Delivery to External APIs

CENTRIFUGE ARCHITECTURE (diagram summary)

PROBLEM: deliver to 700+ third-party APIs where "dozens are failing at any time."

  • Why not traditional queues?
    - Single queue: one slow endpoint backs up everything
    - Per-destination queue: big customers dominate and block small customers
    - Per-source queue: 42K sources × 2.1 destinations = 88K queues (Kafka/RabbitMQ/NSQ can't handle this cardinality)
  • SOLUTION: database-as-queue. MySQL/RDS provides flexibility that queues don't: "Change QoS by running a single SQL statement."
    - JOBS TABLE: immutable, append-only; KSUID primary key (k-sortable by timestamp, globally unique); stores payload, endpoint, headers, expiration; single index on the KSUID (no expensive updates); median payload ~5KB, max 750KB
    - JOB_STATE_TRANSITIONS TABLE: also immutable (no updates, ever); states: awaiting_scheduling → executing → succeeded/discarded → awaiting_retry → archiving → archived; tracks retry count with exponential backoff
  • DIRECTORS (80-300 running at peak load). Each Director:
    1. Acquires an exclusive DB lock via a Consul session
    2. Caches jobs in memory for fast access
    3. Accepts RPC requests, logs jobs, executes HTTP
    4. On success (200): mark succeeded, expire from cache
    5. On transient failure (5xx, timeout): mark awaiting_retry
    6. On permanent failure (4xx): mark discarded
    7. Scales horizontally based on CPU
  • JOBDB MANAGER: dynamically provisions/retires databases to match compute scaling; cycles JobDBs every ~30 minutes at ~100% fill; uses TABLE DROP instead of random DELETEs (massive perf win); drains in-flight retry jobs before decommissioning
  • RETRY & ARCHIVAL: 4-hour retry window with exponential backoff; 1.5% of all messages succeed on retry (163M extra events); undelivered messages archived to S3; March 2018: absorbed 85M events during a 90-minute third-party outage and flushed the entire queue in 30 minutes post-recovery
  • PRODUCTION SCALE: 400,000 outbound HTTP requests/second sustained; 2 million requests/second in load testing; 340 billion jobs executed monthly

Why Database-as-Queue?

| Approach | Pros | Cons |
|---|---|---|
| Kafka/RabbitMQ/NSQ | Purpose-built for messaging | Can't handle 88K queues; limited reordering |
| MySQL/RDS (Centrifuge) | Flexible QoS via SQL; handles the cardinality; reorder anytime | Needs careful schema design (immutable rows) |

Key Design Principles

  1. Immutable rows: No updates, ever. Append-only design eliminates write contention.
  2. No JOINs: Per-job queries allow massive parallelization.
  3. Write-heavy, small working set: Cache in memory, expire immediately after delivery.
  4. TABLE DROP over DELETE: Cycle databases and drop tables instead of random deletes.

Deep Dive: Why Database-as-Queue?

The Core Problem: 88,000 Logical Queues

Segment has 42K sources sending to an average of 2.1 destinations each. That's 88K source×destination pairs, each needing fair scheduling and isolated failure handling.

Why Traditional Queues Fail

| Approach | What Happens | Why It Fails |
|---|---|---|
| Single queue | All messages in one FIFO queue | One slow/failing endpoint backs up everything: Google Analytics slow → Salesforce blocked |
| Per-destination queue | One queue for Google Analytics, one for Salesforce, etc. | Big customers dominate: Walmart's 10M events/day blocks a small startup's 1K |
| Per-source×destination queue | A separate queue for each source-destination pair | 42K × 2.1 = 88K queues; Kafka/RabbitMQ/NSQ can't handle this cardinality |

The Database-as-Queue Insight

Key Quote: "Change QoS by running a single SQL statement."

With a database, you can reorder, prioritize, and schedule jobs with SQL flexibility. Need to deprioritize a failing destination? One UPDATE. Need to prioritize transactional over promotional? Add a WHERE clause. Queues don't give you that.

Immutable Rows Design

The secret to making MySQL work at queue scale: never UPDATE, only INSERT.

-- Jobs table: immutable after insert
CREATE TABLE jobs (
    ksuid CHAR(27) PRIMARY KEY,  -- K-Sortable globally unique ID
    payload BLOB,                 -- The actual data (median 5KB, max 750KB)
    endpoint VARCHAR(255),
    headers TEXT,
    expires_at TIMESTAMP
);

-- State transitions: also immutable
CREATE TABLE job_state_transitions (
    id BIGINT AUTO_INCREMENT PRIMARY KEY,
    job_ksuid CHAR(27),
    from_state ENUM('awaiting_scheduling','executing','awaiting_retry','archiving'),
    to_state ENUM('executing','succeeded','discarded','awaiting_retry','archived'),
    transitioned_at TIMESTAMP,
    retry_count INT DEFAULT 0
);

-- Finding jobs to execute: single SQL query
SELECT j.ksuid, j.payload, j.endpoint
FROM jobs j
JOIN (
    SELECT job_ksuid, MAX(id) as latest
    FROM job_state_transitions
    GROUP BY job_ksuid
) latest ON j.ksuid = latest.job_ksuid
JOIN job_state_transitions jst ON jst.id = latest.latest
WHERE jst.to_state = 'awaiting_scheduling'
  AND j.expires_at > NOW()
ORDER BY j.ksuid  -- KSUID is time-ordered!
LIMIT 1000;
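The `ORDER BY j.ksuid` trick works because a KSUID puts a big-endian timestamp in front of random bytes, so string order follows creation order. A simplified sketch (assumed layout: 4-byte timestamp + 16 random bytes, hex-encoded; real KSUIDs use base62, but hex also preserves byte order):

```python
import os
import time

# KSUID-style ID sketch: timestamp prefix + random suffix.
def new_ksuid_like(now=None) -> str:
    ts = int(now if now is not None else time.time())
    return (ts.to_bytes(4, "big") + os.urandom(16)).hex()

a = new_ksuid_like(now=1_700_000_000)
b = new_ksuid_like(now=1_700_000_060)   # created a minute later
assert a < b    # lexicographic order == time order: sortable with one index
```

That single sortable primary key is what lets the jobs table get by with one index and no expensive updates.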

Why Immutable Rows?

UPDATEs bring row locks, write contention, and index churn. Because both tables are insert-only, writes never block each other, the single KSUID index stays cheap to maintain, and cleanup can be a wholesale TABLE DROP instead of millions of row deletes.

The TABLE DROP Trick

Normal queue systems do random DELETEs, which fragments indexes and requires expensive vacuuming.

Centrifuge's approach:

  1. Create a new JobDB every ~30 minutes
  2. New jobs go to the newest database
  3. Old databases drain their retry queues
  4. When a database is empty: DROP TABLE jobs; DROP TABLE job_state_transitions;

Result: Zero random deletes. DROP TABLE is instant regardless of table size.
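The cycling scheme above can be sketched with in-memory stand-ins for JobDBs (the class and method names here are invented for illustration, not Centrifuge's actual code):

```python
# Sketch of JobDB cycling: new jobs always go to the newest database,
# old databases only drain, and cleanup drops a drained DB wholesale.
class JobDB:
    def __init__(self, name):
        self.name, self.jobs = name, []

class JobDBManager:
    def __init__(self):
        self.dbs = []

    def cycle(self, name):
        """Every ~30 minutes: provision a fresh DB; new writes go there."""
        self.dbs.append(JobDB(name))

    def insert(self, job):
        self.dbs[-1].jobs.append(job)     # always the newest DB

    def reap(self):
        """Drop any drained old DB outright: no row-by-row DELETEs."""
        self.dbs = [db for db in self.dbs
                    if db is self.dbs[-1] or db.jobs]

mgr = JobDBManager()
mgr.cycle("jobdb-001"); mgr.insert("job-a")
mgr.cycle("jobdb-002"); mgr.insert("job-b")   # new jobs land in jobdb-002
mgr.dbs[0].jobs.clear()                        # jobdb-001 finishes draining
mgr.reap()                                     # dropped wholesale
assert [db.name for db in mgr.dbs] == ["jobdb-002"]
```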

Directors Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                        DIRECTOR LIFECYCLE                                │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  1. ACQUIRE LOCK                                                         │
│     └─► Consul session locks specific JobDB                              │
│         (Only one Director per database)                                 │
│                                                                          │
│  2. LOAD JOBS INTO MEMORY                                                │
│     └─► SELECT awaiting_scheduling jobs from JobDB                       │
│     └─► Cache in-memory for fast access                                  │
│                                                                          │
│  3. ACCEPT RPC REQUESTS                                                  │
│     └─► Upstream Kafka consumer calls Director to log new job            │
│     └─► INSERT into jobs table + job_state_transitions                   │
│                                                                          │
│  4. EXECUTE HTTP REQUESTS                                                │
│     └─► Pop job from in-memory queue                                     │
│     └─► Make HTTP call to external API                                   │
│     └─► Handle response:                                                 │
│         ├─► 200 OK: INSERT state → 'succeeded'                           │
│         ├─► 5xx/timeout: INSERT state → 'awaiting_retry'                 │
│         └─► 4xx: INSERT state → 'discarded'                              │
│                                                                          │
│  5. RETRY LOOP                                                           │
│     └─► Background thread scans awaiting_retry                           │
│     └─► Exponential backoff (1s, 2s, 4s, 8s, ... up to 4 hours)         │
│     └─► Re-queue for execution                                           │
│                                                                          │
│  6. ARCHIVAL                                                             │
│     └─► After 4 hours: INSERT state → 'archiving'                        │
│     └─► Write to S3 for later replay                                     │
│     └─► INSERT state → 'archived'                                        │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

Scale: 80-300 Directors running at peak, scaled by CPU utilization
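The retry loop in step 5 can be sketched as a backoff schedule: delays double from 1s until the 4-hour window is exhausted, after which the job moves to archival. The exact cap behavior is an assumption; the source only states "exponential backoff" and the 4-hour window.

```python
RETRY_WINDOW_S = 4 * 60 * 60            # the 4-hour retry window

def backoff_schedule(base=1.0):
    """Yield retry delays 1s, 2s, 4s, ... until the window is used up."""
    delay, elapsed = base, 0.0
    while elapsed + delay <= RETRY_WINDOW_S:
        yield delay
        elapsed += delay
        delay *= 2

delays = list(backoff_schedule())
assert delays[:4] == [1.0, 2.0, 4.0, 8.0]
assert sum(delays) <= RETRY_WINDOW_S     # never exceeds the 4-hour window
# After the schedule is exhausted, the job transitions to 'archiving'.
```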

Real-World Validation

March 2018 Outage Test:

  • A major third-party destination went down for 90 minutes
  • Centrifuge absorbed 85 million events into retry queues
  • When destination recovered, flushed entire backlog in 30 minutes
  • No data lost, no manual intervention required

Interview Talking Point

"Kafka is the durable backbone for Segment—it ensures data is never lost by persisting across availability zones, provides ordering guarantees through message partitioning, and enables replay for recovery. Each shard has primary and secondary Kafka clusters with automatic failover.

The deduplication layer is clever: they partition by messageId so the same ID always goes to the same consumer, which narrows the search space. Each worker runs an embedded RocksDB with Bloom filters—for new messages, the filter returns 'definitely not seen' without hitting disk. They maintain 60 billion keys across 1.5TB with a 4-week dedup window.

Centrifuge is even more interesting. Traditional message queues can't handle their cardinality—42K sources times 2.1 destinations means 88K queues. They built a database-as-queue using MySQL with immutable, append-only tables. Directors acquire exclusive locks via Consul, cache jobs in memory, execute HTTP requests, and scale horizontally based on CPU. The key insight: TABLE DROP is dramatically faster than random DELETEs, so they cycle entire databases every 30 minutes. Result: 400K outbound requests/second sustained, 2M in load testing."

SMS Messaging Architecture

Message Flow: API to Carrier

TWILIO SMS MESSAGING PIPELINE (diagram summary)

  • YOUR APPLICATION: POST /Messages with { "To": "+1555...", "From": "+1888...", "Body": "Hello!" }
  • TWILIO API LAYER:
    1. Authentication (API Key / Auth Token)
    2. Validation (E.164 format, sender eligibility)
    3. Compliance check (TrustHub verification status)
    4. Rate limit check (concurrent request limits)
  • ACCOUNT QUEUE: FIFO ordering; 4-hour maximum queue time (then the message expires with error 30001); rate limited by sender type:
    - Short Code: 100+ MPS
    - Toll-Free: 3 MPS (upgradeable)
    - 10DLC: variable by trust score
    - Long Code: 1 MPS
    Queue capacity = 4 hours × MPS × 60 × 60 (Short Code at 100 MPS = 1,440,000 message segments)
  • SUPER NETWORK (4,800 carrier connections worldwide):
    - Intelligent routing: 900+ data points analyzed per message; 3.2 billion data points monitored daily; 4x redundant routes per destination (average); automatic failover if the primary route degrades; Traffic Optimization Engine prioritizes by urgency
    - SMPP gateway: Short Message Peer-to-Peer protocol; connects to carrier SMSCs (Short Message Service Centers); handles PDU encoding, segmentation, concatenation
  • CARRIER NETWORK: AT&T, Verizon, T-Mobile, Vodafone SMSCs → end-user device
  • DELIVERY FEEDBACK LOOP: carrier DLR (delivery receipt) → status callback webhook (queued → sending → sent → delivered/failed) → your application (update CRM, trigger the next step)
│ │ │ │ │ │ Event Streams ──► Kinesis/S3 ──► Data Warehouse (Analytics) │ │ │ │ │ │ │ └─────────────────────────────────────────────────────────────────────────────┘ │ │ │ └──────────────────────────────────────────────────────────────────────────────────────┘
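The queue-capacity rule in the pipeline above (4 hours of throughput at the sender's MPS, after which messages expire with error 30001) is easy to verify with arithmetic. A small sketch, using the default MPS figures from the diagram (actual limits vary by account, trust score, and upgrades):

```python
# Twilio's account queue holds up to 4 hours of messages per sender;
# anything still queued after that expires with error 30001.
QUEUE_HOURS = 4

# Default messages-per-second by sender type, taken from the diagram above.
# 10DLC is omitted because its MPS varies by trust score.
MPS_BY_SENDER = {
    "short_code": 100,
    "toll_free": 3,
    "long_code": 1,
}

def queue_capacity(sender_type: str) -> int:
    """Maximum message segments the queue holds before 30001 expiry."""
    return MPS_BY_SENDER[sender_type] * QUEUE_HOURS * 60 * 60

print(queue_capacity("short_code"))  # 100 MPS * 4h * 3600s -> 1440000
```

This is why sender type matters for burst sends: a long code queues only 14,400 segments before expiring messages, while a 100 MPS short code queues 1.44 million.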

Enterprise Messaging Architecture (10-Step Pattern)

1. Application Event Ingestion: Internal API with auth gates incoming requests
2. Event Processing: Evaluate CDP data, determine eligibility, set routing
3. Internal Queue: Rate-limit throughput, prioritize (transactional > promotional)
4. Twilio API: REST over HTTPS with API Keys + Public Key Client Validation
5. Account Routing: Map to appropriate subaccount based on sender
6. Compliance Layer: TrustHub verifies carrier registration
7. Super Network: Optimize carrier selection for delivery
8. Status Callbacks: Real-time feedback on message status
9. Inbound Handling: Webhooks for 2-way messaging, compliance keywords
10. Data Pipeline: Engagement metrics back to CDP/CRM
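Step 9's inbound handling usually begins with standard compliance keywords (STOP/START/HELP, per common CTIA conventions). A minimal sketch of that dispatch; the keyword sets and action names are illustrative, and in practice Twilio's Advanced Opt-Out can handle these at the Messaging Service level before your webhook ever fires:

```python
# Sketch of inbound compliance-keyword handling (step 9 above).
# Keyword sets follow common CTIA conventions; action names are illustrative.
OPT_OUT = {"STOP", "STOPALL", "UNSUBSCRIBE", "CANCEL", "END", "QUIT"}
OPT_IN = {"START", "YES", "UNSTOP"}
HELP = {"HELP", "INFO"}

def handle_inbound(body: str) -> str:
    """Map an inbound SMS body to a compliance action."""
    keyword = body.strip().upper()
    if keyword in OPT_OUT:
        return "opt_out"   # suppress future sends to this number
    if keyword in OPT_IN:
        return "opt_in"    # re-enable sends
    if keyword in HELP:
        return "help"      # reply with support/opt-out info
    return "forward"       # normal 2-way message -> route to app or agent

print(handle_inbound("Stop"))  # opt_out
```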

Programmable Voice Architecture

┌─────────────────────────────────────────────────────────────────────────────────────┐ │ TWILIO VOICE ARCHITECTURE │ ├─────────────────────────────────────────────────────────────────────────────────────┤ │ │ │ ┌───────────────────────────────────────────────────────────────────────────────┐ │ │ │ CONNECTION METHODS │ │ │ │ │ │ │ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ │ │ │ PSTN │ │ WebRTC │ │ SIP │ │ │ │ │ │ (Phone │ │ (Browser/ │ │ (Enterprise│ │ │ │ │ │ Network) │ │ Mobile SDK)│ │ PBX) │ │ │ │ │ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │ │ │ │ │ │ │ │ │ │ └──────────┼──────────────────┼──────────────────┼───────────────────────────────┘ │ │ │ │ │ │ │ ▼ ▼ ▼ │ │ ┌───────────────────────────────────────────────────────────────────────────────┐ │ │ │ GLOBAL LOW LATENCY (GLL) EDGE │ │ │ │ (9 Edge Locations) │ │ │ │ │ │ │ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │ │ │ │ Ashburn │ │ Dublin │ │Frankfurt │ │ Tokyo │ │ Sydney │ │ │ │ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │ │ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │ │ │ │ São Paulo│ │Singapore │ │ Mumbai │ │ Oregon │ │ │ │ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │ │ │ │ │ │ │ • Auto-selects nearest edge based on call origin │ │ │ │ • Same edge for inbound/outbound legs (lowest latency) │ │ │ │ • Conferences placed where most participants located │ │ │ │ │ │ │ └───────────────────────────────────────────────────────────────────────────────┘ │ │ │ │ │ ▼ │ │ ┌───────────────────────────────────────────────────────────────────────────────┐ │ │ │ SIGNALING & MEDIA SERVERS │ │ │ │ │ │ │ │ ┌────────────────────────────────────────────────────────────────────────┐ │ │ │ │ │ SIGNALING PATH │ │ │ │ │ │ │ │ │ │ │ │ Inbound Call ──► Webhook to your app ──► TwiML Response │ │ │ │ │ │ (Say, Play, Gather, Dial, Record, etc.) 
│ │ │ │ │ │ │ │ │ │ │ │ REST API ──► Initiate outbound call ──► Connect to TwiML │ │ │ │ │ │ │ │ │ │ │ └────────────────────────────────────────────────────────────────────────┘ │ │ │ │ │ │ │ │ ┌────────────────────────────────────────────────────────────────────────┐ │ │ │ │ │ MEDIA PATH │ │ │ │ │ │ │ │ │ │ │ │ ┌─────────────────────────────────────────────────────────────────┐ │ │ │ │ │ │ │ Media Server │ │ │ │ │ │ │ │ • G.711 μ-law/A-law codecs (no transcoding to carriers) │ │ │ │ │ │ │ │ • SRTP encryption for secure media │ │ │ │ │ │ │ │ • Symmetric RTP for NAT traversal │ │ │ │ │ │ │ │ • STUN/TURN handled by Twilio (you don't need servers) │ │ │ │ │ │ │ └─────────────────────────────────────────────────────────────────┘ │ │ │ │ │ │ │ │ │ │ │ │ P2P Rooms: Twilio handles signaling, media goes direct │ │ │ │ │ │ Group Rooms: Twilio SFU (Selective Forwarding Unit) routes media │ │ │ │ │ │ │ │ │ │ │ └────────────────────────────────────────────────────────────────────────┘ │ │ │ │ │ │ │ └───────────────────────────────────────────────────────────────────────────────┘ │ │ │ │ │ ┌──────────────────────────┼──────────────────────────┐ │ │ ▼ ▼ ▼ │ │ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ │ │ PSTN GATEWAY │ │ SIP INTERFACE │ │ BYOC TRUNK │ │ │ │ │ │ │ │ │ │ │ │ Connects to │ │ SIP Domain │ │ Bring Your Own │ │ │ │ carrier │ │ for PBX/SIP │ │ Carrier for │ │ │ │ networks │ │ clients │ │ existing #s │ │ │ │ │ │ │ │ │ │ │ └─────────────────┘ └─────────────────┘ └─────────────────┘ │ │ │ │ ┌───────────────────────────────────────────────────────────────────────────────┐ │ │ │ CONVERSATIONRELAY (AI VOICE) │ │ │ │ │ │ │ │ Inbound Call ──► Speech-to-Text ──► WebSocket events ──► Your LLM │ │ │ │ (Deepgram, (text chunks) (OpenAI, │ │ │ │ Google, Claude, │ │ │ │ Amazon) Custom) │ │ │ │ │ │ │ │ │ ▼ │ │ │ │ Text-to-Speech ◄──────────────────────────────────────── Response │ │ │ │ (ElevenLabs, │ │ │ │ Amazon Polly) │ │ │ │ │ │ │ │ Features: Interruption 
handling, HIPAA-eligible, tool calling │ │ │ │ │ │ │ └───────────────────────────────────────────────────────────────────────────────┘ │ │ │ └──────────────────────────────────────────────────────────────────────────────────────┘
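The signaling path above answers a webhook with a TwiML document. A minimal sketch of what such a response looks like, built with the standard library for self-containment (in practice you would use the twilio helper library's VoiceResponse; verb choice, voice attribute, and the dial target here are illustrative):

```python
# Sketch of a TwiML voice response returned by an inbound-call webhook.
# Built with the stdlib XML module; verbs and values are illustrative.
import xml.etree.ElementTree as ET

def build_twiml() -> str:
    response = ET.Element("Response")
    # <Say> speaks text to the caller via text-to-speech.
    say = ET.SubElement(response, "Say", voice="alice")
    say.text = "Thanks for calling. Connecting you now."
    # <Dial> bridges the call to another number (placeholder below).
    dial = ET.SubElement(response, "Dial")
    dial.text = "+15558675309"
    return ET.tostring(response, encoding="unicode")

print(build_twiml())
```

Your webhook endpoint would return this document with a `text/xml` content type; Twilio's signaling servers then execute the verbs in order against the live call.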

Complete Twilio Platform Architecture

┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ TWILIO PLATFORM (2025) │ ├─────────────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ │ │ ┌────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ │ │ DEVELOPER INTERFACE │ │ │ │ │ │ │ │ REST APIs │ SDKs (JS, Python, Java, Go, ...) │ TwiML │ Console │ CLI │ │ │ │ │ │ │ └────────────────────────────────────────────────────────────────────────────────────────────────────┘ │ │ │ │ │ ┌────────────────────────────────────────────┼────────────────────────────────────────────────────────┐ │ │ │ ▼ │ │ │ │ ┌───────────────────────────────────────────────────────────────────────────────────────────────┐ │ │ │ │ │ IDENTITY LAYER (Stytch + Verify) │ │ │ │ │ │ │ │ │ │ │ │ Human Auth │ AI Agent Auth │ MCP/OAuth 2.1 │ Fraud Guard │ │ │ │ │ │ (2FA, Passkeys) (Scoped tokens, (Claude, ChatGPT (Bot detection, │ │ │ │ │ │ delegation) connectors) rate limiting) │ │ │ │ │ │ │ │ │ │ │ └───────────────────────────────────────────────────────────────────────────────────────────────┘ │ │ │ │ │ │ │ │ └────────────────────────────────────────────┼────────────────────────────────────────────────────────┘ │ │ ▼ │ │ ┌─────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ │ │ CUSTOMER DATA PLATFORM (Segment) │ │ │ │ │ │ │ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │ │ │ │ Sources │ │ Profiles │ │ Audiences │ │ Journeys │ │ Destinations │ │ │ │ │ │ (700+) │ │ (Unified) │ │ (ML-based) │ │ (Real-time) │ │ (700+) │ │ │ │ │ └──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘ │ │ │ │ │ │ │ │ Infrastructure: Kafka backbone • RocksDB dedup • Centrifuge delivery • 400K events/sec │ │ │ │ │ │ │ 
└─────────────────────────────────────────────────────────────────────────────────────────────────────┘ │ │ │ │ │ ┌────────────────────────────────────────────┼────────────────────────────────────────────────────────┐ │ │ │ ▼ │ │ │ │ ┌───────────────────────────────────────────────────────────────────────────────────────────────┐ │ │ │ │ │ AI / INTELLIGENCE LAYER │ │ │ │ │ │ │ │ │ │ │ │ ConversationRelay │ Conversational │ Agent Copilot │ Predictive │ │ │ │ │ │ (Voice AI Agents) Intelligence (Flex Contact Traits │ │ │ │ │ │ (Sentiment, Center) (ML) │ │ │ │ │ │ • LLM-agnostic Intent Detection) │ │ │ │ │ │ • HIPAA-eligible │ │ │ │ │ │ • Tool calling │ │ │ │ │ │ │ │ │ │ │ └───────────────────────────────────────────────────────────────────────────────────────────────┘ │ │ │ │ │ │ │ │ └────────────────────────────────────────────┼────────────────────────────────────────────────────────┘ │ │ ▼ │ │ ┌─────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ │ │ COMMUNICATIONS LAYER (CPaaS) │ │ │ │ │ │ │ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │ │ │ │ SMS │ │ Voice │ │ Email │ │ WhatsApp │ │ RCS │ │ Video │ │ Fax │ │ │ │ │ │ MMS │ │ (PSTN, │ │(SendGrid)│ │ Business │ │ Business │ │ (WebRTC) │ │ │ │ │ │ │ │ │ │ WebRTC, │ │ │ │ Calling │ │ Messaging│ │ │ │ │ │ │ │ │ │ │ │ SIP) │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │ │ │ │ │ │ │ ┌──────────────────────────────────────────────────────────────────────────────────────────────┐ │ │ │ │ │ MESSAGING SERVICES │ │ │ │ │ │ • Sender Pools (Short Code, Toll-Free, 10DLC, Long Code) │ │ │ │ │ │ • Advanced Opt-Out │ │ │ │ │ │ • Sticky Sender (consistent caller ID per recipient) │ │ │ │ │ │ • Link Shortening & Tracking │ │ │ │ │ │ • Content API Templates │ │ │ │ │ 
└──────────────────────────────────────────────────────────────────────────────────────────────┘ │ │ │ │ │ │ │ └─────────────────────────────────────────────────────────────────────────────────────────────────────┘ │ │ │ │ │ ┌────────────────────────────────────────────┼────────────────────────────────────────────────────────┐ │ │ │ ▼ │ │ │ │ ┌───────────────────────────────────────────────────────────────────────────────────────────────┐ │ │ │ │ │ TRUST & COMPLIANCE │ │ │ │ │ │ │ │ │ │ │ │ TrustHub │ Compliance Toolkit │ Branded Comms │ Data Residency │ │ │ │ │ │ (Carrier Reg) (AI violation detect) (RCS, BIMI) (US, EU, APAC) │ │ │ │ │ │ │ │ │ │ │ └───────────────────────────────────────────────────────────────────────────────────────────────┘ │ │ │ │ │ │ │ │ └────────────────────────────────────────────┼────────────────────────────────────────────────────────┘ │ │ ▼ │ │ ┌─────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ │ │ SUPER NETWORK │ │ │ │ │ │ │ │ ┌─────────────────────────────────────────────────────────────────────────────────────────────┐ │ │ │ │ │ INFRASTRUCTURE │ │ │ │ │ │ │ │ │ │ │ │ 4,800 Carrier │ 8 Edge Locations │ 3.2B Data Points │ Intelligent │ │ │ │ │ │ Connections (24 AZs) /Day Monitored Routing │ │ │ │ │ │ (900+ signals) │ │ │ │ │ │ │ │ │ │ │ │ 4x Redundant Routes │ 99.95% SLA │ Hybrid Cloud │ SMPP/SIP/PSTN │ │ │ │ │ │ Per Destination (99.999% CDP) (AWS + Dedicated) Gateways │ │ │ │ │ │ │ │ │ │ │ └─────────────────────────────────────────────────────────────────────────────────────────────┘ │ │ │ │ │ │ │ └─────────────────────────────────────────────────────────────────────────────────────────────────────┘ │ │ │ └──────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Key Architectural Takeaways

Layered Design

  • Identity at top: All interactions (human + AI) authenticated first
  • Data in middle: Segment provides context for personalization
  • AI augments: Intelligence layer enhances but doesn't replace channels
  • Channels at edge: CPaaS as the delivery mechanism
  • Network as foundation: Super Network makes it all reliable

"One Twilio" Integration Points

  • Segment → SMS/Email: Event-triggered journeys fire channels
  • Stytch → Verify: Human 2FA + AI agent auth unified
  • ConversationRelay → Flex: AI handles, escalates to human
  • Segment → ConversationRelay: Profiles inform AI context
  • All channels → Segment: Engagement data feeds back to CDP
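The last integration point ("All channels → Segment") is mechanically simple: each delivery or engagement event becomes a track call against Segment's HTTP Tracking API (POST to api.segment.io/v1/track, authenticated with a write key). A sketch that builds the payload without sending it; field names follow Segment's public track spec, while the user ID, event name, and SID value are placeholders:

```python
# Sketch of the engagement-feedback loop: a channel event sent back to
# Segment as a track call. Payload shape follows Segment's track spec;
# concrete values below are placeholders.
import json
from datetime import datetime, timezone

def build_track_event(user_id: str, event: str, properties: dict) -> str:
    payload = {
        "userId": user_id,
        "event": event,
        "properties": properties,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(payload)

body = build_track_event(
    "user_123",                                      # placeholder user ID
    "SMS Delivered",
    {"messageSid": "SMxxxxxxxx", "channel": "sms"},  # placeholder SID
)
print(body)
```

With this loop closed, the delivery receipt that ends the SMS pipeline re-enters the CDP as first-class behavioral data, which is what lets Journeys and Audiences react to channel outcomes in near real time.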