API Design & Versioning

REST Principles, Versioning Strategies, Breaking Changes, and API Governance at Scale

← Back to Study Guide

Quick Navigation

Why API Design Matters for Twilio DA

Twilio Context: Twilio's entire business is APIs. Developers integrate Twilio APIs into their applications—any breaking change directly impacts customer code. API design decisions live for years and affect millions of integrations.

The Stakes at Twilio Scale

Challenge Impact Twilio Example
Breaking changes Customer code breaks in production Changing SMS callback payload structure
Version proliferation Maintenance burden, confusion Supporting v1, v2, v3 simultaneously
Inconsistency Poor developer experience Different patterns across Voice vs Messaging APIs
Rate limiting Customer frustration, abuse prevention Protecting Super Network from traffic spikes
Documentation drift Incorrect implementations, support burden Outdated SDK examples

REST API Design Principles

Resource-Oriented Design

Core Principle: APIs expose resources (nouns), not actions (verbs). HTTP methods provide the verbs.

Good: Resource-Oriented

GET    /messages           # List messages
POST   /messages           # Create message
GET    /messages/{sid}     # Get specific message
PUT    /messages/{sid}     # Update message
DELETE /messages/{sid}     # Delete message

Bad: Action-Oriented

POST /sendMessage
POST /getMessage
POST /updateMessage
POST /deleteMessage
POST /listAllMessages

HTTP Methods (Verbs)

Method Purpose Idempotent? Safe? Request Body
GET Retrieve resource(s) Yes Yes No
POST Create resource No No Yes
PUT Replace resource entirely Yes No Yes
PATCH Partial update No* No Yes
DELETE Remove resource Yes No Optional

*PATCH can be idempotent if designed carefully (e.g., "set field to X" vs "increment field")

URL Structure Best Practices

# Hierarchical resources
GET /accounts/{accountSid}/messages
GET /accounts/{accountSid}/calls/{callSid}/recordings

# Twilio pattern: Account scoping
POST /2010-04-01/Accounts/{AccountSid}/Messages.json

# Filtering via query params (not path)
GET /messages?status=delivered&dateSent>=2024-01-01

# Pagination
GET /messages?pageSize=50&pageToken=xyz123

# Sorting
GET /messages?sort=dateSent:desc

Response Design

// Successful response (200 OK)
{
  "sid": "SM123456789",
  "accountSid": "AC987654321",
  "to": "+14155551234",
  "from": "+14155556789",
  "body": "Hello, World!",
  "status": "delivered",
  "dateCreated": "2024-01-15T10:30:00Z",
  "dateUpdated": "2024-01-15T10:30:05Z",
  "uri": "/2010-04-01/Accounts/AC987654321/Messages/SM123456789.json",
  "_links": {
    "self": "/messages/SM123456789",
    "account": "/accounts/AC987654321",
    "media": "/messages/SM123456789/media"
  }
}

// Collection response with pagination
{
  "messages": [...],
  "meta": {
    "page": 0,
    "pageSize": 50,
    "totalCount": 1234,
    "pageCount": 25
  },
  "pagination": {
    "firstPageUri": "/messages?pageSize=50",
    "nextPageUri": "/messages?pageSize=50&pageToken=abc123",
    "previousPageUri": null
  }
}

Naming Conventions

Element Convention Example
URLs lowercase, hyphens /phone-numbers
JSON fields camelCase dateCreated, accountSid
Query params camelCase ?pageSize=50
Resource names Plural nouns /messages, /calls
IDs Prefixed, unique SM..., CA..., AC...

2-Minute Answer: "What makes a good REST API?"

"A good REST API is resource-oriented—it exposes nouns, not verbs. Instead of 'sendMessage', you POST to /messages. Resources have stable URIs that can be bookmarked, cached, and linked. HTTP methods provide consistent semantics: GET is safe and idempotent, POST creates, PUT replaces, DELETE removes. Responses include hypermedia links so clients can navigate the API without hardcoding URLs. For consistency, I establish conventions early: camelCase for JSON, plural nouns for collections, prefixed IDs for debuggability. At scale, I add pagination with cursors instead of offsets—offset pagination breaks when data is added or removed between requests. The most important thing is predictability. A developer who's used one endpoint should be able to guess how another works. That means consistent naming, consistent error formats, and consistent behavior across the entire API surface."

Versioning Strategies

Versioning Approaches

Approach Example Pros Cons
URL Path /v1/messages Obvious, cacheable, easy routing Not RESTful (resource URI changes)
Query Param /messages?version=1 Optional, backward compatible Easy to forget, caching issues
Header Accept: application/vnd.twilio.v1+json Clean URLs, content negotiation Hidden, harder to test, no browser support
Date-based /2010-04-01/Messages Clear timeline, Twilio's approach Many versions accumulate
No versioning Evolve in place Simple, forces compatibility Hard to make breaking changes

Twilio's Date-Based Versioning

Twilio Pattern: Version by release date in URL path. Each version is a snapshot of the API at that time.
# Twilio API versions
/2010-04-01/Accounts/{AccountSid}/Messages.json  # Original version
/2020-03-15/Accounts/{AccountSid}/Messages.json  # Newer version

# Benefits:
# - Clear timeline of when features were added
# - Customer pins to specific date, knows exactly what they get
# - New versions only when breaking changes needed
# - Old versions maintained for years with deprecation warnings

Version Selection in Cell Architecture

┌─────────────────────────────────────────────────────────────────────────────┐ │ LANDING ZONE (API Gateway) │ │ │ │ Request: POST /2010-04-01/Accounts/AC123/Messages.json │ │ │ │ │ ▼ │ │ ┌─────────────────────────────────────────┐ │ │ │ Version Router │ │ │ │ /2010-04-01/* → v1 handlers │ │ │ │ /2020-03-15/* → v2 handlers │ │ │ │ /2024-01-01/* → v3 handlers │ │ │ └─────────────────────────────────────────┘ │ │ │ │ └────────────────────┼────────────────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────────────────────────┐ │ CELL │ │ │ │ Cell runs ALL version handlers simultaneously │ │ │ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ │ v1 Handler │ │ v2 Handler │ │ v3 Handler │ │ │ │ (2010-04-01)│ │ (2020-03-15)│ │ (2024-01-01)│ │ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │ │ │ │ │ │ └────────────────┼────────────────┘ │ │ ▼ │ │ ┌───────────────┐ │ │ │ Core Service │ (Version-agnostic business logic) │ │ │ Layer │ │ │ └───────────────┘ │ │ │ └─────────────────────────────────────────────────────────────────────────────┘ Key Insight: Version translation happens at API boundary, not deep in the stack

Implementing Version Handlers

// Version-specific request/response transformers
class MessageHandlerV1 {
  // 2010-04-01: Original format
  async create(req) {
    const input = this.transformRequest(req.body);
    const result = await messageService.create(input);  // Core service
    return this.transformResponse(result);
  }

  transformRequest(body) {
    return {
      to: body.To,           // V1 uses PascalCase
      from: body.From,
      body: body.Body,
    };
  }

  transformResponse(message) {
    return {
      sid: message.id,
      To: message.to,        // V1 returns PascalCase
      From: message.from,
      Body: message.body,
      Status: message.status,
      DateCreated: message.createdAt.toISOString(),
    };
  }
}

class MessageHandlerV2 {
  // 2020-03-15: Modern format
  async create(req) {
    const input = this.transformRequest(req.body);
    const result = await messageService.create(input);  // Same core service!
    return this.transformResponse(result);
  }

  transformRequest(body) {
    return {
      to: body.to,           // V2 uses camelCase
      from: body.from,
      body: body.body,
      contentType: body.contentType,  // V2 adds new fields
      validityPeriod: body.validityPeriod,
    };
  }

  transformResponse(message) {
    return {
      sid: message.id,
      to: message.to,        // V2 returns camelCase
      from: message.from,
      body: message.body,
      status: message.status,
      dateCreated: message.createdAt,
      dateUpdated: message.updatedAt,
      _links: {              // V2 adds hypermedia
        self: `/messages/${message.id}`,
        media: `/messages/${message.id}/media`,
      },
    };
  }
}

2-Minute Answer: "How do you approach API versioning?"

"I version in the URL path with a date-based scheme—like Twilio's /2010-04-01/Messages. The date makes it clear when a version was released and what behavior to expect. I only create new versions for breaking changes; additive changes go into the current version. The version is resolved at the API Gateway layer—the Landing Zone in our cell architecture. Each cell runs handlers for all supported versions, but the business logic is version-agnostic. Version handlers just transform requests and responses at the boundary. This means I can add a new version by adding a new transformer, not rewriting business logic. For deprecation, I give customers at least 12 months notice, add deprecation headers to responses, track version usage metrics, and reach out to heavy users of old versions before sunset. The goal is zero surprise breakage."

Breaking vs Non-Breaking Changes

What Breaks Clients?

Breaking Changes

  • Removing a field from response
  • Renaming a field
  • Changing field type (string → int)
  • Adding required request field
  • Removing an endpoint
  • Changing URL structure
  • Changing error codes
  • Changing authentication method
  • Reducing rate limits
  • Changing enum values

Non-Breaking Changes

  • Adding new optional field to request
  • Adding new field to response
  • Adding new endpoint
  • Adding new enum value
  • Adding new HTTP method to resource
  • Increasing rate limits
  • Adding new query parameters
  • Adding new headers
  • Relaxing validation rules
  • Bug fixes (if documented)

The Robustness Principle

Postel's Law: "Be conservative in what you send, be liberal in what you accept."
// Server: Liberal in accepting requests
app.post('/messages', (req, res) => {
  const { to, from, body, ...unknownFields } = req.body;

  if (Object.keys(unknownFields).length > 0) {
    logger.info('Ignoring unknown fields', { fields: Object.keys(unknownFields) });
    // Don't reject! Just ignore.
  }

  // Process with known fields only
  const message = await createMessage({ to, from, body });
  res.json(message);
});

// Client: Liberal in parsing responses
function parseMessage(response) {
  // Only extract fields we know about
  // Ignore any new fields the server adds
  return {
    sid: response.sid,
    to: response.to,
    from: response.from,
    body: response.body,
    status: response.status,
    // Don't fail if response has extra fields!
  };
}

Additive-Only Evolution

V1 Response: V2 Response (additive): { { "sid": "SM123", "sid": "SM123", "to": "+1555...", "to": "+1555...", "from": "+1555...", "from": "+1555...", "body": "Hello", "body": "Hello", "status": "sent" "status": "sent", } "segments": 1, ← NEW (optional) "price": "0.0075", ← NEW (optional) "errorCode": null ← NEW (optional) } ✅ V1 clients continue working (ignore new fields) ✅ V2 clients get new features ✅ No version bump needed

Expand-Contract for Breaking Changes

Pattern: When you must make a breaking change, do it in three phases over 6-12 months.
Phase 1: EXPAND (Month 0) ───────────────────────── Add new field alongside old field. Support both. Request can include: { "phone": "+1555..." } ← Old field (still works) { "phoneNumber": "+1555..." } ← New field (preferred) { "phone": "...", "phoneNumber": "..." } ← Both (new wins) Response includes both: { "phone": "+1555...", "phoneNumber": "+1555..." } Phase 2: MIGRATE (Month 3-9) ──────────────────────────── • Add deprecation warning header when old field used • Update documentation to recommend new field • Update SDKs to use new field • Track usage of old vs new field • Reach out to customers still using old field X-Twilio-Deprecation: "phone" field deprecated, use "phoneNumber" Phase 3: CONTRACT (Month 12) ──────────────────────────── • Stop accepting old field (return 400 Bad Request) • Stop returning old field in response • Old field removed from documentation { "phoneNumber": "+1555..." } ← Only new field

Feature Flags for Gradual Rollout

// Feature flag controls new behavior
async function sendMessage(request, account) {
  const useNewValidation = await featureFlags.isEnabled(
    'sms.strict_e164_validation',
    { accountId: account.id }
  );

  if (useNewValidation) {
    // New stricter validation (potential breaking change)
    if (!isStrictE164(request.to)) {
      throw new ValidationError('Phone must be E.164 format: +1234567890');
    }
  } else {
    // Old lenient validation
    request.to = normalizeToE164(request.to);
  }

  return await messageService.send(request);
}

// Rollout strategy:
// Week 1: 1% of accounts (beta testers)
// Week 2: 10% of accounts
// Week 4: 50% of accounts
// Week 6: 100% of accounts
// Week 8: Remove flag, new behavior is default

2-Minute Answer: "How do you handle breaking changes?"

"The best breaking change is the one you don't make. I follow additive-only evolution—new fields, new endpoints, new optional parameters. Clients should ignore unknown fields in responses and servers should ignore unknown fields in requests. When I absolutely must make a breaking change, I use the expand-contract pattern. First, I add the new field alongside the old one and support both for 6+ months. During this time, I add deprecation warnings to responses, update documentation, and track which customers still use the old pattern. I reach out directly to heavy users. Only after the migration period do I remove the old field. For risky changes, I use feature flags to roll out gradually—1%, then 10%, then 50%, watching error rates at each step. The goal is zero surprises. A customer should never wake up to broken code because we changed an API."

Deprecation & Sunset Strategy

Deprecation Lifecycle

┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ Active │────►│ Deprecated │────►│ Sunset │────►│ Removed │ │ │ │ │ │ │ │ │ │ Full support│ │ Works, but │ │ 30 days to │ │ Returns 410 │ │ Documented │ │ warnings │ │ removal │ │ Gone │ │ │ │ 6-12 months │ │ │ │ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘

Deprecation Signals

Signal Implementation Purpose
HTTP Header Deprecation: true
Sunset: Sat, 01 Jun 2025 00:00:00 GMT
Machine-readable, loggable
Response Field "_deprecation": { "message": "...", "sunset": "..." } Visible in response body
Documentation Prominent "DEPRECATED" badge Developers reading docs
SDK Warnings console.warn('Method deprecated...') IDE and runtime visibility
Email/Dashboard Direct notification to affected accounts Proactive communication
// Response with deprecation headers
HTTP/1.1 200 OK
Content-Type: application/json
Deprecation: true
Deprecation-Date: Wed, 01 Jan 2025 00:00:00 GMT
Sunset: Wed, 01 Jul 2025 00:00:00 GMT
Link: <https://docs.twilio.com/migration/v2>; rel="deprecation"

{
  "sid": "SM123456789",
  "Status": "sent",
  "_deprecation": {
    "warning": "API version 2010-04-01 is deprecated",
    "sunsetDate": "2025-07-01T00:00:00Z",
    "migrationGuide": "https://docs.twilio.com/migration/v2",
    "replacement": "/2024-01-01/Messages"
  }
}

Sunset Communication Timeline

Timeline Action Channel
T-12 months Announce deprecation Blog, changelog, email to all users
T-9 months Add deprecation headers/warnings API responses
T-6 months Direct outreach to heavy users Email, account manager contact
T-3 months Sunset reminder Email, in-app notification
T-1 month Final warning All channels, dashboard banner
T-0 Begin returning 410 Gone API response
T+1 month Remove from documentation Docs site

Measuring Deprecation Progress

// Track deprecated endpoint usage
SELECT
  api_version,
  endpoint,
  COUNT(DISTINCT account_id) as unique_accounts,
  COUNT(*) as total_requests,
  DATE_TRUNC('week', timestamp) as week
FROM api_requests
WHERE endpoint IN (SELECT endpoint FROM deprecated_endpoints)
GROUP BY api_version, endpoint, week
ORDER BY week DESC;

// Alert if deprecated usage isn't declining
// Goal: 50% reduction each quarter until sunset

2-Minute Answer: "How do you deprecate an API?"

"Deprecation is a 12-month process minimum. At announcement, I add the Deprecation and Sunset headers to every response from that endpoint. These are machine-readable—customers can detect them in CI/CD and set up alerts. I update documentation with prominent warnings and migration guides. At the 6-month mark, I pull usage data to identify customers still on the old API. The top 20 accounts by usage get direct outreach—often they have valid reasons or need help migrating. I track weekly metrics: unique accounts, total requests, error rates. If migration isn't happening fast enough, I extend the timeline or allocate engineering to help. At sunset, the endpoint returns 410 Gone with a body explaining what happened and where to go. But honestly, if you've communicated well, no one should be surprised. The goal is that by sunset day, usage is already near zero because everyone migrated months ago."

Error Handling

HTTP Status Codes

Code Meaning When to Use Retry?
400 Bad Request Invalid input, validation failure No (fix request)
401 Unauthorized Missing or invalid credentials No (fix auth)
403 Forbidden Valid auth, but not allowed No (permission issue)
404 Not Found Resource doesn't exist No
409 Conflict Optimistic lock failure, duplicate Maybe (re-fetch, retry)
422 Unprocessable Entity Valid syntax, invalid semantics No (fix data)
429 Too Many Requests Rate limited Yes (after delay)
500 Internal Server Error Unexpected server failure Yes (with backoff)
502 Bad Gateway Upstream service failure Yes (with backoff)
503 Service Unavailable Temporary overload/maintenance Yes (check Retry-After)

Error Response Format

// Twilio-style error response
{
  "code": 21211,
  "message": "The 'To' phone number is not a valid phone number",
  "status": 400,
  "moreInfo": "https://www.twilio.com/docs/errors/21211",
  "details": {
    "field": "to",
    "value": "not-a-phone",
    "reason": "Must be E.164 format: +1234567890"
  }
}

// Multiple validation errors
{
  "code": 20001,
  "message": "Validation failed",
  "status": 400,
  "errors": [
    {
      "field": "to",
      "code": "invalid_format",
      "message": "Must be E.164 format"
    },
    {
      "field": "body",
      "code": "too_long",
      "message": "Body exceeds 1600 characters"
    }
  ]
}

Error Code Taxonomy

Twilio Pattern: Numeric error codes with ranges indicating category.
Range Category Example
10xxx Account errors 10001: Account not found
20xxx Request validation 20003: Missing required parameter
21xxx Phone number errors 21211: Invalid phone number
30xxx Message delivery 30003: Unreachable destination
50xxx Internal errors 50001: Internal server error

Retry Guidance

// Include retry guidance in response headers
HTTP/1.1 429 Too Many Requests
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1699900030

{
  "code": 20429,
  "message": "Rate limit exceeded",
  "retryAfter": 30,
  "rateLimitInfo": {
    "limit": 100,
    "remaining": 0,
    "resetAt": "2024-01-15T10:30:30Z"
  }
}

// Client retry logic
async function callAPIWithRetry(request, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await makeRequest(request);
    } catch (error) {
      if (error.status === 429) {
        const retryAfter = error.headers['retry-after'] || 30;
        await sleep(retryAfter * 1000);
        continue;
      }
      if (error.status >= 500) {
        const backoff = Math.pow(2, attempt) * 1000;  // Exponential backoff
        await sleep(backoff);
        continue;
      }
      throw error;  // Don't retry 4xx errors
    }
  }
  throw new Error('Max retries exceeded');
}

Rate Limiting

Rate Limiting Strategies

Algorithm How It Works Pros Cons
Fixed Window N requests per minute, resets on the minute Simple, predictable Burst at window boundary
Sliding Window Log Track timestamp of each request Accurate, smooth Memory-intensive
Sliding Window Counter Weighted average of current + previous window Balance of accuracy and efficiency Approximate
Token Bucket Bucket fills at constant rate, requests drain Allows bursts up to bucket size More complex
Leaky Bucket Requests queue, processed at constant rate Smooth output rate Adds latency, queue management

Token Bucket Implementation

// Redis-based token bucket
class TokenBucket {
  constructor(redis, key, capacity, refillRate) {
    this.redis = redis;
    this.key = key;
    this.capacity = capacity;        // Max tokens (burst size)
    this.refillRate = refillRate;    // Tokens per second
  }

  async allowRequest(tokens = 1) {
    const now = Date.now();

    // Lua script for atomic operation
    const script = `
      local key = KEYS[1]
      local capacity = tonumber(ARGV[1])
      local refillRate = tonumber(ARGV[2])
      local now = tonumber(ARGV[3])
      local requested = tonumber(ARGV[4])

      local bucket = redis.call('HMGET', key, 'tokens', 'lastRefill')
      local tokens = tonumber(bucket[1]) or capacity
      local lastRefill = tonumber(bucket[2]) or now

      -- Refill tokens based on time elapsed
      local elapsed = (now - lastRefill) / 1000
      tokens = math.min(capacity, tokens + (elapsed * refillRate))

      if tokens >= requested then
        tokens = tokens - requested
        redis.call('HMSET', key, 'tokens', tokens, 'lastRefill', now)
        redis.call('EXPIRE', key, 3600)
        return {1, tokens}  -- Allowed, remaining tokens
      else
        return {0, tokens}  -- Denied, remaining tokens
      end
    `;

    const [allowed, remaining] = await this.redis.eval(
      script, 1, this.key,
      this.capacity, this.refillRate, now, tokens
    );

    return { allowed: allowed === 1, remaining };
  }
}

Rate Limit Tiers

Design Pattern: Different limits for different customer tiers and endpoints.
Tier Messages/sec API calls/min Burst
Free Trial 1 60 10
Standard 10 600 100
Enterprise 100 6,000 1,000
Enterprise+ Custom Custom Custom

Rate Limit Headers

// Standard rate limit headers
HTTP/1.1 200 OK
X-RateLimit-Limit: 100         // Max requests in window
X-RateLimit-Remaining: 87      // Requests left in window
X-RateLimit-Reset: 1699900060  // Unix timestamp when window resets

// Draft IETF standard (RateLimit header)
RateLimit-Limit: 100
RateLimit-Remaining: 87
RateLimit-Reset: 60            // Seconds until reset

// Twilio adds endpoint-specific limits
X-Twilio-Concurrent-Requests: 25
X-Twilio-Concurrent-Requests-Limit: 100

2-Minute Answer: "How do you design rate limiting?"

"I use token bucket for rate limiting because it allows controlled bursts while enforcing average rate. Each customer has a bucket that refills at their tier's rate—say 10 tokens per second for standard accounts. They can burst up to the bucket capacity, but sustained traffic is capped. Implementation is a Redis Lua script for atomicity across distributed systems. I set different limits per endpoint based on cost—sending an SMS is more expensive than reading a message, so lower limits. Response headers tell customers their limits and remaining quota so they can self-throttle. When rate limited, I return 429 with a Retry-After header. Critical design decision: do you rate limit by API key, by account, or by IP? I do account-level for authenticated requests and IP-level for unauthenticated. For cell architecture, rate limiting happens at the Cell Router before traffic even reaches cells, protecting downstream systems."

API Governance at Scale

The Challenge

Problem: With 50+ teams building APIs, how do you maintain consistency without becoming a bottleneck?

Governance Model

┌─────────────────────────────────────────────────────────────────────────────┐ │ API GOVERNANCE FRAMEWORK │ │ │ │ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ │ │ API Design │ │ Automated │ │ API Review │ │ │ │ Standards │ │ Linting │ │ Board │ │ │ │ (Document) │ │ (CI/CD) │ │ (People) │ │ │ └────────┬────────┘ └────────┬────────┘ └────────┬────────┘ │ │ │ │ │ │ │ ▼ ▼ ▼ │ │ • Naming conventions • OpenAPI spec lint • New API review │ │ • URL structure • Breaking change • Breaking change │ │ • Error format detection approval │ │ • Versioning rules • Security scan • Cross-team impact │ │ • Auth patterns • Style enforcement • Deprecation decisions │ │ │ └─────────────────────────────────────────────────────────────────────────────┘

API Design Standards Document

Area Standard Enforcement
URL format Lowercase, hyphens, plural nouns Linter
JSON casing camelCase Linter
Date format ISO 8601 (2024-01-15T10:30:00Z) Linter
Pagination Cursor-based with pageToken Review board
Error format Standard error object with code, message, details Linter
Versioning Date-based in URL path Review board
Authentication Account SID + Auth Token or API Key Review board

OpenAPI Linting in CI/CD

# .github/workflows/api-lint.yml
name: API Lint

on:
  pull_request:
    paths:
      - 'api/**/*.yaml'

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Lint OpenAPI Spec
        uses: stoplightio/spectral-action@v0.8
        with:
          file_glob: 'api/**/*.yaml'
          spectral_ruleset: '.spectral.yaml'

      - name: Check Breaking Changes
        run: |
          # Compare with main branch
          oasdiff breaking api/spec.yaml main:api/spec.yaml

      - name: Security Scan
        run: |
          # Check for sensitive data in examples
          # Check auth requirements on all endpoints
          api-security-scanner api/spec.yaml

Spectral Ruleset Example

# .spectral.yaml - API linting rules
extends: spectral:oas

rules:
  # Naming conventions
  paths-kebab-case:
    description: Paths must use kebab-case
    given: $.paths[*]~
    then:
      function: pattern
      functionOptions:
        match: "^(/[a-z0-9-]+)+$"

  properties-camel-case:
    description: Properties must use camelCase
    given: $..properties[*]~
    then:
      function: casing
      functionOptions:
        type: camel

  # Security
  operation-security-defined:
    description: All operations must have security defined
    given: $.paths[*][*]
    then:
      field: security
      function: truthy

  # Documentation
  operation-description:
    description: Operations must have descriptions
    given: $.paths[*][*]
    then:
      field: description
      function: truthy

  # Error responses
  error-response-schema:
    description: 4xx/5xx must use standard error schema
    given: $.paths[*][*].responses[?(@property >= 400)]
    then:
      field: content.application/json.schema.$ref
      function: pattern
      functionOptions:
        match: "#/components/schemas/Error"

API Review Board

Purpose: Humans review what automation can't catch—business logic, cross-team impact, strategic alignment.
Review Trigger Reviewers SLA
New public API API Board + Security + Docs 1 week
Breaking change API Board + affected teams 1 week
New version API Board 3 days
Deprecation API Board + Customer Success 1 week
Additive change Auto-approved if linting passes Immediate

2-Minute Answer: "How do you govern APIs across many teams?"

"Three layers: standards, automation, and humans. First, I publish an API design standards document covering naming, error formats, pagination, versioning—everything teams need to build consistent APIs. Second, I automate enforcement through OpenAPI linting in CI/CD. Every PR that touches API specs runs Spectral rules checking for kebab-case URLs, camelCase properties, security definitions, and more. Breaking changes are automatically detected and blocked. Third, an API Review Board—architects from across the organization—reviews new APIs, breaking changes, and deprecations. They catch what automation can't: business logic issues, cross-team conflicts, strategic misalignment. The key is making the right thing easy. I provide templates, generators, and examples. If teams start from our standard OpenAPI template, they're 80% compliant automatically. The review board becomes a partnership, not a gate."

Twilio API Patterns

Twilio API Design Philosophy

TwiML: Domain-Specific Language

TwiML is Twilio Markup Language—an XML-based DSL that instructs Twilio how to handle calls and messages.
<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Say voice="alice">Hello! Thanks for calling.</Say>
    <Gather input="speech dtmf" timeout="3" numDigits="1">
        <Say>Press 1 for sales, or 2 for support.</Say>
    </Gather>
    <Say>We didn't receive any input. Goodbye!</Say>
</Response>

Webhook Pattern

┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ Customer │ │ Twilio │ │ Customer │ │ App │ │ Platform │ │ Webhook │ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │ │ │ │ 1. POST /Messages │ │ │ { to, from, body, │ │ │ statusCallback: URL } │ │ │──────────────────────────────────>│ │ │ │ │ │ 2. 201 Created │ │ │ { sid, status: "queued" } │ │ │<──────────────────────────────────│ │ │ │ │ │ │ 3. (async) Send to carrier │ │ │ │ │ │ 4. POST to statusCallback │ │ │ { sid, status: "sent" } │ │ │──────────────────────────────────>│ │ │ │ │ │ 5. Customer processes webhook │ │ │ │ │ │ 6. 200 OK (acknowledge) │ │ │<──────────────────────────────────│ │ │ │ │ │ 7. Later: "delivered" callback │ │ │──────────────────────────────────>│

Idempotency in Twilio APIs

// Idempotency key prevents duplicate sends
POST /2010-04-01/Accounts/{AccountSid}/Messages.json
Idempotency-Key: unique-request-id-12345
Content-Type: application/x-www-form-urlencoded

To=+15551234567&From=+15559876543&Body=Hello!

// If network fails and client retries with same key:
// - First request: Message created, returns 201
// - Retry: Same response returned, no duplicate message sent

// Twilio stores idempotency keys for 24 hours

API Versioning at Landing Zone

┌─────────────────────────────────────────────────────────────────────────────┐ │ LANDING ZONE (API GATEWAY) │ │ │ │ Incoming: POST /2010-04-01/Accounts/AC123/Messages.json │ │ │ │ ┌─────────────────────────────────────────────────────────────────┐ │ │ │ Version Router │ │ │ │ │ │ │ │ /2010-04-01/* → MessageHandlerV1 │ │ │ │ /2020-03-15/* → MessageHandlerV2 │ │ │ │ /2024-01-01/* → MessageHandlerV3 │ │ │ │ │ │ │ │ Unknown version → 400 Bad Request │ │ │ │ "API version not supported. See docs for available versions" │ │ │ └─────────────────────────────────────────────────────────────────┘ │ │ │ │ Version handler transforms request → routes to Cell → transforms response │ │ │ └─────────────────────────────────────────────────────────────────────────────┘ Key Insight: Version routing is a GLOBAL concern (Landing Zone) Not per-cell. All cells support all versions.

SDK Versioning Strategy

Component Versioning Release Cadence
API (Backend) Date-based (2010-04-01) Breaking changes: ~yearly
SDK (Client libraries) SemVer (8.1.0) Minor: monthly, Major: yearly
SDK ↔ API mapping SDK pins to API version SDK 8.x → API 2020-03-15
// SDK hides API versioning from developers
const twilio = require('twilio')(accountSid, authToken);

// SDK internally uses /2020-03-15/Accounts/...
const message = await twilio.messages.create({
  to: '+15551234567',
  from: '+15559876543',
  body: 'Hello from Twilio!'
});

// To use a different API version:
const twilio = require('twilio')(accountSid, authToken, {
  apiVersion: '2024-01-01'  // Override default
});

2-Minute Answer: "How would you design Twilio's API versioning?"

"Twilio uses date-based versioning in the URL path—/2010-04-01/Messages. The date indicates when that API contract was locked. I'd keep this approach because it's explicit and self-documenting. Version routing happens at the Landing Zone, not in cells. When a request comes in, the API Gateway extracts the version and routes to the appropriate handler. All cells run all version handlers simultaneously, so there's no version-specific deployment. For SDKs, I use semver and pin each major SDK version to a specific API version. Developers use the SDK without thinking about API versions—it's abstracted away. When we release a new API version, we release a new SDK major version. This gives customers two migration paths: upgrade the SDK and get the new API automatically, or pin the SDK and never change. For breaking changes between API versions, I use the expand-contract pattern with 12+ months of overlap. The old version keeps working; we just stop adding features to it."

Interview Q&A

Design Questions

Q: "How would you evolve an API that millions of customers depend on?"

A: "First principle: never break existing integrations. I'd establish that additive changes are always safe—new optional fields, new endpoints, new enum values. For changes that could break clients, I use the expand-contract pattern: add new alongside old, support both for 6-12 months with deprecation warnings, then remove old. I'd track usage metrics obsessively—which accounts use which fields, which versions, which endpoints. Before any sunset, I'd know exactly who's affected and reach out directly. I'd build automated breaking change detection into CI/CD so engineers can't accidentally ship breaking changes. And I'd invest heavily in SDK quality—most customers use SDKs, not raw API calls, so the SDK becomes the abstraction layer that insulates them from API evolution."

Q: "A team wants to redesign their API with breaking changes. How do you handle this?"

A: "First, I'd validate the need. Is this a 'want' or a 'need'? What's the cost of not doing it? If it's truly necessary, I'd require a migration plan before approving: What's the timeline? How will existing customers be notified? What's the SDK impact? Who does direct customer outreach? Then I'd ensure it goes through the API Review Board. The new API version gets launched alongside the old one. We set a sunset date at least 12 months out. We add deprecation headers immediately. We track weekly migration metrics. We reach out to top accounts personally. Only when usage of the old version drops below a threshold do we sunset. I'd also push back on 'redesign' and ask if incremental evolution could achieve the same goals. Often a series of additive changes gets you 80% of the benefit without the migration pain."

Q: "How do you handle API rate limiting in a distributed system?"

A: "Rate limiting happens at the edge—in our cell architecture, that's the Cell Router in the Landing Zone. I use token bucket for its burst tolerance. Each account has a bucket that refills based on their tier. Implementation is Redis with Lua scripts for atomicity—you need the check-and-decrement to be atomic or you get race conditions. I set per-endpoint limits too, since some operations are more expensive than others. Response headers tell clients their limit, remaining quota, and reset time so they can implement client-side throttling. When rate limited, I return 429 with Retry-After. For distributed rate limiting across multiple Cell Routers, I use a shared Redis cluster. The alternative is local rate limiting with some slack—each router enforces a fraction of the limit—but that's less accurate. For enterprise customers, I support custom limits defined in their account config and retrieved during auth."

Q: "A customer reports your API returned different responses for the same request. How do you debug?"

A: "First, I'd verify it's actually the same request—same endpoint, headers, body, account, time window. Caching can cause this legitimately. If it's truly identical requests with different responses, I'd look at: Which cell handled each request? Could there be data inconsistency between cells? Did a deployment happen between requests? Is there a race condition in the handler? Is there randomization that shouldn't exist—like load balancing across inconsistent replicas? I'd pull the request IDs from their logs and trace them through our system. For cell-based architecture, I'd check if the requests hit the same cell or different cells, and whether those cells have identical data state. I'd also check if they're using a deprecated API version that has known inconsistencies we're in the process of fixing."

Q: "How do you ensure API consistency across 50+ teams?"

A: "Three-pronged approach. First, a comprehensive API design guide that covers naming, error formats, pagination, versioning—everything a team needs to build a consistent API. I make this the first thing new engineers read. Second, automated enforcement via OpenAPI linting in CI/CD. Every spec change runs through Spectral rules. Breaking changes are automatically detected. PRs can't merge if they violate standards. Third, an API Review Board of senior engineers who review new APIs and significant changes. They catch what automation misses—business logic issues, cross-team conflicts, strategic misalignment. The key is making compliance easy. I provide templates, generators, and golden examples. If you start from our template, you're 80% compliant automatically. The review board should feel like a partnership, not a gate."

Quick Reference: Key Numbers

Metric Guideline Rationale
Deprecation period 12 months minimum Customers need time to migrate
API response time (p99) < 200ms Developer experience threshold
Rate limit response 429 + Retry-After header Standard, actionable for clients
Page size default 20-50 items Balance response size and round trips
Page size max 100-200 items Prevent memory/timeout issues
Idempotency key retention 24 hours Handle retries from failures
Breaking change notice 6 months minimum Before removal of old behavior

Key Takeaways

For Twilio DA Interview

  • REST: Resources (nouns) + HTTP verbs
  • Date-based versioning at Landing Zone
  • Additive-only evolution when possible
  • Expand-contract for breaking changes
  • 12+ month deprecation cycles
  • Automated linting + Review Board

Trade-offs to Articulate

  • URL vs header versioning (visibility vs purity)
  • Strict vs lenient validation (safety vs flexibility)
  • Consistency vs team autonomy
  • SDK abstraction vs API transparency
  • Fast deprecation vs customer disruption
  • Global vs local rate limiting
Your Narrative: "APIs are contracts. At Twilio's scale, breaking an API breaks customer production code. I design for evolution from day one—additive changes, version isolation, long deprecation cycles. Governance scales through automation (linting, breaking change detection) plus humans (API Review Board) for what automation can't catch. The goal is that customers can depend on our APIs for years without surprise breakage."