🏗️ Terraform Infrastructure Code

Enterprise Cell - Complete Production-Ready Code


About This Code

Complete, untruncated Terraform infrastructure-as-code for deploying an enterprise cell:

  • Overlapping IP Support: All cells use 10.0.0.0/16 - VPC Lattice enables this
  • VPC: Dedicated VPC per cell with public/private subnets across 3 AZs
  • EKS: Managed Kubernetes cluster with 100 nodes (m6i.4xlarge), auto-scaling between 50 and 200
  • VPC Lattice: Service mesh for overlapping IP routing and service discovery
  • AWS Load Balancer Controller: Helm deployment with IRSA
  • Production-Ready: Multi-AZ, auto-scaling, security groups, VPC endpoints
💡 Key Feature: Multiple cells can be deployed with identical 10.0.0.0/16 CIDRs. VPC Lattice routes by service name (e.g., enterprise-us-east-1-a-api) instead of IP addresses.
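The cell module receives the service network as an input (var.vpc_lattice_service_network_id). For context, a minimal sketch of the shared regional resource it refers to — assuming a separate regional module; names are illustrative, not part of this cell's code:

```hcl
# Hypothetical regional module (NOT part of the cell module below): the shared
# VPC Lattice service network that every cell associates with.
resource "aws_vpclattice_service_network" "regional" {
  name      = "twilio-us-east-1-service-network"
  auth_type = "AWS_IAM"
}

# Passed to each cell module as var.vpc_lattice_service_network_id.
output "service_network_id" {
  value = aws_vpclattice_service_network.regional.id
}
```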

Architecture Diagrams

Diagram 1: Regional Overview - Multiple Cells with Overlapping IPs

This diagram shows how VPC Lattice enables overlapping IP spaces across cells via service-based routing.

[Diagram: the twilio-us-east-1-service-network VPC Lattice service network performs service-based (not IP-based) routing to three registered services: enterprise-us-east-1-a-api, enterprise-us-east-1-b-api, and enterprise-us-east-1-c-api. Each maps to a cell VPC (A, B, C) with the identical 10.0.0.0/16 CIDR, containing an ALB (enterprise-a-alb, etc.) in front of a 100-node EKS cluster (m6i.4xlarge, ~33 nodes in each of 3 AZs). All cells use the same 10.0.0.0/16 CIDR; VPC Lattice routes by service name, not IP.]

Diagram 2: Detailed Single Cell Architecture

Complete infrastructure for one enterprise cell showing VPC, subnets, EKS, and networking components.

[Diagram: VPC enterprise-us-east-1-a (CIDR 10.0.0.0/16), fronted by the VPC Lattice service enterprise-us-east-1-a-api. An Internet Gateway serves three public subnets (10.0.0.0/20 in AZ 1a, 10.0.16.0/20 in 1b, 10.0.32.0/20 in 1c), each with its own NAT Gateway, hosting the Application Load Balancer (target type: IP, for direct pod targeting). Three private subnets (10.0.64.0/18, 10.0.128.0/18, 10.0.192.0/18) each run ~33 EKS nodes carrying the application pods, managed by the EKS control plane. VPC endpoints provide private access to S3, ECR, and DynamoDB. 100 nodes total (m6i.4xlarge), auto-scaling 50-200.]

Diagram 3: Traffic Flow - Request to Pod

Complete request flow from Cell Router through VPC Lattice to application pods.

[Diagram: (1) The Cell Router Lambda receives POST /v1/Messages for customer "acme-corp" and (2) resolves customer_id → cell_id via a DynamoDB lookup (steps 1-2: ~5ms). (3) The VPC Lattice service network routes by service name, not IP, resolving enterprise-us-east-1-a-api to the ALB in Cell A over HTTPS:443 (~2ms). (4) The ALB terminates TLS and forwards via an IP-mode target group (health checks: /health every 30s) directly to an EKS pod (~3ms). The application container processes the request, sends the SMS via the Twilio API, and returns 200 OK (~50ms of application processing), for roughly 60ms end to end.]
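The customer → cell mapping the router queries is not part of the files below; a minimal sketch of what that routing table might look like, with a hypothetical name and schema:

```hcl
# Hypothetical DynamoDB table backing the Cell Router lookup
# (customer_id -> cell_id). Name and schema are illustrative only.
resource "aws_dynamodb_table" "cell_routing" {
  name         = "customer-cell-routing"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "customer_id"

  attribute {
    name = "customer_id"
    type = "S"
  }
}
```

Each item would carry a cell_id attribute (e.g., "enterprise-us-east-1-a"), which the router turns into the VPC Lattice service name to invoke.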

Resource Summary: What Terraform Creates

main.tf (451 lines)
├─ VPC (10.0.0.0/16 - overlapping across cells)
├─ Subnets
│  ├─ Public: 3 subnets (/20) for ALB across 3 AZs
│  └─ Private: 3 subnets (/18) for EKS across 3 AZs
├─ Internet Gateway
├─ NAT Gateways (3 - one per AZ for HA)
├─ Route Tables (1 public + 3 private)
├─ VPC Endpoints (S3, ECR API, ECR Docker)
└─ Security Groups (ALB, EKS Pods, EKS Nodes, VPC Endpoints)

eks.tf (381 lines)
├─ IAM Roles (EKS Cluster, EKS Nodes, VPC CNI with IRSA)
├─ EKS Cluster (Kubernetes 1.28)
├─ Managed Node Group (100 nodes, m6i.4xlarge, auto-scaling 50-200)
├─ Security Groups (cluster control plane, worker nodes)
├─ EKS Addons (VPC CNI, CoreDNS, kube-proxy)
└─ OIDC Provider (for IRSA)

vpc-lattice.tf (205 lines)
├─ VPC Lattice Service Network Association
├─ VPC Lattice Service (enterprise-us-east-1-a-api)
├─ VPC Lattice Target Group (type: ALB)
├─ VPC Lattice Listener (HTTPS:443)
├─ VPC Lattice Auth Policy (AWS IAM)
└─ Security Group (VPC Lattice traffic: 169.254.171.0/24)

aws-load-balancer-controller.tf (372 lines)
├─ IAM Policy (comprehensive ELB permissions)
├─ IAM Role (IRSA for service account)
└─ Helm Release (aws-load-balancer-controller)
   ├─ Creates ALBs for Kubernetes Ingress resources
   └─ Target Type: IP (direct pod targeting)

Total Resources: ~60 AWS resources per cell
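The tree above is deployed once per cell. A sketch of how two cells with identical CIDRs could be instantiated from a root module — module path and service-network wiring are assumptions, and both calls rely on the module's default vpc_cidr of 10.0.0.0/16:

```hcl
# Two cells in the same region with the SAME 10.0.0.0/16 CIDR. This is valid
# because the VPCs never peer: all cross-cell traffic goes through VPC Lattice,
# which routes by service name rather than IP.
module "cell_a" {
  source                         = "./terraform/enterprise-cell"
  cell_id                        = "enterprise-us-east-1-a"
  region                         = "us-east-1"
  vpc_lattice_service_network_id = var.service_network_id
  customer_ids                   = ["acme-corp"]          # illustrative
  aws_organization_id            = var.aws_organization_id
}

module "cell_b" {
  source                         = "./terraform/enterprise-cell"
  cell_id                        = "enterprise-us-east-1-b"
  region                         = "us-east-1"
  vpc_lattice_service_network_id = var.service_network_id
  customer_ids                   = ["globex"]             # illustrative
  aws_organization_id            = var.aws_organization_id
}
```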
📄 terraform/enterprise-cell/main.tf 451 lines
# Twilio Cell-Based Architecture - Enterprise Cell Infrastructure
# This module creates a fully isolated enterprise cell with overlapping IP space (10.0.0.0/16)
# VPC Lattice enables overlapping IPs across cells via service-based routing

terraform {
  required_version = ">= 1.5"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.23"
    }
    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.11"
    }
    tls = {
      source  = "hashicorp/tls"
      version = "~> 4.0"  # used by data.tls_certificate in eks.tf (OIDC thumbprint)
    }
  }
}

# ============================================================================
# VARIABLES
# ============================================================================

variable "cell_id" {
  description = "Unique identifier for this cell (e.g., enterprise-us-east-1-a)"
  type        = string
}

variable "region" {
  description = "AWS region for this cell"
  type        = string
}

variable "vpc_cidr" {
  description = "CIDR block for the cell VPC - ALL enterprise cells use 10.0.0.0/16 (overlapping)"
  type        = string
  default     = "10.0.0.0/16"
}

variable "availability_zones" {
  description = "List of availability zones for multi-AZ deployment"
  type        = list(string)
  default     = ["us-east-1a", "us-east-1b", "us-east-1c"]
}

variable "eks_cluster_version" {
  description = "Kubernetes version for EKS cluster"
  type        = string
  default     = "1.28"
}

variable "eks_node_instance_types" {
  description = "Instance types for EKS managed node groups"
  type        = list(string)
  default     = ["m6i.4xlarge"]
}

variable "eks_node_desired_size" {
  description = "Desired number of EKS nodes"
  type        = number
  default     = 100
}

variable "eks_node_min_size" {
  description = "Minimum number of EKS nodes"
  type        = number
  default     = 50
}

variable "eks_node_max_size" {
  description = "Maximum number of EKS nodes"
  type        = number
  default     = 200
}

variable "vpc_lattice_service_network_id" {
  description = "ID of the VPC Lattice service network (shared across all cells in region)"
  type        = string
}

variable "customer_ids" {
  description = "List of customer IDs assigned to this cell (max 100 for enterprise cells)"
  type        = list(string)
}

variable "aws_organization_id" {
  description = "AWS Organization ID for IAM policies"
  type        = string
}

variable "tags" {
  description = "Common tags to apply to all resources"
  type        = map(string)
  default     = {}
}

# ============================================================================
# VPC - OVERLAPPING IP SPACE (10.0.0.0/16)
# All enterprise cells use the same CIDR - VPC Lattice enables this
# ============================================================================

resource "aws_vpc" "cell" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = merge(
    var.tags,
    {
      Name                                        = "${var.cell_id}-vpc"
      "kubernetes.io/cluster/${var.cell_id}-eks" = "shared"
      CellID                                      = var.cell_id
      Type                                        = "enterprise-cell"
    }
  )
}

# Public subnets for ALB (cell edge load balancer)
resource "aws_subnet" "public" {
  count             = length(var.availability_zones)
  vpc_id            = aws_vpc.cell.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 4, count.index)  # 10.0.0.0/20, 10.0.16.0/20, 10.0.32.0/20
  availability_zone = var.availability_zones[count.index]

  map_public_ip_on_launch = true

  tags = merge(
    var.tags,
    {
      Name                                        = "${var.cell_id}-public-${var.availability_zones[count.index]}"
      "kubernetes.io/role/elb"                   = "1"
      "kubernetes.io/cluster/${var.cell_id}-eks" = "shared"
      CellID                                      = var.cell_id
      Type                                        = "public"
    }
  )
}

# Private subnets for EKS nodes
resource "aws_subnet" "private" {
  count             = length(var.availability_zones)
  vpc_id            = aws_vpc.cell.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 2, count.index + 1)  # 10.0.64.0/18, 10.0.128.0/18, 10.0.192.0/18
  availability_zone = var.availability_zones[count.index]

  tags = merge(
    var.tags,
    {
      Name                                        = "${var.cell_id}-private-${var.availability_zones[count.index]}"
      "kubernetes.io/role/internal-elb"          = "1"
      "kubernetes.io/cluster/${var.cell_id}-eks" = "shared"
      CellID                                      = var.cell_id
      Type                                        = "private"
    }
  )
}

# Internet Gateway for public subnets
resource "aws_internet_gateway" "cell" {
  vpc_id = aws_vpc.cell.id

  tags = merge(
    var.tags,
    {
      Name   = "${var.cell_id}-igw"
      CellID = var.cell_id
    }
  )
}

# NAT Gateways for private subnet egress (one per AZ for HA)
resource "aws_eip" "nat" {
  count  = length(var.availability_zones)
  domain = "vpc"

  tags = merge(
    var.tags,
    {
      Name   = "${var.cell_id}-nat-eip-${var.availability_zones[count.index]}"
      CellID = var.cell_id
    }
  )

  depends_on = [aws_internet_gateway.cell]
}

resource "aws_nat_gateway" "cell" {
  count         = length(var.availability_zones)
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id

  tags = merge(
    var.tags,
    {
      Name   = "${var.cell_id}-nat-${var.availability_zones[count.index]}"
      CellID = var.cell_id
    }
  )

  depends_on = [aws_internet_gateway.cell]
}

# Route table for public subnets
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.cell.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.cell.id
  }

  tags = merge(
    var.tags,
    {
      Name   = "${var.cell_id}-public-rt"
      CellID = var.cell_id
    }
  )
}

resource "aws_route_table_association" "public" {
  count          = length(var.availability_zones)
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

# Route tables for private subnets (one per AZ, routes to respective NAT Gateway)
resource "aws_route_table" "private" {
  count  = length(var.availability_zones)
  vpc_id = aws_vpc.cell.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.cell[count.index].id
  }

  tags = merge(
    var.tags,
    {
      Name   = "${var.cell_id}-private-rt-${var.availability_zones[count.index]}"
      CellID = var.cell_id
    }
  )
}

resource "aws_route_table_association" "private" {
  count          = length(var.availability_zones)
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private[count.index].id
}

# ============================================================================
# VPC ENDPOINTS - Private connectivity to AWS services
# ============================================================================

resource "aws_vpc_endpoint" "s3" {
  vpc_id       = aws_vpc.cell.id
  service_name = "com.amazonaws.${var.region}.s3"

  route_table_ids = concat(
    [aws_route_table.public.id],
    aws_route_table.private[*].id
  )

  tags = merge(
    var.tags,
    {
      Name   = "${var.cell_id}-s3-endpoint"
      CellID = var.cell_id
    }
  )
}

resource "aws_vpc_endpoint" "ecr_api" {
  vpc_id              = aws_vpc.cell.id
  service_name        = "com.amazonaws.${var.region}.ecr.api"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true

  tags = merge(
    var.tags,
    {
      Name   = "${var.cell_id}-ecr-api-endpoint"
      CellID = var.cell_id
    }
  )
}

resource "aws_vpc_endpoint" "ecr_dkr" {
  vpc_id              = aws_vpc.cell.id
  service_name        = "com.amazonaws.${var.region}.ecr.dkr"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true

  tags = merge(
    var.tags,
    {
      Name   = "${var.cell_id}-ecr-dkr-endpoint"
      CellID = var.cell_id
    }
  )
}

# ============================================================================
# SECURITY GROUPS
# ============================================================================

# Security group for VPC endpoints
resource "aws_security_group" "vpc_endpoints" {
  name_prefix = "${var.cell_id}-vpc-endpoints-"
  description = "Security group for VPC endpoints"
  vpc_id      = aws_vpc.cell.id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [var.vpc_cidr]
    description = "Allow HTTPS from VPC"
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
    description = "Allow all outbound"
  }

  tags = merge(
    var.tags,
    {
      Name   = "${var.cell_id}-vpc-endpoints-sg"
      CellID = var.cell_id
    }
  )
}

# Security group for ALB (cell edge)
resource "aws_security_group" "alb" {
  name_prefix = "${var.cell_id}-alb-"
  description = "Security group for cell edge ALB"
  vpc_id      = aws_vpc.cell.id

  # Allow HTTPS from VPC Lattice service network
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["169.254.171.0/24"]  # VPC Lattice managed CIDR
    description = "Allow HTTPS from VPC Lattice"
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
    description = "Allow all outbound"
  }

  tags = merge(
    var.tags,
    {
      Name   = "${var.cell_id}-alb-sg"
      CellID = var.cell_id
    }
  )
}

# Security group for EKS pods
resource "aws_security_group" "eks_pods" {
  name_prefix = "${var.cell_id}-eks-pods-"
  description = "Security group for EKS pods"
  vpc_id      = aws_vpc.cell.id

  # Allow traffic from ALB
  ingress {
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
    description     = "Allow traffic from ALB"
  }

  # Allow pod-to-pod communication within VPC
  ingress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    self        = true
    description = "Allow pod-to-pod communication"
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
    description = "Allow all outbound"
  }

  tags = merge(
    var.tags,
    {
      Name                                        = "${var.cell_id}-eks-pods-sg"
      "kubernetes.io/cluster/${var.cell_id}-eks" = "owned"
      CellID                                      = var.cell_id
    }
  )
}

# Output the VPC and subnet information for use by other modules
output "vpc_id" {
  description = "ID of the cell VPC"
  value       = aws_vpc.cell.id
}

output "vpc_cidr" {
  description = "CIDR block of the cell VPC"
  value       = aws_vpc.cell.cidr_block
}

output "public_subnet_ids" {
  description = "IDs of public subnets (for ALB)"
  value       = aws_subnet.public[*].id
}

output "private_subnet_ids" {
  description = "IDs of private subnets (for EKS nodes)"
  value       = aws_subnet.private[*].id
}

output "alb_security_group_id" {
  description = "Security group ID for ALB"
  value       = aws_security_group.alb.id
}

output "eks_pod_security_group_id" {
  description = "Security group ID for EKS pods"
  value       = aws_security_group.eks_pods.id
}
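For reference, a plausible terraform.tfvars for this module; every value below is a placeholder, not a real identifier:

```hcl
# Illustrative variable values for one enterprise cell.
cell_id                        = "enterprise-us-east-1-a"
region                         = "us-east-1"
vpc_lattice_service_network_id = "sn-0123456789abcdef0"   # placeholder ID
customer_ids                   = ["acme-corp", "globex"]  # placeholder customers
aws_organization_id            = "o-a1b2c3d4e5"           # placeholder org ID

tags = {
  Team        = "platform"
  Environment = "production"
}
```

vpc_cidr, availability_zones, and the EKS sizing variables are left at their defaults (10.0.0.0/16, three us-east-1 AZs, 50/100/200 nodes).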
📄 terraform/enterprise-cell/eks.tf 381 lines
# EKS Cluster for Enterprise Cell
# Each cell has a dedicated EKS cluster with ~100 nodes serving ~100 enterprise customers

# ============================================================================
# IAM ROLES FOR EKS
# ============================================================================

# IAM role for EKS cluster
resource "aws_iam_role" "eks_cluster" {
  name_prefix = "${var.cell_id}-eks-cluster-"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "eks.amazonaws.com"
      }
    }]
  })

  tags = merge(
    var.tags,
    {
      Name   = "${var.cell_id}-eks-cluster-role"
      CellID = var.cell_id
    }
  )
}

resource "aws_iam_role_policy_attachment" "eks_cluster_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = aws_iam_role.eks_cluster.name
}

resource "aws_iam_role_policy_attachment" "eks_vpc_resource_controller" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSVPCResourceController"
  role       = aws_iam_role.eks_cluster.name
}

# IAM role for EKS node groups
resource "aws_iam_role" "eks_nodes" {
  name_prefix = "${var.cell_id}-eks-nodes-"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "ec2.amazonaws.com"
      }
    }]
  })

  tags = merge(
    var.tags,
    {
      Name   = "${var.cell_id}-eks-nodes-role"
      CellID = var.cell_id
    }
  )
}

resource "aws_iam_role_policy_attachment" "eks_worker_node_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
  role       = aws_iam_role.eks_nodes.name
}

resource "aws_iam_role_policy_attachment" "eks_cni_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
  role       = aws_iam_role.eks_nodes.name
}

resource "aws_iam_role_policy_attachment" "eks_container_registry_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
  role       = aws_iam_role.eks_nodes.name
}

# ============================================================================
# EKS CLUSTER
# ============================================================================

resource "aws_eks_cluster" "cell" {
  name     = "${var.cell_id}-eks"
  role_arn = aws_iam_role.eks_cluster.arn
  version  = var.eks_cluster_version

  vpc_config {
    subnet_ids              = aws_subnet.private[*].id
    endpoint_private_access = true
    endpoint_public_access  = true  # Set to false in production, use VPN/bastion
    security_group_ids      = [aws_security_group.eks_cluster.id]
  }

  # Enable control plane logging
  enabled_cluster_log_types = [
    "api",
    "audit",
    "authenticator",
    "controllerManager",
    "scheduler"
  ]

  tags = merge(
    var.tags,
    {
      Name   = "${var.cell_id}-eks"
      CellID = var.cell_id
    }
  )

  depends_on = [
    aws_iam_role_policy_attachment.eks_cluster_policy,
    aws_iam_role_policy_attachment.eks_vpc_resource_controller,
  ]
}

# Security group for EKS cluster control plane
resource "aws_security_group" "eks_cluster" {
  name_prefix = "${var.cell_id}-eks-cluster-"
  description = "Security group for EKS cluster control plane"
  vpc_id      = aws_vpc.cell.id

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
    description = "Allow all outbound"
  }

  tags = merge(
    var.tags,
    {
      Name   = "${var.cell_id}-eks-cluster-sg"
      CellID = var.cell_id
    }
  )
}

# Allow worker nodes to communicate with cluster control plane
resource "aws_security_group_rule" "cluster_ingress_nodes" {
  description              = "Allow worker nodes to communicate with cluster API Server"
  from_port                = 443
  protocol                 = "tcp"
  security_group_id        = aws_security_group.eks_cluster.id
  source_security_group_id = aws_security_group.eks_nodes.id
  to_port                  = 443
  type                     = "ingress"
}

# Security group for EKS worker nodes
resource "aws_security_group" "eks_nodes" {
  name_prefix = "${var.cell_id}-eks-nodes-"
  description = "Security group for EKS worker nodes"
  vpc_id      = aws_vpc.cell.id

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
    description = "Allow all outbound"
  }

  tags = merge(
    var.tags,
    {
      Name                                        = "${var.cell_id}-eks-nodes-sg"
      "kubernetes.io/cluster/${var.cell_id}-eks" = "owned"
      CellID                                      = var.cell_id
    }
  )
}

# Allow nodes to communicate with each other
resource "aws_security_group_rule" "nodes_internal" {
  description              = "Allow nodes to communicate with each other"
  from_port                = 0
  protocol                 = "-1"
  security_group_id        = aws_security_group.eks_nodes.id
  source_security_group_id = aws_security_group.eks_nodes.id
  to_port                  = 65535
  type                     = "ingress"
}

# Allow worker nodes to receive traffic from cluster control plane
resource "aws_security_group_rule" "nodes_cluster_inbound" {
  description              = "Allow worker nodes to receive traffic from cluster control plane"
  from_port                = 0
  protocol                 = "-1"
  security_group_id        = aws_security_group.eks_nodes.id
  source_security_group_id = aws_security_group.eks_cluster.id
  to_port                  = 65535
  type                     = "ingress"
}

# ============================================================================
# EKS MANAGED NODE GROUPS
# ============================================================================

resource "aws_eks_node_group" "cell" {
  cluster_name    = aws_eks_cluster.cell.name
  node_group_name = "${var.cell_id}-node-group"
  node_role_arn   = aws_iam_role.eks_nodes.arn
  subnet_ids      = aws_subnet.private[*].id

  scaling_config {
    desired_size = var.eks_node_desired_size
    max_size     = var.eks_node_max_size
    min_size     = var.eks_node_min_size
  }

  update_config {
    max_unavailable = 10  # Allow up to 10 nodes to be unavailable during updates
  }

  instance_types = var.eks_node_instance_types
  capacity_type  = "ON_DEMAND"  # Use ON_DEMAND for enterprise cells, SPOT for SMB

  labels = {
    CellID = var.cell_id
    Tier   = "enterprise"
  }

  tags = merge(
    var.tags,
    {
      Name                                        = "${var.cell_id}-node-group"
      "kubernetes.io/cluster/${var.cell_id}-eks" = "owned"
      CellID                                      = var.cell_id
    }
  )

  depends_on = [
    aws_iam_role_policy_attachment.eks_worker_node_policy,
    aws_iam_role_policy_attachment.eks_cni_policy,
    aws_iam_role_policy_attachment.eks_container_registry_policy,
  ]
}

# ============================================================================
# EKS ADDONS
# ============================================================================

# VPC CNI addon (for pod networking)
resource "aws_eks_addon" "vpc_cni" {
  cluster_name             = aws_eks_cluster.cell.name
  addon_name               = "vpc-cni"
  addon_version            = "v1.15.1-eksbuild.1"  # Use latest compatible version
  resolve_conflicts_on_create = "OVERWRITE"
  resolve_conflicts_on_update = "OVERWRITE"
  service_account_role_arn = aws_iam_role.vpc_cni.arn

  tags = merge(
    var.tags,
    {
      Name   = "${var.cell_id}-vpc-cni-addon"
      CellID = var.cell_id
    }
  )
}

# CoreDNS addon
resource "aws_eks_addon" "coredns" {
  cluster_name      = aws_eks_cluster.cell.name
  addon_name        = "coredns"
  addon_version     = "v1.10.1-eksbuild.6"
  resolve_conflicts_on_create = "OVERWRITE"
  resolve_conflicts_on_update = "OVERWRITE"

  tags = merge(
    var.tags,
    {
      Name   = "${var.cell_id}-coredns-addon"
      CellID = var.cell_id
    }
  )

  depends_on = [aws_eks_node_group.cell]
}

# kube-proxy addon
resource "aws_eks_addon" "kube_proxy" {
  cluster_name      = aws_eks_cluster.cell.name
  addon_name        = "kube-proxy"
  addon_version     = "v1.28.2-eksbuild.2"
  resolve_conflicts_on_create = "OVERWRITE"
  resolve_conflicts_on_update = "OVERWRITE"

  tags = merge(
    var.tags,
    {
      Name   = "${var.cell_id}-kube-proxy-addon"
      CellID = var.cell_id
    }
  )
}

# IAM role for VPC CNI (to enable custom networking and security groups for pods)
resource "aws_iam_role" "vpc_cni" {
  name_prefix = "${var.cell_id}-vpc-cni-"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRoleWithWebIdentity"
      Effect = "Allow"
      Principal = {
        Federated = aws_iam_openid_connect_provider.eks.arn
      }
      Condition = {
        StringEquals = {
          "${replace(aws_iam_openid_connect_provider.eks.url, "https://", "")}:sub" = "system:serviceaccount:kube-system:aws-node"
          "${replace(aws_iam_openid_connect_provider.eks.url, "https://", "")}:aud" = "sts.amazonaws.com"
        }
      }
    }]
  })

  tags = merge(
    var.tags,
    {
      Name   = "${var.cell_id}-vpc-cni-role"
      CellID = var.cell_id
    }
  )
}

resource "aws_iam_role_policy_attachment" "vpc_cni" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
  role       = aws_iam_role.vpc_cni.name
}

# OIDC provider for EKS (required for IRSA - IAM Roles for Service Accounts)
data "tls_certificate" "eks" {
  url = aws_eks_cluster.cell.identity[0].oidc[0].issuer
}

resource "aws_iam_openid_connect_provider" "eks" {
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = [data.tls_certificate.eks.certificates[0].sha1_fingerprint]
  url             = aws_eks_cluster.cell.identity[0].oidc[0].issuer

  tags = merge(
    var.tags,
    {
      Name   = "${var.cell_id}-eks-oidc"
      CellID = var.cell_id
    }
  )
}

# ============================================================================
# OUTPUTS
# ============================================================================

output "eks_cluster_id" {
  description = "ID of the EKS cluster"
  value       = aws_eks_cluster.cell.id
}

output "eks_cluster_endpoint" {
  description = "Endpoint for EKS cluster"
  value       = aws_eks_cluster.cell.endpoint
}

output "eks_cluster_security_group_id" {
  description = "Security group ID for EKS cluster"
  value       = aws_eks_cluster.cell.vpc_config[0].cluster_security_group_id
}

output "eks_cluster_oidc_issuer_url" {
  description = "OIDC issuer URL for the EKS cluster (for IRSA)"
  value       = aws_eks_cluster.cell.identity[0].oidc[0].issuer
}

output "eks_node_group_id" {
  description = "ID of the EKS node group"
  value       = aws_eks_node_group.cell.id
}
📄 terraform/enterprise-cell/vpc-lattice.tf 205 lines
# VPC Lattice Integration
# Registers this cell with the regional VPC Lattice service network
# VPC Lattice enables overlapping IP spaces via service-based routing

# ============================================================================
# VPC LATTICE SERVICE NETWORK ASSOCIATION
# ============================================================================

# Associate this cell's VPC with the shared VPC Lattice service network
resource "aws_vpclattice_service_network_vpc_association" "cell" {
  vpc_identifier             = aws_vpc.cell.id
  service_network_identifier = var.vpc_lattice_service_network_id

  security_group_ids = [aws_security_group.vpc_lattice.id]

  tags = merge(
    var.tags,
    {
      Name   = "${var.cell_id}-lattice-association"
      CellID = var.cell_id
    }
  )
}

# Security group for VPC Lattice traffic
resource "aws_security_group" "vpc_lattice" {
  name_prefix = "${var.cell_id}-vpc-lattice-"
  description = "Security group for VPC Lattice traffic"
  vpc_id      = aws_vpc.cell.id

  # Allow all inbound from VPC Lattice managed CIDR
  ingress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["169.254.171.0/24"]  # VPC Lattice link-local CIDR
    description = "Allow all traffic from VPC Lattice"
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
    description = "Allow all outbound"
  }

  tags = merge(
    var.tags,
    {
      Name   = "${var.cell_id}-vpc-lattice-sg"
      CellID = var.cell_id
    }
  )
}

# ============================================================================
# VPC LATTICE SERVICE (for this cell)
# ============================================================================

# Create a VPC Lattice service for this cell
# The ALB created by AWS Load Balancer Controller will be registered as target
resource "aws_vpclattice_service" "cell" {
  name               = "${var.cell_id}-api"
  auth_type          = "AWS_IAM"  # Require IAM authentication for service-to-service calls
  custom_domain_name = "${var.cell_id}.twilio-internal.com"  # Optional custom domain

  tags = merge(
    var.tags,
    {
      Name   = "${var.cell_id}-lattice-service"
      CellID = var.cell_id
    }
  )
}

# Associate the service with the service network
resource "aws_vpclattice_service_network_service_association" "cell" {
  service_identifier         = aws_vpclattice_service.cell.id
  service_network_identifier = var.vpc_lattice_service_network_id

  tags = merge(
    var.tags,
    {
      Name   = "${var.cell_id}-service-association"
      CellID = var.cell_id
    }
  )
}

# ============================================================================
# VPC LATTICE TARGET GROUP (ALB as target)
# ============================================================================

# Create target group that will point to the cell's ALB
# The ALB ARN will be added after the AWS Load Balancer Controller creates it
resource "aws_vpclattice_target_group" "alb" {
  name = "${var.cell_id}-alb-tg"
  type = "ALB"

  config {
    vpc_identifier = aws_vpc.cell.id
    port           = 443
    protocol       = "HTTPS"

    health_check {
      enabled                       = true
      health_check_interval_seconds = 30
      health_check_timeout_seconds  = 5
      healthy_threshold_count       = 2
      unhealthy_threshold_count     = 2
      path                          = "/health"
      protocol                      = "HTTPS"
      protocol_version              = "HTTP1"
      matcher {
        value = "200-299"
      }
    }
  }

  tags = merge(
    var.tags,
    {
      Name   = "${var.cell_id}-alb-target-group"
      CellID = var.cell_id
    }
  )
}

# VPC Lattice listener for the service
resource "aws_vpclattice_listener" "https" {
  name               = "https"
  protocol           = "HTTPS"
  port               = 443
  service_identifier = aws_vpclattice_service.cell.id

  default_action {
    forward {
      target_groups {
        target_group_identifier = aws_vpclattice_target_group.alb.id
        weight                  = 100
      }
    }
  }

  tags = merge(
    var.tags,
    {
      Name   = "${var.cell_id}-https-listener"
      CellID = var.cell_id
    }
  )
}

# ============================================================================
# IAM POLICY FOR VPC LATTICE ACCESS
# ============================================================================

# IAM policy to allow access to this cell's VPC Lattice service
# This will be attached to the Cell Router Lambda role
resource "aws_vpclattice_auth_policy" "cell_service" {
  resource_identifier = aws_vpclattice_service.cell.arn

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect    = "Allow"
        Principal = "*"
        Action    = "vpc-lattice-svcs:Invoke"
        Resource  = "*"
        Condition = {
          StringEquals = {
            "aws:PrincipalOrgID" = var.aws_organization_id  # Only allow access from same AWS org
          }
        }
      }
    ]
  })
}

# ============================================================================
# OUTPUTS
# ============================================================================

output "vpc_lattice_service_id" {
  description = "ID of the VPC Lattice service for this cell"
  value       = aws_vpclattice_service.cell.id
}

output "vpc_lattice_service_arn" {
  description = "ARN of the VPC Lattice service for this cell"
  value       = aws_vpclattice_service.cell.arn
}

output "vpc_lattice_service_dns" {
  description = "DNS name of the VPC Lattice service"
  value       = aws_vpclattice_service.cell.dns_entry[0].domain_name
}

output "vpc_lattice_target_group_id" {
  description = "ID of the VPC Lattice target group (for ALB registration)"
  value       = aws_vpclattice_target_group.alb.id
}
📄 terraform/enterprise-cell/aws-load-balancer-controller.tf 372 lines
# AWS Load Balancer Controller
# Automatically creates ALBs for Kubernetes Ingress resources
# ALBs are registered as VPC Lattice targets

# ============================================================================
# IAM ROLE FOR AWS LOAD BALANCER CONTROLLER (IRSA)
# ============================================================================

# IAM policy for AWS Load Balancer Controller
data "aws_iam_policy_document" "aws_load_balancer_controller" {
  statement {
    effect = "Allow"
    actions = [
      "iam:CreateServiceLinkedRole",
    ]
    resources = ["*"]
    condition {
      test     = "StringEquals"
      variable = "iam:AWSServiceName"
      values   = ["elasticloadbalancing.amazonaws.com"]
    }
  }

  statement {
    effect = "Allow"
    actions = [
      "ec2:DescribeAccountAttributes",
      "ec2:DescribeAddresses",
      "ec2:DescribeAvailabilityZones",
      "ec2:DescribeInternetGateways",
      "ec2:DescribeVpcs",
      "ec2:DescribeVpcPeeringConnections",
      "ec2:DescribeSubnets",
      "ec2:DescribeSecurityGroups",
      "ec2:DescribeInstances",
      "ec2:DescribeNetworkInterfaces",
      "ec2:DescribeTags",
      "ec2:GetCoipPoolUsage",
      "ec2:DescribeCoipPools",
      "elasticloadbalancing:DescribeLoadBalancers",
      "elasticloadbalancing:DescribeLoadBalancerAttributes",
      "elasticloadbalancing:DescribeListeners",
      "elasticloadbalancing:DescribeListenerCertificates",
      "elasticloadbalancing:DescribeSSLPolicies",
      "elasticloadbalancing:DescribeRules",
      "elasticloadbalancing:DescribeTargetGroups",
      "elasticloadbalancing:DescribeTargetGroupAttributes",
      "elasticloadbalancing:DescribeTargetHealth",
      "elasticloadbalancing:DescribeTags",
    ]
    resources = ["*"]
  }

  statement {
    effect = "Allow"
    actions = [
      "cognito-idp:DescribeUserPoolClient",
      "acm:ListCertificates",
      "acm:DescribeCertificate",
      "iam:ListServerCertificates",
      "iam:GetServerCertificate",
      "waf-regional:GetWebACL",
      "waf-regional:GetWebACLForResource",
      "waf-regional:AssociateWebACL",
      "waf-regional:DisassociateWebACL",
      "wafv2:GetWebACL",
      "wafv2:GetWebACLForResource",
      "wafv2:AssociateWebACL",
      "wafv2:DisassociateWebACL",
      "shield:GetSubscriptionState",
      "shield:DescribeProtection",
      "shield:CreateProtection",
      "shield:DeleteProtection",
    ]
    resources = ["*"]
  }

  statement {
    effect = "Allow"
    actions = [
      "ec2:AuthorizeSecurityGroupIngress",
      "ec2:RevokeSecurityGroupIngress",
    ]
    resources = ["*"]
  }

  statement {
    effect = "Allow"
    actions = [
      "ec2:CreateSecurityGroup",
    ]
    resources = ["*"]
  }

  statement {
    effect = "Allow"
    actions = [
      "ec2:CreateTags",
    ]
    resources = ["arn:aws:ec2:*:*:security-group/*"]
    condition {
      test     = "StringEquals"
      variable = "ec2:CreateAction"
      values   = ["CreateSecurityGroup"]
    }
    condition {
      test     = "Null"
      variable = "aws:RequestTag/elbv2.k8s.aws/cluster"
      values   = ["false"]
    }
  }

  statement {
    effect = "Allow"
    actions = [
      "ec2:CreateTags",
      "ec2:DeleteTags",
    ]
    resources = ["arn:aws:ec2:*:*:security-group/*"]
    condition {
      test     = "Null"
      variable = "aws:RequestTag/elbv2.k8s.aws/cluster"
      values   = ["true"]
    }
    condition {
      test     = "Null"
      variable = "aws:ResourceTag/elbv2.k8s.aws/cluster"
      values   = ["false"]
    }
  }

  statement {
    effect = "Allow"
    actions = [
      "ec2:AuthorizeSecurityGroupIngress",
      "ec2:RevokeSecurityGroupIngress",
      "ec2:DeleteSecurityGroup",
    ]
    resources = ["*"]
    condition {
      test     = "Null"
      variable = "aws:ResourceTag/elbv2.k8s.aws/cluster"
      values   = ["false"]
    }
  }

  statement {
    effect = "Allow"
    actions = [
      "elasticloadbalancing:CreateLoadBalancer",
      "elasticloadbalancing:CreateTargetGroup",
    ]
    resources = ["*"]
    condition {
      test     = "Null"
      variable = "aws:RequestTag/elbv2.k8s.aws/cluster"
      values   = ["false"]
    }
  }

  statement {
    effect = "Allow"
    actions = [
      "elasticloadbalancing:CreateListener",
      "elasticloadbalancing:DeleteListener",
      "elasticloadbalancing:CreateRule",
      "elasticloadbalancing:DeleteRule",
    ]
    resources = ["*"]
  }

  statement {
    effect = "Allow"
    actions = [
      "elasticloadbalancing:AddTags",
      "elasticloadbalancing:RemoveTags",
    ]
    resources = [
      "arn:aws:elasticloadbalancing:*:*:targetgroup/*/*",
      "arn:aws:elasticloadbalancing:*:*:loadbalancer/net/*/*",
      "arn:aws:elasticloadbalancing:*:*:loadbalancer/app/*/*",
    ]
    condition {
      test     = "Null"
      variable = "aws:RequestTag/elbv2.k8s.aws/cluster"
      values   = ["true"]
    }
    condition {
      test     = "Null"
      variable = "aws:ResourceTag/elbv2.k8s.aws/cluster"
      values   = ["false"]
    }
  }

  statement {
    effect = "Allow"
    actions = [
      "elasticloadbalancing:AddTags",
      "elasticloadbalancing:RemoveTags",
    ]
    resources = [
      "arn:aws:elasticloadbalancing:*:*:listener/net/*/*/*",
      "arn:aws:elasticloadbalancing:*:*:listener/app/*/*/*",
      "arn:aws:elasticloadbalancing:*:*:listener-rule/net/*/*/*",
      "arn:aws:elasticloadbalancing:*:*:listener-rule/app/*/*/*",
    ]
  }

  statement {
    effect = "Allow"
    actions = [
      "elasticloadbalancing:ModifyLoadBalancerAttributes",
      "elasticloadbalancing:SetIpAddressType",
      "elasticloadbalancing:SetSecurityGroups",
      "elasticloadbalancing:SetSubnets",
      "elasticloadbalancing:DeleteLoadBalancer",
      "elasticloadbalancing:ModifyTargetGroup",
      "elasticloadbalancing:ModifyTargetGroupAttributes",
      "elasticloadbalancing:DeleteTargetGroup",
    ]
    resources = ["*"]
    condition {
      test     = "Null"
      variable = "aws:ResourceTag/elbv2.k8s.aws/cluster"
      values   = ["false"]
    }
  }

  statement {
    effect = "Allow"
    actions = [
      "elasticloadbalancing:RegisterTargets",
      "elasticloadbalancing:DeregisterTargets",
    ]
    resources = ["arn:aws:elasticloadbalancing:*:*:targetgroup/*/*"]
  }

  statement {
    effect = "Allow"
    actions = [
      "elasticloadbalancing:SetWebAcl",
      "elasticloadbalancing:ModifyListener",
      "elasticloadbalancing:AddListenerCertificates",
      "elasticloadbalancing:RemoveListenerCertificates",
      "elasticloadbalancing:ModifyRule",
    ]
    resources = ["*"]
  }
}

resource "aws_iam_policy" "aws_load_balancer_controller" {
  name_prefix = "${var.cell_id}-aws-lb-controller-"
  description = "IAM policy for AWS Load Balancer Controller"
  policy      = data.aws_iam_policy_document.aws_load_balancer_controller.json

  tags = merge(
    var.tags,
    {
      Name   = "${var.cell_id}-aws-lb-controller-policy"
      CellID = var.cell_id
    }
  )
}

resource "aws_iam_role" "aws_load_balancer_controller" {
  name_prefix = "${var.cell_id}-aws-lb-controller-"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRoleWithWebIdentity"
      Effect = "Allow"
      Principal = {
        Federated = aws_iam_openid_connect_provider.eks.arn
      }
      Condition = {
        StringEquals = {
          "${replace(aws_iam_openid_connect_provider.eks.url, "https://", "")}:sub" = "system:serviceaccount:kube-system:aws-load-balancer-controller"
          "${replace(aws_iam_openid_connect_provider.eks.url, "https://", "")}:aud" = "sts.amazonaws.com"
        }
      }
    }]
  })

  tags = merge(
    var.tags,
    {
      Name   = "${var.cell_id}-aws-lb-controller-role"
      CellID = var.cell_id
    }
  )
}

resource "aws_iam_role_policy_attachment" "aws_load_balancer_controller" {
  policy_arn = aws_iam_policy.aws_load_balancer_controller.arn
  role       = aws_iam_role.aws_load_balancer_controller.name
}

# ============================================================================
# INSTALL AWS LOAD BALANCER CONTROLLER VIA HELM
# ============================================================================

provider "helm" {
  kubernetes {
    host                   = aws_eks_cluster.cell.endpoint
    cluster_ca_certificate = base64decode(aws_eks_cluster.cell.certificate_authority[0].data)
    exec {
      api_version = "client.authentication.k8s.io/v1beta1"
      args        = ["eks", "get-token", "--cluster-name", aws_eks_cluster.cell.id]
      command     = "aws"
    }
  }
}

resource "helm_release" "aws_load_balancer_controller" {
  name       = "aws-load-balancer-controller"
  repository = "https://aws.github.io/eks-charts"
  chart      = "aws-load-balancer-controller"
  namespace  = "kube-system"
  version    = "1.6.2"

  set {
    name  = "clusterName"
    value = aws_eks_cluster.cell.id
  }

  set {
    name  = "serviceAccount.create"
    value = "true"
  }

  set {
    name  = "serviceAccount.name"
    value = "aws-load-balancer-controller"
  }

  set {
    name  = "serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn"
    value = aws_iam_role.aws_load_balancer_controller.arn
  }

  set {
    name  = "region"
    value = var.region
  }

  set {
    name  = "vpcId"
    value = aws_vpc.cell.id
  }

  # Tag controller-created ALBs with the cell ID (used when registering them with VPC Lattice)
  set {
    name  = "defaultTags.CellID"
    value = var.cell_id
  }

  depends_on = [
    aws_eks_node_group.cell,
    aws_iam_role_policy_attachment.aws_load_balancer_controller
  ]
}

# ============================================================================
# OUTPUTS
# ============================================================================

output "aws_load_balancer_controller_role_arn" {
  description = "ARN of the IAM role for AWS Load Balancer Controller"
  value       = aws_iam_role.aws_load_balancer_controller.arn
}
📄 terraform/enterprise-cell/README.md 267 lines
# Twilio Cell-Based Architecture - Enterprise Cell Terraform Module

This Terraform module creates a fully isolated enterprise cell for Twilio's cell-based architecture on AWS.

## Architecture Overview

Each enterprise cell is a **dedicated VPC with overlapping IP space** (`10.0.0.0/16`). VPC Lattice enables this by routing based on **service names** instead of IP addresses, allowing all enterprise cells to use identical CIDRs.

### Key Components

1. **VPC** - Isolated VPC with overlapping CIDR (10.0.0.0/16)
2. **Subnets** - Public subnets for ALB, private subnets for EKS across 3 AZs
3. **EKS Cluster** - Managed Kubernetes cluster with ~100 nodes (auto-scaling 50-200)
4. **AWS Load Balancer Controller** - Automatically creates ALBs for Ingress resources
5. **VPC Lattice** - Service mesh enabling overlapping IPs and service discovery
6. **Security Groups** - Least-privilege security between components
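
The subnet sizing behind these components can be sanity-checked with a little shell arithmetic. The max-pods figure below uses the standard VPC CNI formula (`ENIs * (IPv4s per ENI - 1) + 2`); the m6i.4xlarge limits (8 ENIs, 30 IPv4 addresses each) and the ~33-nodes-per-AZ split are taken from the architecture diagram and AWS instance specs, so treat this as an illustrative estimate:

```shell
# Max pods per node under the default VPC CNI (formula: ENIs * (IPv4s per ENI - 1) + 2)
ENIS=8; IPS_PER_ENI=30                          # m6i.4xlarge network limits
MAX_PODS=$(( ENIS * (IPS_PER_ENI - 1) + 2 ))    # 234

# Worst-case pod-IP demand per AZ vs. the /18 private subnet that serves it
NODES_PER_AZ=33
POD_IPS_PER_AZ=$(( NODES_PER_AZ * MAX_PODS ))   # 7722
SUBNET_IPS=$(( 2 ** (32 - 18) ))                # 16384 addresses in a /18

echo "pods/node=$MAX_PODS pod-IPs/AZ=$POD_IPS_PER_AZ /18-capacity=$SUBNET_IPS"
```

Even at the 200-node auto-scaling ceiling (~67 nodes per AZ), worst-case pod-IP demand stays inside each /18.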

### Traffic Flow

```
VPC Lattice Service Network
         ↓
   ALB (cell edge) ← Created by AWS Load Balancer Controller
         ↓
Target Group (EKS pods in IP mode)
         ↓
   Application Pods
```

## Prerequisites

1. **AWS Organization** - Required for VPC Lattice IAM policies
2. **VPC Lattice Service Network** - Must be created once per region (shared across all cells)
3. **AWS CLI** configured with appropriate credentials
4. **Terraform** >= 1.5
5. **kubectl** for Kubernetes access

## Usage

### 1. Create VPC Lattice Service Network (once per region)

```bash
# Create in us-east-1
aws vpc-lattice create-service-network \
  --name twilio-us-east-1-service-network \
  --auth-type AWS_IAM \
  --region us-east-1

# Save the service network ID
export VPC_LATTICE_SERVICE_NETWORK_ID="sn-0123456789abcdef0"
```

### 2. Initialize Terraform

```bash
cd terraform/enterprise-cell
terraform init
```

### 3. Create terraform.tfvars

```hcl
cell_id                        = "enterprise-us-east-1-a"
region                         = "us-east-1"
vpc_lattice_service_network_id = "sn-0123456789abcdef0"
aws_organization_id            = "o-abcdefghij"

availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]

# Customer assignments (max 100 for enterprise cells)
customer_ids = [
  "customer-001",
  "customer-002",
  "customer-003",
  # ... up to 100 customers
]

eks_cluster_version      = "1.28"
eks_node_instance_types  = ["m6i.4xlarge"]
eks_node_desired_size    = 100
eks_node_min_size        = 50
eks_node_max_size        = 200

tags = {
  Environment = "production"
  ManagedBy   = "terraform"
  Tier        = "enterprise"
}
```

### 4. Deploy the Cell

```bash
terraform plan
terraform apply
```

### 5. Configure kubectl

```bash
aws eks update-kubeconfig \
  --name enterprise-us-east-1-a-eks \
  --region us-east-1
```

### 6. Deploy Application with ALB Ingress

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: twilio-api
  namespace: default
  annotations:
    # AWS Load Balancer Controller annotations
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internal
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}]'
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:123456789012:certificate/abcd-1234

    # Tags used to identify this ALB when registering it with VPC Lattice (manual step)
    alb.ingress.kubernetes.io/tags: "VPCLattice=true,CellID=enterprise-us-east-1-a"

spec:
  rules:
  - host: api.twilio-internal.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: twilio-api-service
            port:
              number: 8080
```

When this Ingress is created, the AWS Load Balancer Controller will:
1. Create an ALB (internal, per the `scheme` annotation)
2. Create a target group that targets pod IPs directly (`target-type: ip`)

Note that the controller does **not** register the ALB with the VPC Lattice target group; that is the manual step below.

### 7. Register ALB with VPC Lattice (Manual Step)

After the ALB is created, register it with the VPC Lattice target group:

```bash
# Get the ALB's DNS name from the Ingress status, then look up its ARN by DNS name
ALB_DNS=$(kubectl get ingress twilio-api -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
ALB_ARN=$(aws elbv2 describe-load-balancers \
  --query "LoadBalancers[?DNSName=='${ALB_DNS}'].LoadBalancerArn" --output text)

# Get VPC Lattice target group ID
VPC_LATTICE_TG_ID=$(terraform output -raw vpc_lattice_target_group_id)

# Register ALB as target
aws vpc-lattice register-targets \
  --target-group-identifier $VPC_LATTICE_TG_ID \
  --targets id=$ALB_ARN
```
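
An alternative to matching on DNS name is to look the ALB up by its name, which can be recovered from the Ingress hostname. This helper is a hypothetical convenience, not part of the module; it assumes the standard ELB hostname shape (an `internal-` prefix for internal ALBs and a trailing random suffix before the region):

```shell
# Recover the ALB *name* from its DNS hostname, e.g.
#   internal-k8s-default-twilioap-abc123def4-1234567890.us-east-1.elb.amazonaws.com
#   -> k8s-default-twilioap-abc123def4
alb_name_from_dns() {
  dns="${1#internal-}"          # internal ALB hostnames carry an "internal-" prefix
  dns="${dns%%.*}"              # keep everything before the first dot
  printf '%s\n' "${dns%-*}"     # drop the trailing "-<random digits>" suffix
}

# Usage sketch:
#   aws elbv2 describe-load-balancers --names "$(alb_name_from_dns "$ALB_DNS")"
```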

## Creating Multiple Cells

To create multiple enterprise cells (each with overlapping IPs):

```bash
# Cell A
terraform workspace new cell-a
terraform apply -var="cell_id=enterprise-us-east-1-a"

# Cell B (overlapping IPs!)
terraform workspace new cell-b
terraform apply -var="cell_id=enterprise-us-east-1-b"

# Cell C
terraform workspace new cell-c
terraform apply -var="cell_id=enterprise-us-east-1-c"
```

All three cells will use `10.0.0.0/16` - VPC Lattice handles the routing!
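
The workspace-per-cell pattern above can be scripted when stamping out many cells; a minimal sketch (the cell suffixes are placeholders):

```shell
# Generate cell IDs from a list of suffixes (illustrative naming convention)
cell_ids() {
  for suffix in "$@"; do
    printf 'enterprise-us-east-1-%s\n' "$suffix"
  done
}

for cell in $(cell_ids a b c); do
  echo "deploying $cell"
  # terraform workspace select -or-create "$cell"   # -or-create needs Terraform >= 1.4
  # terraform apply -var="cell_id=$cell"
done
```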

## Overlapping IP Spaces - How It Works

**Problem**: 100+ enterprise cells, each consuming a full `/16`, would exhaust RFC1918 address space (10.0.0.0/8 holds only 256 non-overlapping /16s) and make CIDR allocation unmanageable

**Solution**: VPC Lattice routes by **service name**, not IP address

**Example**:
- `enterprise-us-east-1-a-vpc`: 10.0.0.0/16 → Service: `enterprise-us-east-1-a-api`
- `enterprise-us-east-1-b-vpc`: 10.0.0.0/16 → Service: `enterprise-us-east-1-b-api`
- `enterprise-us-east-1-c-vpc`: 10.0.0.0/16 → Service: `enterprise-us-east-1-c-api`

VPC Lattice resolves the service name to the correct VPC's ALB, regardless of overlapping IPs.
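
To make that concrete, here is a toy model of the lookup in shell: the routing key is the service name, so identical pod IPs in different cells can never collide. (The mapping is purely illustrative; the real resolution happens inside VPC Lattice.)

```shell
# Toy model: route by service name, never by (overlapping) IP
route_cell() {
  case "$1" in
    enterprise-us-east-1-a-api) echo "cell-a" ;;
    enterprise-us-east-1-b-api) echo "cell-b" ;;
    enterprise-us-east-1-c-api) echo "cell-c" ;;
    *) echo "unknown" ;;
  esac
}

# Pod IP 10.0.64.10 exists in every cell simultaneously; only the name disambiguates
route_cell enterprise-us-east-1-b-api   # -> cell-b
```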

## Outputs

| Output | Description |
|--------|-------------|
| `vpc_id` | ID of the cell VPC |
| `vpc_cidr` | CIDR block (10.0.0.0/16) |
| `eks_cluster_id` | EKS cluster name |
| `eks_cluster_endpoint` | EKS API endpoint |
| `vpc_lattice_service_id` | VPC Lattice service ID |
| `vpc_lattice_service_dns` | DNS name for the cell service |

## Cost Estimate (per cell)

| Component | Estimated Monthly Cost |
|-----------|----------------------|
| EKS Control Plane | $73 |
| EC2 Instances (100× m6i.4xlarge on-demand) | ~$56,000 |
| NAT Gateways (3× AZs) | ~$100 |
| ALB | ~$20 |
| VPC Lattice | ~$10 |
| **Total** | **~$56,200/month** |
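
The dominant line item can be sanity-checked with integer shell arithmetic, assuming the us-east-1 on-demand list price of roughly $0.768/hour for m6i.4xlarge (verify against current pricing):

```shell
# List-price estimate for the node fleet, before any RI/Savings Plans discounts
NODES=100
HOURS_PER_MONTH=730
RATE_MILLIDOLLARS=768                      # $0.768/hr expressed in $1/1000 units
NODE_COST=$(( NODES * RATE_MILLIDOLLARS * HOURS_PER_MONTH / 1000 ))
echo "on-demand node cost: ~\$${NODE_COST}/month"
```

The Reserved Instance and Graviton savings below apply directly to this figure.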

**Cost Optimization**:
- Use Reserved Instances (save 30-50%)
- Use Graviton instances (save 20%)
- Right-size based on actual usage

## Security Considerations

1. **VPC Isolation** - Each cell is a dedicated VPC
2. **Security Groups** - Least-privilege rules between ALB → Pods
3. **IAM Roles for Service Accounts (IRSA)** - Pod-level IAM permissions
4. **VPC Lattice IAM Auth** - Service-to-service authentication
5. **Encryption** - At-rest (EBS encryption) and in-transit (TLS)

## Disaster Recovery

- **Multi-AZ** - Nodes across 3 availability zones
- **Multi-Region** - Deploy cells in multiple regions
- **RTO** - < 5 minutes (VPC Lattice health checks + auto-scaling)
- **RPO** - < 1 second (DynamoDB Global Tables replication)

## Troubleshooting

### ALB Not Created

Check AWS Load Balancer Controller logs:
```bash
kubectl logs -n kube-system deployment/aws-load-balancer-controller
```

### VPC Lattice Routing Issues

Check service associations:
```bash
aws vpc-lattice list-service-network-service-associations \
  --service-network-identifier $VPC_LATTICE_SERVICE_NETWORK_ID
```

### Pod Network Connectivity

Verify security group rules:
```bash
aws ec2 describe-security-group-rules \
  --filters Name=group-id,Values=$EKS_POD_SG_ID
```

## References

- [VPC Lattice Documentation](https://docs.aws.amazon.com/vpc-lattice/)
- [AWS Load Balancer Controller](https://kubernetes-sigs.github.io/aws-load-balancer-controller/)
- [EKS Best Practices](https://aws.github.io/aws-eks-best-practices/)
📄 terraform/enterprise-cell/terraform.tfvars.example 74 lines
# Example Terraform Variables for Enterprise Cell
# Copy this file to terraform.tfvars and customize for your deployment

# Cell identifier - must be unique across all cells
cell_id = "enterprise-us-east-1-a"

# AWS region for this cell
region = "us-east-1"

# VPC Lattice service network ID (created once per region, shared across cells)
# Create with: aws vpc-lattice create-service-network --name twilio-us-east-1-service-network --auth-type AWS_IAM
vpc_lattice_service_network_id = "sn-0123456789abcdef0"

# AWS Organization ID (for VPC Lattice IAM policies)
# Get with: aws organizations describe-organization --query 'Organization.Id' --output text
aws_organization_id = "o-abcdefghij"

# Availability zones for multi-AZ deployment
availability_zones = [
  "us-east-1a",
  "us-east-1b",
  "us-east-1c"
]

# OVERLAPPING IP SPACE - All enterprise cells use 10.0.0.0/16
# VPC Lattice enables this via service-based routing (not IP-based)
vpc_cidr = "10.0.0.0/16"

# EKS cluster configuration
eks_cluster_version = "1.28"

# Instance types for EKS nodes
# Graviton (m7g.4xlarge) recommended for 20% better price/performance
eks_node_instance_types = ["m6i.4xlarge"]  # Or ["m7g.4xlarge"] for Graviton

# Node group scaling
eks_node_desired_size = 100  # Desired number of nodes
eks_node_min_size     = 50   # Minimum for auto-scaling
eks_node_max_size     = 200  # Maximum for auto-scaling

# Customer IDs assigned to this cell (max 100 for enterprise cells)
# These customers' traffic will be routed to this cell by the Cell Router
customer_ids = [
  "acme-corp",
  "globex-inc",
  "initech",
  # Add up to 100 customer IDs...
]

# Common tags applied to all resources
tags = {
  Environment = "production"
  ManagedBy   = "terraform"
  Team        = "platform"
  Tier        = "enterprise"
  CostCenter  = "engineering"
}

# ============================================================================
# EXAMPLE: Multiple Cells with Overlapping IPs
# ============================================================================

# Cell A: 10.0.0.0/16
# cell_id = "enterprise-us-east-1-a"
# vpc_cidr = "10.0.0.0/16"

# Cell B: 10.0.0.0/16 (SAME CIDR - VPC Lattice handles routing!)
# cell_id = "enterprise-us-east-1-b"
# vpc_cidr = "10.0.0.0/16"

# Cell C: 10.0.0.0/16 (SAME CIDR - VPC Lattice handles routing!)
# cell_id = "enterprise-us-east-1-c"
# vpc_cidr = "10.0.0.0/16"

📖 How to Use This Code

Prerequisites:

  1. Create VPC Lattice service network (once per region): aws vpc-lattice create-service-network --name twilio-us-east-1-service-network --auth-type AWS_IAM
  2. Get AWS Organization ID: aws organizations describe-organization --query 'Organization.Id' --output text

Deploy:

  1. Copy terraform.tfvars.example to terraform.tfvars
  2. Update variables (cell_id, region, vpc_lattice_service_network_id, customer_ids)
  3. Run terraform init && terraform plan && terraform apply
  4. Configure kubectl: aws eks update-kubeconfig --name <cell_id>-eks --region <region>

Multiple Cells: Deploy additional cells with the same vpc_cidr = "10.0.0.0/16" but different cell_id values. VPC Lattice handles overlapping IPs.
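
The single-cell deploy steps above can be strung together into one hypothetical wrapper script (the cell ID and region are placeholders):

```shell
#!/bin/sh
set -eu

CELL_ID="enterprise-us-east-1-a"    # placeholder: must be unique per cell
REGION="us-east-1"

# The module names the cluster <cell_id>-eks
cluster_name_for() { printf '%s-eks\n' "$1"; }

# terraform init
# terraform plan  -var="cell_id=$CELL_ID"
# terraform apply -var="cell_id=$CELL_ID"
# aws eks update-kubeconfig --name "$(cluster_name_for "$CELL_ID")" --region "$REGION"

echo "would configure kubectl for $(cluster_name_for "$CELL_ID")"
```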