☸️ Kubernetes Architecture - Complete Guide

Kubernetes (K8s) is an open-source container orchestration platform that automates deployment, scaling, and management of containerized applications.

Key Benefits:

  - Self-healing: failed containers are restarted and replaced automatically
  - Horizontal scaling: workloads scale up or down on demand
  - Automated rollouts and rollbacks: applications update safely, with rollback on failure
  - Service discovery and load balancing: built-in DNS names and traffic distribution
  - Declarative configuration: you describe the desired state and Kubernetes reconciles toward it

Kubernetes Cluster Architecture

```mermaid
graph LR
    KUBECTL["kubectl"] -.->|"Commands"| API
    subgraph CP["CONTROL PLANE (Master)"]
        API["API Server<br/>(kube-apiserver)"]
        SCHED["Scheduler<br/>(kube-scheduler)"]
        CTRL["Controller Manager<br/>(kube-controller-manager)"]
        ETCD["etcd<br/>(Distributed key-value store)"]
        API --> ETCD
        SCHED --> API
        CTRL --> API
    end
    subgraph WN["WORKER NODES"]
        subgraph N1["Node 1"]
            KUB1["kubelet"]
            PROXY1["kube-proxy"]
            RT1["Container Runtime<br/>(containerd)"]
            subgraph PODS1["Pods"]
                POD1["Pod1"]
                POD2["Pod2"]
                POD3["Pod3"]
                POD4["Pod4"]
            end
            KUB1 --> PODS1
            PROXY1 --> PODS1
            RT1 --> PODS1
        end
        N2["Node 2, Node 3, ...<br/>(same structure)"]
    end
    API -.->|"Manages"| KUB1
    API -.->|"Manages"| PROXY1
    style CP fill:#e3f2fd,stroke:#1976d2,stroke-width:3px,color:#2e3440
    style WN fill:#f3e5f5,stroke:#7b1fa2,stroke-width:3px,color:#2e3440
    style N1 fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#2e3440
    style PODS1 fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
    style API fill:#bbdefb,stroke:#1976d2,stroke-width:2px,color:#2e3440
    style SCHED fill:#bbdefb,stroke:#1976d2,stroke-width:2px,color:#2e3440
    style CTRL fill:#bbdefb,stroke:#1976d2,stroke-width:2px,color:#2e3440
    style ETCD fill:#ffccbc,stroke:#e64a19,stroke-width:2px,color:#2e3440
    style KUBECTL fill:#fff9c4,stroke:#f57f17,stroke-width:2px,color:#2e3440
```

Kubernetes Resource Relationships

1. Nodes and Pods Relationship

Concept: Nodes are physical/virtual machines. Pods run on Nodes.

```mermaid
graph TB
    subgraph CLUSTER["Kubernetes Cluster"]
        subgraph NODE1["Node 1 (Worker Machine)<br/>IP: 10.0.1.5"]
            POD1A["Pod: web-app-1<br/>IP: 192.168.1.10<br/>Containers: nginx"]
            POD1B["Pod: web-app-2<br/>IP: 192.168.1.11<br/>Containers: nginx"]
            POD1C["Pod: cache-1<br/>IP: 192.168.1.12<br/>Containers: redis"]
        end
        subgraph NODE2["Node 2 (Worker Machine)<br/>IP: 10.0.1.6"]
            POD2A["Pod: web-app-3<br/>IP: 192.168.2.10<br/>Containers: nginx"]
            POD2B["Pod: db-1<br/>IP: 192.168.2.11<br/>Containers: postgres"]
        end
        subgraph NODE3["Node 3 (Worker Machine)<br/>IP: 10.0.1.7"]
            POD3A["Pod: web-app-4<br/>IP: 192.168.3.10<br/>Containers: nginx"]
            POD3B["Pod: worker-1<br/>IP: 192.168.3.11<br/>Containers: python"]
        end
    end
    style NODE1 fill:#fff3e0,stroke:#f57c00,stroke-width:3px,color:#2e3440
    style NODE2 fill:#fff3e0,stroke:#f57c00,stroke-width:3px,color:#2e3440
    style NODE3 fill:#fff3e0,stroke:#f57c00,stroke-width:3px,color:#2e3440
    style POD1A fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
    style POD1B fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
    style POD1C fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
    style POD2A fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
    style POD2B fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
    style POD3A fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
    style POD3B fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
```

Key Points:

  - Each Node has its own IP on the node network (e.g., 10.0.1.5)
  - Each Pod gets its own IP from the Pod network (e.g., 192.168.1.10)
  - A Node can run many Pods; a Pod runs on exactly one Node
  - If a Node fails, its Pods are lost and must be recreated on other Nodes

2. ReplicaSet: Managing Pod Replicas

Concept: ReplicaSet ensures N identical Pods are always running.

```mermaid
graph TB
    RS["ReplicaSet: web-app<br/>Desired: 3 replicas<br/>Selector: app=web"]
    RS -->|"Creates & Manages"| POD1["Pod: web-app-abc123<br/>Labels: app=web<br/>Status: Running"]
    RS -->|"Creates & Manages"| POD2["Pod: web-app-def456<br/>Labels: app=web<br/>Status: Running"]
    RS -->|"Creates & Manages"| POD3["Pod: web-app-ghi789<br/>Labels: app=web<br/>Status: Running"]
    DEAD["Pod: web-app-xyz<br/>Status: Failed ❌"]
    RS -.->|"Detects failure<br/>Creates replacement"| POD3
    DEAD -.->|"Was managed by"| RS
    NODE1["Node 1"] -.->|"Runs"| POD1
    NODE2["Node 2"] -.->|"Runs"| POD2
    NODE3["Node 3"] -.->|"Runs"| POD3
    style RS fill:#bbdefb,stroke:#1976d2,stroke-width:3px,color:#2e3440
    style POD1 fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
    style POD2 fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
    style POD3 fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
    style DEAD fill:#ffcdd2,stroke:#c62828,stroke-width:2px,color:#2e3440
    style NODE1 fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#2e3440
    style NODE2 fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#2e3440
    style NODE3 fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#2e3440
```

How it works:

  1. The ReplicaSet selects Pods by label (e.g., app=web)
  2. It continuously compares the actual Pod count against the desired replica count
  3. Too few Pods (e.g., one crashed): it creates a replacement
  4. Too many Pods: it terminates the excess

3. DaemonSet: One Pod Per Node

Concept: A DaemonSet ensures exactly ONE copy of a Pod runs on every Node (or on every Node matching its selector).

```mermaid
graph TB
    DS["DaemonSet: log-collector<br/>Runs on: ALL nodes"]
    subgraph CLUSTER["Cluster"]
        subgraph NODE1["Node 1"]
            POD1["Pod: log-collector-node1<br/>Collects logs from Node 1"]
            APP1A["App Pod 1"]
            APP1B["App Pod 2"]
        end
        subgraph NODE2["Node 2"]
            POD2["Pod: log-collector-node2<br/>Collects logs from Node 2"]
            APP2A["App Pod 3"]
        end
        subgraph NODE3["Node 3"]
            POD3["Pod: log-collector-node3<br/>Collects logs from Node 3"]
            APP3A["App Pod 4"]
            APP3B["App Pod 5"]
        end
    end
    DS -->|"Ensures 1 Pod on"| NODE1
    DS -->|"Ensures 1 Pod on"| NODE2
    DS -->|"Ensures 1 Pod on"| NODE3
    POD1 -.->|"Monitors"| APP1A
    POD1 -.->|"Monitors"| APP1B
    POD2 -.->|"Monitors"| APP2A
    POD3 -.->|"Monitors"| APP3A
    POD3 -.->|"Monitors"| APP3B
    style DS fill:#ce93d8,stroke:#8e24aa,stroke-width:3px,color:#2e3440
    style NODE1 fill:#fff3e0,stroke:#f57c00,stroke-width:3px,color:#2e3440
    style NODE2 fill:#fff3e0,stroke:#f57c00,stroke-width:3px,color:#2e3440
    style NODE3 fill:#fff3e0,stroke:#f57c00,stroke-width:3px,color:#2e3440
    style POD1 fill:#e1bee7,stroke:#8e24aa,stroke-width:2px,color:#2e3440
    style POD2 fill:#e1bee7,stroke:#8e24aa,stroke-width:2px,color:#2e3440
    style POD3 fill:#e1bee7,stroke:#8e24aa,stroke-width:2px,color:#2e3440
    style APP1A fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
    style APP1B fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
    style APP2A fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
    style APP3A fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
    style APP3B fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
```

Common DaemonSet Use Cases:

  - Log collection agents (e.g., Fluentd)
  - Node monitoring (e.g., Prometheus node exporter)
  - CNI/network plugins (e.g., Calico, Cilium)
  - Node-level storage or security agents

4. Sidecar Pattern: Multiple Containers in One Pod

Concept: Pods can have multiple containers that share resources.

```mermaid
graph TB
    subgraph POD["Pod: web-app-with-sidecar<br/>IP: 192.168.1.10"]
        subgraph SHARED["Shared Resources"]
            NETWORK["Shared Network<br/>(localhost)"]
            VOLUME["Shared Volume<br/>(/var/log)"]
        end
        MAIN["Main Container<br/>nginx:1.21<br/>Port: 80<br/>Writes logs to /var/log/nginx/"]
        SIDECAR["Sidecar Container<br/>fluentd<br/>Reads logs from /var/log/nginx/<br/>Sends to Elasticsearch"]
        MAIN -->|"Shares"| NETWORK
        MAIN -->|"Writes to"| VOLUME
        SIDECAR -->|"Shares"| NETWORK
        SIDECAR -->|"Reads from"| VOLUME
    end
    EXTERNAL["External Log Storage<br/>(Elasticsearch)"]
    SIDECAR -.->|"Forwards logs"| EXTERNAL
    style POD fill:#e8f5e9,stroke:#388e3c,stroke-width:3px,color:#2e3440
    style MAIN fill:#bbdefb,stroke:#1976d2,stroke-width:2px,color:#2e3440
    style SIDECAR fill:#fff9c4,stroke:#f57f17,stroke-width:2px,color:#2e3440
    style NETWORK fill:#f3e5f5,stroke:#9c27b0,stroke-width:2px,color:#2e3440
    style VOLUME fill:#ffe0b2,stroke:#ff6f00,stroke-width:2px,color:#2e3440
    style SHARED fill:#fafafa,stroke:#757575,stroke-width:2px,color:#2e3440
    style EXTERNAL fill:#c5e1a5,stroke:#558b2f,stroke-width:2px,color:#2e3440
```
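The sidecar layout above can be sketched as a Pod manifest. The two containers share an emptyDir volume for logs; the image tags are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-app-with-sidecar
spec:
  volumes:
  - name: logs               # shared volume, visible to both containers
    emptyDir: {}
  containers:
  - name: nginx              # main container: serves traffic, writes logs
    image: nginx:1.21
    ports:
    - containerPort: 80
    volumeMounts:
    - name: logs
      mountPath: /var/log/nginx
  - name: log-shipper        # sidecar: reads what nginx writes
    image: fluentd:v1.16     # tag illustrative
    volumeMounts:
    - name: logs
      mountPath: /var/log/nginx
      readOnly: true
```

Both containers also share the Pod's network namespace, so they can reach each other on localhost without any Service in between.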

Sidecar Benefits:

  - Separation of concerns: the main container stays focused on the application
  - Independent images and release cycles for the app and its helper
  - Shared lifecycle, network (localhost), and volumes make integration simple
  - Reusable: the same sidecar image works alongside many different applications

Common Sidecar Patterns:

  - Log shipper: reads application logs from a shared volume and forwards them (as in the diagram)
  - Service mesh proxy: intercepts and manages the Pod's traffic (e.g., Envoy in Istio)
  - Config/data sync: periodically pulls configuration or content for the main container

5. Complete Hierarchy: Deployment → ReplicaSet → Pods

Concept: Deployments manage ReplicaSets, which manage Pods.

```mermaid
graph TB
    DEP["Deployment: web-app<br/>Replicas: 3<br/>Image: nginx:1.21"]
    RS_NEW["ReplicaSet: web-app-v2<br/>Replicas: 3<br/>Current"]
    RS_OLD["ReplicaSet: web-app-v1<br/>Replicas: 0<br/>Kept for rollback"]
    DEP -->|"Creates/Manages"| RS_NEW
    DEP -.->|"Keeps for rollback"| RS_OLD
    RS_NEW -->|"Manages"| POD1["Pod: web-app-v2-abc<br/>nginx:1.21"]
    RS_NEW -->|"Manages"| POD2["Pod: web-app-v2-def<br/>nginx:1.21"]
    RS_NEW -->|"Manages"| POD3["Pod: web-app-v2-ghi<br/>nginx:1.21"]
    RS_OLD -.->|"Previously managed"| POD_OLD["Pod: web-app-v1-xyz<br/>nginx:1.20<br/>(terminated)"]
    subgraph NODES["Distributed Across Nodes"]
        NODE1["Node 1"] -.-> POD1
        NODE2["Node 2"] -.-> POD2
        NODE3["Node 3"] -.-> POD3
    end
    style DEP fill:#90caf9,stroke:#0277bd,stroke-width:3px,color:#2e3440
    style RS_NEW fill:#bbdefb,stroke:#1976d2,stroke-width:3px,color:#2e3440
    style RS_OLD fill:#e0e0e0,stroke:#616161,stroke-width:2px,stroke-dasharray: 5 5,color:#2e3440
    style POD1 fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
    style POD2 fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
    style POD3 fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#2e3440
    style POD_OLD fill:#ffcdd2,stroke:#c62828,stroke-width:2px,stroke-dasharray: 5 5,color:#2e3440
    style NODE1 fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#2e3440
    style NODE2 fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#2e3440
    style NODE3 fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#2e3440
    style NODES fill:#fafafa,stroke:#757575,stroke-width:2px,color:#2e3440
```

Why this hierarchy?

  - A ReplicaSet only keeps N Pods running; it doesn't know how to update them safely
  - A Deployment adds versioned rollouts: each new Pod template gets its own ReplicaSet
  - Keeping the old ReplicaSet at 0 replicas makes one-command rollback possible

Update Process:

  1. You update Deployment (change image nginx:1.20 → nginx:1.21)
  2. Deployment creates NEW ReplicaSet (web-app-v2)
  3. New ReplicaSet scales UP (creates 3 new Pods)
  4. Old ReplicaSet scales DOWN (terminates old Pods)
  5. Old ReplicaSet kept with 0 replicas (for rollback)

Control Plane Components

1. API Server (kube-apiserver)
Purpose: The front-end of the Kubernetes control plane. All components communicate through the API server.
How it works:
Request Flow:
  1. kubectl sends request to API server
  2. API server authenticates and authorizes
  3. API server validates the request
  4. API server writes to etcd
  5. API server returns response
Interview Tip: The API server is stateless and horizontally scalable. It's the only component that directly accesses etcd. All cluster state changes go through the API server.
2. etcd
Purpose: Distributed, consistent key-value store that holds the entire cluster state.
How it works:
Stored data includes:
  - Cluster configuration
  - Resource definitions (Pods, Services, etc.)
  - Secrets and ConfigMaps
  - Node status
  - Current state vs desired state
Interview Tip: etcd is critical - if etcd fails, the cluster can't function. Regular backups are essential. Uses Raft consensus (see raft_consensus.py).
3. Scheduler (kube-scheduler)
Purpose: Assigns Pods to Nodes based on resource requirements and constraints.
How it works:
Scheduling Process:
  1. Filtering: Remove nodes that don't meet requirements
    • Insufficient CPU/memory
    • Node selectors don't match
    • Taints/tolerations conflicts
    • Volume constraints
  2. Scoring: Rank remaining nodes
    • Resource availability
    • Pod spreading (balance across nodes)
    • Affinity/anti-affinity rules
  3. Binding: Assign Pod to highest-scoring node
Interview Tip: Scheduler only assigns Pods to Nodes. kubelet actually runs the Pod. You can write custom schedulers if needed.
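The filtering step keys off the resources a Pod declares. A minimal sketch of requests and limits (values illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
  - name: web
    image: nginx:1.21
    resources:
      requests:           # the scheduler filters and scores nodes using these
        cpu: "250m"       # 0.25 CPU cores
        memory: "256Mi"
      limits:             # enforced at runtime by the kubelet/container runtime
        cpu: "500m"
        memory: "512Mi"
```

A node is only a candidate if its unreserved capacity covers the Pod's requests; limits do not affect scheduling.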
4. Controller Manager (kube-controller-manager)
Purpose: Runs controller processes that regulate the cluster state.
How it works:
Key Controllers:
  - Node Controller: Monitors node health, marks unresponsive nodes unavailable
  - Replication Controller: Maintains the correct number of Pods
  - Endpoints Controller: Populates Endpoints (joins Services and Pods)
  - Service Account Controller: Creates default service accounts
  - Namespace Controller: Manages namespace lifecycle
  - Deployment Controller: Manages ReplicaSets for Deployments
  - StatefulSet Controller: Manages StatefulSets
  - Job Controller: Manages Jobs and CronJobs
Control Loop (Reconciliation):
  1. Read desired state from API server
  2. Read current state from API server
  3. Compare desired vs current
  4. Take action to reconcile (create, update, delete resources)
  5. Update status in API server
  6. Repeat
Interview Tip: Controllers implement the "reconciliation loop" - continuously working to make actual state match desired state. This is Kubernetes' core operating principle.

Node (Worker) Components

5. kubelet
Purpose: Agent that runs on each worker node, ensuring containers are running in Pods.
How it works:
kubelet Workflow:
  1. API server assigns Pod to node
  2. kubelet receives Pod spec
  3. kubelet tells container runtime to pull images
  4. kubelet creates volumes if needed
  5. kubelet tells container runtime to start containers
  6. kubelet monitors container health
  7. kubelet reports status back to API server
Interview Tip: kubelet is the "node agent". It doesn't manage containers that weren't created by Kubernetes. It communicates with the container runtime via CRI (Container Runtime Interface).
6. kube-proxy
Purpose: Network proxy that maintains network rules for Pod communication.
How it works:
Modes:
  - iptables mode (default): Uses iptables rules for load balancing
  - IPVS mode: Uses IPVS (Linux Virtual Server) for better performance
  - userspace mode (legacy): Proxies connections in userspace
Service Access Flow:
  1. Client Pod sends request to Service IP (ClusterIP)
  2. kube-proxy intercepts via iptables/IPVS rules
  3. kube-proxy load balances to backend Pod
  4. Traffic forwarded to selected Pod
Interview Tip: kube-proxy doesn't actually proxy traffic in most modes. It programs iptables/IPVS rules, and the kernel handles the actual routing.
7. Container Runtime
Purpose: Software responsible for running containers.
How it works:
Supported Runtimes:
  - containerd (most common): Industry standard, Docker's runtime
  - CRI-O: Lightweight, OCI-compliant
  - Docker (deprecated): Dockershim removed in K8s 1.24+
Interview Tip: Docker was deprecated as a runtime because K8s talks to containerd directly now (which Docker uses internally anyway). Your Docker images still work!
8. CNI (Container Network Interface)
Purpose: Plugin interface for configuring network interfaces in containers.
How it works:
Popular CNI Plugins:
  - Calico: L3 networking, network policies, BGP routing
  - Flannel: Simple overlay network (VXLAN)
  - Weave Net: Simple setup, encrypts traffic
  - Cilium: eBPF-based, advanced observability
  - AWS VPC CNI: Native AWS networking
Pod Networking:
  1. kubelet calls CNI plugin when Pod starts
  2. CNI assigns IP from Pod CIDR range
  3. CNI sets up virtual network interface
  4. CNI configures routes for Pod communication
  5. Pod can now communicate with other Pods
Interview Tip: Kubernetes network model requires: 1) All Pods can communicate without NAT, 2) All nodes can communicate with all Pods, 3) Each Pod has its own IP.
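Several of the CNI plugins above (Calico, Cilium) also enforce NetworkPolicy objects, which restrict the default any-to-any Pod communication. A hedged sketch allowing only web Pods to reach database Pods (labels and port are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-web-to-db
spec:
  podSelector:             # the Pods this policy protects
    matchLabels:
      app: db
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:         # only Pods labeled app=web may connect
        matchLabels:
          app: web
    ports:
    - protocol: TCP
      port: 5432
```

Note that NetworkPolicies are only enforced if the installed CNI plugin supports them; with a plugin like plain Flannel they are silently ignored.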

kubectl - The Kubernetes CLI

kubectl
Purpose: Command-line tool for interacting with Kubernetes clusters.
Common Commands:
# Get resources
kubectl get pods
kubectl get nodes
kubectl get services
kubectl get deployments

# Describe (detailed info)
kubectl describe pod my-pod
kubectl describe node node-1

# Create resources
kubectl create -f deployment.yaml
kubectl apply -f service.yaml

# Update resources
kubectl edit deployment my-app
kubectl scale deployment my-app --replicas=5

# Delete resources
kubectl delete pod my-pod
kubectl delete -f deployment.yaml

# Logs and debugging
kubectl logs my-pod
kubectl logs -f my-pod  # follow
kubectl exec -it my-pod -- /bin/bash

# Port forwarding
kubectl port-forward pod/my-pod 8080:80

# Labels and selectors
kubectl get pods -l app=nginx
kubectl label pods my-pod env=prod
        
Interview Tip: kubectl talks to the API server. It reads config from ~/.kube/config which contains cluster info, credentials, and context.
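~/.kube/config is itself YAML. A simplified sketch of its shape (cluster name, server URL, and token are illustrative):

```yaml
apiVersion: v1
kind: Config
current-context: dev
clusters:
- name: dev-cluster
  cluster:
    server: https://10.0.0.1:6443   # API server endpoint
contexts:
- name: dev
  context:                          # a context ties a cluster to a user
    cluster: dev-cluster
    user: dev-user
    namespace: default
users:
- name: dev-user
  user:
    token: <bearer-token>           # or client certificates
```

Switching clusters is just `kubectl config use-context <name>`.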

Kubernetes Resource Types

Pod

Smallest deployable unit. One or more containers that share network and storage.

apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
spec:
  containers:
  - name: nginx
    image: nginx:1.21
    ports:
    - containerPort: 80
            

Use case: Basic unit, but usually managed by higher-level resources.

ReplicaSet

Maintains a stable set of replica Pods. Ensures specified number of Pods are running.

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: nginx-rs
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    # Pod template here
            

Use case: Rarely used directly; Deployments manage ReplicaSets.

Deployment

Manages ReplicaSets and provides declarative updates. Most common way to run stateless apps.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.21
            

Features: Rolling updates, rollback, scaling, self-healing.

StatefulSet

For stateful applications. Provides stable network identity and persistent storage.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: "mysql"
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    # Pod template
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
            

Use case: Databases, distributed systems (Kafka, Cassandra).

Features: Ordered deployment/scaling, stable network IDs (pod-0, pod-1), persistent volumes.

DaemonSet

Runs a copy of a Pod on every node.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    # Pod template
            

Use case: Logging agents (Fluentd), monitoring (Prometheus node exporter), CNI plugins.

Job

Runs a task to completion. For batch processing.

apiVersion: batch/v1
kind: Job
metadata:
  name: pi-calculation
spec:
  completions: 5
  parallelism: 2
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi",
                  "-wle", "print bpi(2000)"]
      restartPolicy: Never
            

Use case: Data processing, migrations, batch jobs.

CronJob

Runs Jobs on a schedule.

apiVersion: batch/v1
kind: CronJob
metadata:
  name: backup-job
spec:
  schedule: "0 2 * * *"  # 2 AM daily
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: backup:latest
          restartPolicy: OnFailure
            

Use case: Backups, report generation, cleanup tasks.

Service Types

| Service Type | Description | Use Case | Access Method |
|---|---|---|---|
| ClusterIP (default) | Exposes Service on cluster-internal IP. Only reachable from within cluster. | Internal microservices communication | ClusterIP:Port (e.g., 10.96.0.1:80) |
| NodePort | Exposes Service on each Node's IP at a static port (30000-32767). | Development, testing, quick external access | NodeIP:NodePort (e.g., 192.168.1.10:30080) |
| LoadBalancer | Creates external load balancer (cloud provider). Assigns external IP. | Production external access on cloud platforms | External IP provided by cloud (e.g., AWS ELB) |
| ExternalName | Maps Service to external DNS name (CNAME). | Access external services (RDS, external APIs) | DNS name (e.g., database.example.com) |

Service Examples

ClusterIP Service

apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: ClusterIP  # default
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80        # Service port
    targetPort: 8080 # Container port
        

LoadBalancer Service

apiVersion: v1
kind: Service
metadata:
  name: my-lb-service
spec:
  type: LoadBalancer
  selector:
    app: web
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
  # Cloud provider provisions external LB
        

Headless Service (for StatefulSet)

apiVersion: v1
kind: Service
metadata:
  name: mysql
spec:
  clusterIP: None  # Headless!
  selector:
    app: mysql
  ports:
  - port: 3306
# Provides DNS for each Pod: mysql-0.mysql, mysql-1.mysql, etc.
        
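The service-type table also lists NodePort and ExternalName, which have no example above. Minimal sketches (names and port values illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-nodeport-service
spec:
  type: NodePort
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 8080
    nodePort: 30080   # optional; must fall in 30000-32767 if set
---
apiVersion: v1
kind: Service
metadata:
  name: external-db
spec:
  type: ExternalName
  externalName: database.example.com  # resolved as a DNS CNAME, no proxying
```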

Additional Important Resources

ConfigMap

Store non-sensitive configuration data.

apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  database_url: "postgres://db:5432"
  log_level: "info"
            

Usage: Environment variables or mounted as files.
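A Pod might consume app-config either wholesale or key by key (the app image name is illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: my-app:1.0      # illustrative
    envFrom:
    - configMapRef:
        name: app-config   # every key becomes an environment variable
    env:
    - name: LOG_LEVEL
      valueFrom:
        configMapKeyRef:   # or cherry-pick a single key
          name: app-config
          key: log_level
```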

Secret

Store sensitive data (passwords, tokens).

apiVersion: v1
kind: Secret
metadata:
  name: db-secret
type: Opaque
data:
  password: cGFzc3dvcmQ=  # base64
            

Note: Base64 encoded, not encrypted. Use external secret managers for production.
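A container might pull the password out of db-secret like this (sketch; the image name is illustrative):

```yaml
  containers:
  - name: app
    image: my-app:1.0     # illustrative
    env:
    - name: DB_PASSWORD
      valueFrom:
        secretKeyRef:
          name: db-secret
          key: password   # base64-decoded automatically when injected
```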

PersistentVolume (PV)

Cluster resource representing storage.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-1
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: slow
  hostPath:
    path: /mnt/data
            

PersistentVolumeClaim (PVC)

Request for storage by a user.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-1
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: slow
            

Workflow: User creates PVC → K8s binds to matching PV → Pod uses PVC.
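The last step of that workflow, a Pod mounting pvc-1, looks like (mount path illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: nginx:1.21
    volumeMounts:
    - name: data
      mountPath: /usr/share/nginx/html
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: pvc-1   # the Pod references the claim, never the PV directly
```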

Ingress

HTTP/HTTPS routing to Services.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
spec:
  rules:
  - host: example.com
    http:
      paths:
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 80
            

Requires: Ingress Controller (Nginx, Traefik, HAProxy).

Namespace

Virtual cluster for resource isolation.

apiVersion: v1
kind: Namespace
metadata:
  name: production
            

Use case: Separate dev/staging/prod, multi-tenancy, resource quotas.

Default namespaces: default, kube-system, kube-public, kube-node-lease.
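The resource quotas mentioned above are themselves namespaced objects. A sketch capping the production namespace (values illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: prod-quota
  namespace: production
spec:
  hard:
    pods: "50"              # max Pods in the namespace
    requests.cpu: "20"      # sum of all CPU requests
    requests.memory: 64Gi
    limits.cpu: "40"
    limits.memory: 128Gi
```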

Node Labels and Selectors

Node Labels

Key-value pairs attached to nodes for organization and scheduling.

# Label a node
kubectl label nodes node-1 disktype=ssd
kubectl label nodes node-2 environment=production

# View labels
kubectl get nodes --show-labels
        

Node Selector (Simple)

Schedule Pods only on nodes with specific labels.

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
  nodeSelector:
    disktype: ssd  # Only schedule on nodes with this label
        

Node Affinity (Advanced)

More expressive than nodeSelector, with soft/hard requirements.

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
            - nvme
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: environment
            operator: In
            values:
            - production
  containers:
  - name: nginx
    image: nginx
        

Taints and Tolerations

Prevent Pods from scheduling on nodes unless they tolerate the taint.

# Taint a node (repel Pods)
kubectl taint nodes node-1 key=value:NoSchedule

# Pod with toleration (allows scheduling on tainted node)
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  tolerations:
  - key: "key"
    operator: "Equal"
    value: "value"
    effect: "NoSchedule"
  containers:
  - name: nginx
    image: nginx
        
Use cases: Dedicated nodes (GPU, high-memory), node maintenance, workload isolation.

Key Interview Concepts

How does Kubernetes achieve high availability?

  - Control plane: run multiple API server replicas behind a load balancer (the API server is stateless)
  - etcd: run an odd-sized cluster (3 or 5 members); Raft tolerates minority failures
  - Scheduler and controller manager: run multiple instances with leader election
  - Workloads: ReplicaSets keep the desired Pod count, and Pods are spread across nodes
  - Node failure: the node controller detects it and Pods are recreated on healthy nodes

How does a Pod get created? (End-to-end flow)

  1. User runs kubectl create -f pod.yaml
  2. kubectl sends request to API server
  3. API server validates, authenticates, authorizes
  4. API server writes Pod spec to etcd
  5. Scheduler watches for unassigned Pods
  6. Scheduler selects a node and binds Pod to it (updates etcd)
  7. kubelet on that node watches for new Pod assignments
  8. kubelet tells container runtime to pull image and start containers
  9. Container runtime starts containers
  10. kubelet reports Pod status to API server
  11. kube-proxy updates network rules for Service discovery

How does Service discovery work?

  - DNS: the cluster DNS (CoreDNS, in kube-system) gives every Service a name, e.g., my-service.default.svc.cluster.local
  - Environment variables: each container gets variables for Services that existed when it started
  - kube-proxy: programs iptables/IPVS rules so the Service's ClusterIP load balances to backend Pods
  - Endpoints: the Endpoints controller keeps the Service-to-Pod mapping up to date as Pods come and go

Deployment vs StatefulSet vs DaemonSet

| Aspect | Deployment | StatefulSet | DaemonSet |
|---|---|---|---|
| Use case | Stateless apps (web servers, APIs) | Stateful apps (databases, Kafka) | Node-level services (logging, monitoring) |
| Pod identity | Interchangeable, random names | Stable, ordered (pod-0, pod-1) | One per node |
| Scaling | Unordered, parallel | Ordered (pod-0 before pod-1) | Auto-scales with cluster |
| Storage | Ephemeral or shared volumes | Persistent, per-Pod storage | Usually host volumes |

Rolling Update Process

  1. User updates Deployment (new image version)
  2. Deployment controller creates new ReplicaSet
  3. New ReplicaSet scales up (creates new Pods)
  4. Old ReplicaSet scales down (terminates old Pods)
  5. Process continues until all Pods are new version
  6. Old ReplicaSet kept for rollback (history)

Parameters: maxSurge (extra Pods during update), maxUnavailable (Pods down during update)
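maxSurge and maxUnavailable live under the Deployment's update strategy; a sketch of a zero-downtime configuration:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # at most 1 Pod above the desired count during the update
      maxUnavailable: 0    # never drop below the desired count
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx:1.21
```

With maxUnavailable: 0, each old Pod is terminated only after its replacement is ready, so capacity never dips.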

Summary

Kubernetes splits a cluster into a control plane (API server, etcd, scheduler, controller manager) that stores and reconciles desired state, and worker nodes (kubelet, kube-proxy, container runtime, CNI) that actually run the Pods. Higher-level resources (Deployments, StatefulSets, DaemonSets, Services) describe what should run; controllers continuously make it so.

Key Takeaways for Interviews

  - Everything flows through the API server; etcd is the single source of truth
  - Controllers run reconciliation loops: make actual state match desired state
  - The scheduler only assigns Pods to nodes; the kubelet runs them
  - Deployment → ReplicaSet → Pods for stateless apps; StatefulSets for stateful apps; DaemonSets for per-node agents
  - Services give Pods a stable virtual IP and DNS name; kube-proxy programs the kernel rules that implement them