🐳 Docker & Container Fundamentals

Complete guide to containerization, Docker architecture, and best practices

📦 What Are Containers?

Containers are lightweight, standalone, executable packages that include everything needed to run a piece of software: code, runtime, system tools, libraries, and settings.

Key Insight: Containers share the host OS kernel but provide isolated user spaces. This makes them much lighter than VMs, each of which runs a full guest OS.

Containers vs Virtual Machines

graph TB
    subgraph "Virtual Machines"
        VM1[App A] --> GuestOS1[Guest OS]
        VM2[App B] --> GuestOS2[Guest OS]
        GuestOS1 --> Hypervisor
        GuestOS2 --> Hypervisor
        Hypervisor --> HostOS1[Host OS]
        HostOS1 --> HW1[Hardware]
    end
    subgraph "Containers"
        C1[App A] --> ContainerEngine[Container Runtime]
        C2[App B] --> ContainerEngine
        ContainerEngine --> HostOS2[Host OS]
        HostOS2 --> HW2[Hardware]
    end
| Feature | Containers | Virtual Machines |
|---|---|---|
| Startup Time | Seconds (or milliseconds) | Minutes |
| Size | MBs (lightweight) | GBs (includes full OS) |
| Isolation | Process-level (shared kernel) | Full OS isolation |
| Resource Usage | Minimal overhead | Significant overhead |
| Portability | Very portable (same OS kernel) | Less portable (different hypervisors) |
| Use Case | Microservices, dev environments | Legacy apps, different OS requirements |

🔧 Linux Container Fundamentals

Containers rely on three core Linux kernel features:

1. Namespaces (Isolation)

Namespaces provide isolation for different system resources:

  • PID - isolates process IDs; each container sees its own process tree
  • NET - isolates network interfaces, routing tables, and ports
  • MNT - isolates filesystem mount points
  • UTS - isolates hostname and domain name
  • IPC - isolates System V IPC and POSIX message queues
  • USER - maps user and group IDs (root inside the container need not be root on the host)
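This isolation is visible from userspace: every Linux process's namespace memberships appear as symlinks under /proc. A Docker-free sketch (Linux only):

```shell
# Each process belongs to one namespace of each type; the kernel exposes
# them as symlinks whose inode numbers identify the namespace. Two
# processes in the same namespace show identical inodes; a containerized
# process shows different ones.
ls -l /proc/self/ns
```

Comparing this output on the host against the same command run inside a container shows differing inode numbers for pid, net, mnt, and so on.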

2. Control Groups (cgroups) - Resource Limiting

cgroups limit and monitor resource usage:

  • CPU - caps CPU shares or quotas (docker run --cpus)
  • Memory - enforces hard memory limits, OOM-killing on breach (--memory)
  • Block I/O - throttles disk read/write bandwidth
  • PIDs - caps the number of processes, guarding against fork bombs (--pids-limit)

3. Union Filesystems (Layer Management)

Union filesystems (OverlayFS, AUFS) allow containers to share base image layers:

graph BT
    Container[Container Layer - Writable]
    Layer3[nginx config - Layer 3]
    Layer2[nginx install - Layer 2]
    Layer1[apt update - Layer 1]
    Base[Base Image: ubuntu:22.04]
    Container --> Layer3
    Layer3 --> Layer2
    Layer2 --> Layer1
    Layer1 --> Base
    style Container fill:#90EE90,color:#2e3440
    style Base fill:#ADD8E6,color:#2e3440

🏗️ Docker Architecture

graph LR
    Client[Docker Client<br/>docker build<br/>docker run<br/>docker push]
    Client -->|REST API| Daemon[Docker Daemon<br/>dockerd]
    Daemon --> Images[Images]
    Daemon --> Containers[Containers]
    Daemon --> Networks[Networks]
    Daemon --> Volumes[Volumes]
    Daemon -->|pull/push| Registry[Docker Registry<br/>Docker Hub]
    style Client fill:#4A90E2,color:#2e3440
    style Daemon fill:#E74C3C,color:#2e3440
    style Registry fill:#F39C12,color:#2e3440

Key Components

  1. Docker Client: CLI tool that sends commands to the daemon
  2. Docker Daemon (dockerd): Background service that manages images, containers, networks, and volumes
  3. Docker Images: Read-only templates with instructions for creating containers
  4. Docker Containers: Runnable instances of images
  5. Docker Registry: Stores Docker images (Docker Hub, AWS ECR, Google GCR)

📝 Dockerfile Best Practices

Basic Dockerfile Example

# Use official base image
FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Copy requirements first (layer caching!)
COPY requirements.txt .

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Create non-root user for security
RUN useradd -m -u 1000 appuser && chown -R appuser:appuser /app
USER appuser

# Expose port
EXPOSE 8000

# Health check (python:3.11-slim does not include curl; use the stdlib instead)
HEALTHCHECK --interval=30s --timeout=3s \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1

# Run application
CMD ["python", "app.py"]

Multi-Stage Builds (Critical for Production)

Multi-stage builds dramatically reduce image size by separating build and runtime environments:

# Stage 1: Build
FROM node:18 AS builder
WORKDIR /app
COPY package*.json ./
# Dev dependencies are needed to run the build
RUN npm ci
COPY . .
RUN npm run build
# Drop dev dependencies so only runtime deps are copied to the final stage
RUN npm prune --omit=dev

# Stage 2: Production runtime
FROM node:18-alpine
WORKDIR /app

# Copy only necessary files from builder
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/package.json ./

# Run as non-root
USER node

EXPOSE 3000
CMD ["node", "dist/server.js"]
Result: in this example, the multi-stage build cut the image from 1.2 GB to roughly 150 MB. Build tools (compilers, dev dependencies) are not included in the final image.

Dockerfile Optimization Techniques

1. Layer Caching Strategy

# ❌ BAD: Changes to code invalidate dependency layer
FROM python:3.11-slim
COPY . .
RUN pip install -r requirements.txt

# ✅ GOOD: Dependencies cached unless requirements.txt changes
FROM python:3.11-slim
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .

2. Minimize Layer Count

# ❌ BAD: Each RUN creates a new layer
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get install -y git
RUN rm -rf /var/lib/apt/lists/*

# ✅ GOOD: Combine into single layer
RUN apt-get update && \
    apt-get install -y curl git && \
    rm -rf /var/lib/apt/lists/*

3. Use .dockerignore

# .dockerignore
node_modules
.git
.env
*.log
.vscode
__pycache__
*.pyc
.pytest_cache
dist
build

4. Choose Minimal Base Images

| Base Image | Size | Use Case |
|---|---|---|
| ubuntu:22.04 | 77 MB | Full-featured, debugging tools included |
| python:3.11-slim | 125 MB | Minimal Python with basic tools |
| python:3.11-alpine | 50 MB | Ultra-minimal (musl libc - compatibility issues possible) |
| distroless/python3 | 53 MB | No shell, no package manager - maximum security |

🌐 Docker Networking

Network Drivers

| Driver | Description | Use Case |
|---|---|---|
| bridge | Default network; containers on the same bridge can communicate | Single-host container communication |
| host | Removes network isolation - container uses the host's network stack | High performance, no port mapping overhead |
| overlay | Multi-host networking for Docker Swarm | Distributed applications across multiple hosts |
| none | No networking - complete isolation | Maximum security, offline processing |
| macvlan | Assigns a MAC address - container appears as a physical device | Legacy applications expecting direct network access |

Networking Commands

$ docker network create my-app-network
$ docker network ls
$ docker network inspect my-app-network

# Run container on custom network
$ docker run -d --name web --network my-app-network nginx

# Connect running container to network
$ docker network connect my-app-network redis
graph TB
    subgraph "Docker Host"
        subgraph "Bridge Network: my-app-network"
            Web[web container<br/>nginx]
            API[api container<br/>flask]
            DB[db container<br/>postgres]
        end
        Web -->|internal DNS<br/>http://api:5000| API
        API -->|postgres://db:5432| DB
        Bridge[Docker Bridge<br/>172.18.0.1]
        Web -.-> Bridge
        API -.-> Bridge
        DB -.-> Bridge
    end
    Bridge -->|Port mapping<br/>80:80| External[External Traffic]
    style External fill:#E74C3C,color:#2e3440

💾 Docker Storage & Volumes

Three Types of Storage

| Type | Description | Persistence | Use Case |
|---|---|---|---|
| Volumes | Managed by Docker, stored in /var/lib/docker/volumes/ | Persists beyond container lifecycle | Database data, application state |
| Bind Mounts | Mount any host path into the container | Persists on host filesystem | Development (mount source code), configs |
| tmpfs | Stored in host memory only | Lost when container stops | Sensitive data, temporary processing |

Volume Examples

# Create named volume
$ docker volume create postgres-data

# Run container with volume
$ docker run -d \
  --name postgres \
  -v postgres-data:/var/lib/postgresql/data \
  postgres:15

# Bind mount (development)
$ docker run -d \
  --name dev-app \
  -v $(pwd):/app \
  -v /app/node_modules \
  node:18

# tmpfs mount (in-memory)
$ docker run -d \
  --name cache \
  --tmpfs /tmp:rw,size=512m,mode=1777 \
  redis:7

🎼 Docker Compose

Docker Compose manages multi-container applications with a single YAML file.

Complete Example: Web Application Stack

# docker-compose.yml
version: '3.8'

services:
  # Nginx reverse proxy
  web:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - api
    networks:
      - frontend
    restart: unless-stopped

  # Python API service
  api:
    build:
      context: ./api
      dockerfile: Dockerfile
    environment:
      DATABASE_URL: postgresql://user:pass@db:5432/myapp
      REDIS_URL: redis://cache:6379
    depends_on:
      db:
        condition: service_healthy
      cache:
        condition: service_started
    networks:
      - frontend
      - backend
    restart: unless-stopped

  # PostgreSQL database
  db:
    image: postgres:15
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
      POSTGRES_DB: myapp
    volumes:
      - postgres-data:/var/lib/postgresql/data
    networks:
      - backend
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user"]
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped

  # Redis cache
  cache:
    image: redis:7-alpine
    networks:
      - backend
    restart: unless-stopped

networks:
  frontend:
  backend:

volumes:
  postgres-data:

Docker Compose Commands

# Start all services
$ docker-compose up -d

# View logs
$ docker-compose logs -f api

# Scale services
$ docker-compose up -d --scale api=3

# Stop and remove everything
$ docker-compose down

# Stop and remove including volumes
$ docker-compose down -v

🔒 Security Best Practices

1. Run as Non-Root User

FROM python:3.11-slim

# Create user with specific UID
RUN useradd -m -u 1000 appuser

WORKDIR /app
COPY --chown=appuser:appuser . .

# Switch to non-root user
USER appuser

CMD ["python", "app.py"]

2. Use Official Images from Trusted Sources

Prefer official or verified-publisher images, pin specific version tags (or digests) rather than latest, and review the upstream Dockerfile when practical.

3. Scan Images for Vulnerabilities

# Docker Scout (built into Docker Desktop)
$ docker scout cves my-app:latest

# Trivy (open-source scanner)
$ trivy image my-app:latest

# Snyk
$ snyk container test my-app:latest

4. Minimize Attack Surface

Use minimal base images (slim, alpine, distroless), install only what the application needs, and drop unneeded kernel capabilities (--cap-drop ALL, then --cap-add only what is required).

5. Secret Management

NEVER: Put secrets in Dockerfile or commit to version control!
# ❌ BAD: Secret in Dockerfile
ENV DATABASE_PASSWORD=supersecret

# ✅ GOOD: Pass at runtime
docker run -e DATABASE_PASSWORD="${DB_PASS}" my-app

# ✅ BETTER: Use Docker secrets (Swarm) or Kubernetes secrets
docker secret create db_password ./password.txt
docker service create --secret db_password my-app

# ✅ BEST: External secret manager (AWS Secrets Manager, HashiCorp Vault)

6. Resource Limits

# Prevent container from consuming all resources
$ docker run -d \
  --memory="512m" \
  --cpus="1.5" \
  --pids-limit=100 \
  my-app

7. Read-Only Filesystem

# Run with read-only root filesystem
docker run -d \
  --read-only \
  --tmpfs /tmp \
  --tmpfs /var/run \
  nginx

🚀 Image Optimization Strategies

Size Comparison Example

| Technique | Image Size | Reduction |
|---|---|---|
| Original (ubuntu base, all deps) | 1.2 GB | - |
| + Multi-stage build | 450 MB | 62% smaller |
| + Alpine base | 85 MB | 81% smaller than multi-stage |
| + Distroless | 52 MB | 39% smaller than Alpine |

Layer Caching Best Practices

  1. Order matters: Put least-changing layers first
  2. Separate dependencies from code: COPY package files → RUN install → COPY code
  3. Combine related commands: Use && to chain RUN commands
  4. Clean up in same layer: Install and clean in one RUN statement
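The rule in point 2 can be sketched without Docker: a layer's cache key is, roughly, the instruction plus a checksum of the files it copies, so an unchanged requirements.txt yields the same key and a cache hit. A toy illustration of that idea (hypothetical file contents; not Docker's actual hashing scheme):

```shell
# Toy model of layer-cache keys: hash the copied file. Same content means
# the same key and a cache hit; changed content means a new key, so that
# layer and everything after it rebuilds.
tmp=$(mktemp -d)
echo "flask==3.0" > "$tmp/requirements.txt"

key1=$(sha256sum "$tmp/requirements.txt" | cut -d' ' -f1)
key2=$(sha256sum "$tmp/requirements.txt" | cut -d' ' -f1)   # file unchanged
echo "requests==2.31" >> "$tmp/requirements.txt"            # file edited
key3=$(sha256sum "$tmp/requirements.txt" | cut -d' ' -f1)

[ "$key1" = "$key2" ] && echo "cache hit"    # prints: cache hit
[ "$key1" != "$key3" ] && echo "cache miss"  # prints: cache miss
rm -rf "$tmp"
```

This is why copying dependency manifests before the application code keeps the expensive install layer cached across ordinary code changes.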

🔄 Container Runtime Comparison

| Runtime | Description | OCI Compliant | Use Case |
|---|---|---|---|
| Docker | Full platform with daemon, CLI, build tools | Yes | Development, general use |
| containerd | Industry-standard container runtime (Docker uses it) | Yes | Kubernetes default, production |
| CRI-O | Lightweight runtime designed for Kubernetes | Yes | Kubernetes-only environments |
| Podman | Daemonless alternative to Docker | Yes | Rootless containers, no daemon |
| runc | Low-level runtime that actually runs containers | Yes (reference implementation) | Used by other runtimes |
Kubernetes Deprecation Note: Kubernetes deprecated Docker as a container runtime in v1.20 (removed in v1.24). This doesn't affect Docker images - it means K8s uses containerd directly instead of Docker Engine. Docker-built images still work perfectly.

💻 Essential Docker Commands

Image Commands

# Build image
$ docker build -t my-app:1.0 .
$ docker build -t my-app:1.0 -f Dockerfile.prod .

# List images
$ docker images

# Remove image
$ docker rmi my-app:1.0

# Tag image
$ docker tag my-app:1.0 registry.example.com/my-app:1.0

# Push to registry
$ docker push registry.example.com/my-app:1.0

# Pull from registry
$ docker pull nginx:1.24

# Inspect image layers
$ docker history my-app:1.0

Container Commands

# Run container
$ docker run -d --name web -p 80:80 nginx

# Run with environment variables
$ docker run -e "ENV=production" -e "DEBUG=false" my-app

# List running containers
$ docker ps

# List all containers (including stopped)
$ docker ps -a

# Stop container
$ docker stop web

# Start stopped container
$ docker start web

# Restart container
$ docker restart web

# Remove container
$ docker rm web

# Force remove running container
$ docker rm -f web

# Execute command in running container
$ docker exec -it web bash
$ docker exec web ls /var/log

# View logs
$ docker logs web
$ docker logs -f --tail 100 web

# View container resource usage
$ docker stats
$ docker stats web

# Inspect container details
$ docker inspect web

# Copy files to/from container
$ docker cp web:/var/log/nginx/access.log ./logs/
$ docker cp config.json web:/etc/app/

System Commands

# Remove unused resources
$ docker system prune

# Remove everything (including volumes!)
$ docker system prune -a --volumes

# Show disk usage
$ docker system df

# Remove dangling images
$ docker image prune

🎯 When to Use Containers vs VMs

Use Containers When:

  • Building microservices or stateless, horizontally scaled services
  • You need fast startup, high density, and efficient resource usage
  • Reproducing identical development, test, and production environments
  • Running CI/CD pipelines and short-lived jobs

Use VMs When:

  • Workloads need strong, kernel-level isolation (multi-tenant or untrusted code)
  • The application requires a different OS or kernel than the host
  • Running legacy applications that expect a full machine

Hybrid Approach:

Many production systems use both: VMs for strong isolation, with containers inside those VMs for efficient resource usage.

🎓 Interview Questions & Answers

1. What's the difference between CMD and ENTRYPOINT?

CMD: Default command that can be overridden

ENTRYPOINT: Always executed, CMD becomes arguments to ENTRYPOINT

# CMD only - can be overridden
FROM ubuntu
CMD ["echo", "hello"]
# docker run myimage          → "hello"
# docker run myimage echo bye → "bye"

# ENTRYPOINT + CMD
FROM ubuntu
ENTRYPOINT ["echo"]
CMD ["hello"]
# docker run myimage     → "hello"
# docker run myimage bye → "bye"

2. How does Docker layer caching work?

Docker caches each layer. If a layer hasn't changed, Docker reuses the cached version. Cache is invalidated if:

  • The Dockerfile instruction changes
  • Files referenced by COPY/ADD change
  • Any parent layer changes (invalidates all subsequent layers)

Strategy: Put least-changing instructions first, most-changing last.

3. What happens when a container stops?

  • Process is sent SIGTERM (graceful shutdown)
  • After 10s (default), SIGKILL is sent (force kill)
  • Container layer (writable) still exists until removed
  • Volumes persist
  • Network connections are released

4. How do you debug a crashed container?

# View logs from stopped container
docker logs container-name

# Inspect exit code and state
docker inspect container-name | grep -A 10 State

# Start container with different command
docker run -it --entrypoint /bin/sh image-name

# Common exit codes:
# 0   - Success
# 1   - Application error
# 137 - SIGKILL (OOM killed, or manual kill -9)
# 139 - Segmentation fault
# 143 - SIGTERM (graceful shutdown)

5. What is the Docker overlay network?

Overlay networks enable containers on different Docker hosts to communicate. Used in Docker Swarm and can be used with standalone containers.

  • Uses VXLAN encapsulation
  • Requires key-value store (etcd, Consul) or Swarm mode
  • Provides service discovery and load balancing
  • Encrypts traffic between nodes (--opt encrypted)

6. Explain Docker's copy-on-write strategy

All image layers are read-only. When a container modifies a file:

  1. Docker searches for file in layers (top to bottom)
  2. File is copied to container's writable layer
  3. Modification happens in the copy
  4. Original in image layer remains unchanged

Benefit: Multiple containers can share same image layers, saving disk space.
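The copy-up sequence above can be mimicked with plain directories. This is only a toy sketch of the lookup and copy-up idea; real OverlayFS does this in the kernel via a mount of lower, upper, and work directories:

```shell
# lower/ stands in for a read-only image layer, upper/ for the
# container's writable layer.
tmp=$(mktemp -d)
mkdir -p "$tmp/lower" "$tmp/upper"
echo "base config" > "$tmp/lower/app.conf"

# "Modify" the file: copy it up to the writable layer, then edit the copy
cp "$tmp/lower/app.conf" "$tmp/upper/app.conf"
echo "container override" >> "$tmp/upper/app.conf"

# Lookup order: prefer the writable layer, fall back to the image layer
for f in "$tmp/upper/app.conf" "$tmp/lower/app.conf"; do
    [ -f "$f" ] && { cat "$f"; break; }
done

# The image layer is untouched, so other containers can keep sharing it
cat "$tmp/lower/app.conf"   # prints: base config
rm -rf "$tmp"
```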

🔗 Related Technologies

Container Orchestration

  • Kubernetes - industry-standard orchestrator for containerized workloads
  • Docker Swarm - Docker's built-in clustering and orchestration
  • HashiCorp Nomad - lightweight orchestrator for container and non-container workloads

Build Tools

  • BuildKit - Docker's modern build backend (parallel builds, cache mounts)
  • Buildah - daemonless image builds, pairs with Podman
  • Kaniko - builds images inside Kubernetes without a Docker daemon

Registries

  • Docker Hub - default public registry
  • AWS ECR, Google Artifact Registry, Azure ACR - managed cloud registries
  • Harbor - self-hosted registry with scanning and RBAC

Key Takeaway: Docker revolutionized software deployment by providing consistent, portable, and efficient application packaging. Understanding containers is essential for modern cloud-native development and system design.