🐳 Docker & Container Fundamentals

Complete guide to containerization, Docker architecture, and best practices

📦 What Are Containers?

Containers are lightweight, standalone, executable packages that include everything needed to run a piece of software: code, runtime, system tools, libraries, and settings.

Key Insight: Containers share the host OS kernel but provide isolated user spaces. This makes them much lighter than VMs, each of which runs a full guest OS.

Containers vs Virtual Machines

graph TB
    subgraph "Virtual Machines"
        VM1[App A] --> GuestOS1[Guest OS]
        VM2[App B] --> GuestOS2[Guest OS]
        GuestOS1 --> Hypervisor
        GuestOS2 --> Hypervisor
        Hypervisor --> HostOS1[Host OS]
        HostOS1 --> HW1[Hardware]
    end
    subgraph "Containers"
        C1[App A] --> ContainerEngine[Container Runtime]
        C2[App B] --> ContainerEngine
        ContainerEngine --> HostOS2[Host OS]
        HostOS2 --> HW2[Hardware]
    end
| Feature | Containers | Virtual Machines |
|---|---|---|
| Startup Time | Seconds (or milliseconds) | Minutes |
| Size | MBs (lightweight) | GBs (includes full OS) |
| Isolation | Process-level (shared kernel) | Full OS isolation |
| Resource Usage | Minimal overhead | Significant overhead |
| Portability | Very portable (same OS kernel) | Less portable (different hypervisors) |
| Use Case | Microservices, dev environments | Legacy apps, different OS requirements |

🔧 Linux Container Fundamentals

Containers rely on three core Linux kernel features:

1. Namespaces (Isolation)

Namespaces provide isolation for different system resources:

  • PID - isolates process IDs; each container sees its own process tree
  • NET - isolates network interfaces, routing tables, and ports
  • MNT - isolates filesystem mount points
  • UTS - isolates hostname and domain name
  • IPC - isolates System V IPC and POSIX message queues
  • USER - maps user and group IDs (root inside the container need not be root on the host)
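This isolation is visible from userspace: every Linux process's namespace memberships appear as symlinks under /proc. A Docker-free sketch (Linux only):

```shell
# Each process belongs to one namespace of each type; the kernel exposes
# them as symlinks whose inode numbers identify the namespace. Two
# processes in the same namespace show identical inodes; a containerized
# process shows different ones.
ls -l /proc/self/ns
```

Comparing this output on the host against the same command run inside a container shows differing inode numbers for pid, net, mnt, and so on.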

2. Control Groups (cgroups) - Resource Limiting

cgroups limit and monitor resource usage:

  • CPU - caps CPU shares or quotas (docker run --cpus)
  • Memory - enforces hard memory limits, OOM-killing on breach (--memory)
  • Block I/O - throttles disk read/write bandwidth
  • PIDs - caps the number of processes, guarding against fork bombs (--pids-limit)

3. Union Filesystems (Layer Management)

Union filesystems (OverlayFS, AUFS) allow containers to share base image layers:

graph BT
    Container[Container Layer - Writable]
    Layer3[nginx config - Layer 3]
    Layer2[nginx install - Layer 2]
    Layer1[apt update - Layer 1]
    Base[Base Image: ubuntu:22.04]
    Container --> Layer3
    Layer3 --> Layer2
    Layer2 --> Layer1
    Layer1 --> Base
    style Container fill:#90EE90,color:#2e3440
    style Base fill:#ADD8E6,color:#2e3440

🏗️ Docker Architecture

graph LR
    Client[Docker Client<br/>docker build<br/>docker run<br/>docker push]
    Client -->|REST API| Daemon[Docker Daemon<br/>dockerd]
    Daemon --> Images[Images]
    Daemon --> Containers[Containers]
    Daemon --> Networks[Networks]
    Daemon --> Volumes[Volumes]
    Daemon -->|pull/push| Registry[Docker Registry<br/>Docker Hub]
    style Client fill:#4A90E2,color:#2e3440
    style Daemon fill:#E74C3C,color:#2e3440
    style Registry fill:#F39C12,color:#2e3440

Key Components

  1. Docker Client: CLI tool that sends commands to the daemon
  2. Docker Daemon (dockerd): Background service that manages images, containers, networks, and volumes
  3. Docker Images: Read-only templates with instructions for creating containers
  4. Docker Containers: Runnable instances of images
  5. Docker Registry: Stores Docker images (Docker Hub, AWS ECR, Google GCR)

📝 Dockerfile Best Practices

Basic Dockerfile Example

# Use official base image
FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Copy requirements first (layer caching!)
COPY requirements.txt .

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Create non-root user for security
RUN useradd -m -u 1000 appuser && chown -R appuser:appuser /app
USER appuser

# Expose port
EXPOSE 8000

# Health check (python:3.11-slim does not include curl; use the stdlib instead)
HEALTHCHECK --interval=30s --timeout=3s \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1

# Run application
CMD ["python", "app.py"]

Multi-Stage Builds (Critical for Production)

Multi-stage builds dramatically reduce image size by separating build and runtime environments:

# Stage 1: Build
FROM node:18 AS builder
WORKDIR /app
COPY package*.json ./
# Dev dependencies are needed to run the build
RUN npm ci
COPY . .
RUN npm run build
# Drop dev dependencies so only runtime deps are copied to the final stage
RUN npm prune --omit=dev

# Stage 2: Production runtime
FROM node:18-alpine
WORKDIR /app

# Copy only necessary files from builder
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/package.json ./

# Run as non-root
USER node

EXPOSE 3000
CMD ["node", "dist/server.js"]
Result: in this example, the multi-stage build cut the image from 1.2 GB to roughly 150 MB. Build tools (compilers, dev dependencies) are not included in the final image.

Dockerfile Optimization Techniques

1. Layer Caching Strategy

# ❌ BAD: Changes to code invalidate dependency layer
FROM python:3.11-slim
COPY . .
RUN pip install -r requirements.txt

# ✅ GOOD: Dependencies cached unless requirements.txt changes
FROM python:3.11-slim
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .

2. Minimize Layer Count

# ❌ BAD: Each RUN creates a new layer
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get install -y git
RUN rm -rf /var/lib/apt/lists/*

# ✅ GOOD: Combine into single layer
RUN apt-get update && \
    apt-get install -y curl git && \
    rm -rf /var/lib/apt/lists/*

3. Use .dockerignore

# .dockerignore
node_modules
.git
.env
*.log
.vscode
__pycache__
*.pyc
.pytest_cache
dist
build

4. Choose Minimal Base Images

| Base Image | Size | Use Case |
|---|---|---|
| ubuntu:22.04 | 77 MB | Full-featured, debugging tools included |
| python:3.11-slim | 125 MB | Minimal Python with basic tools |
| python:3.11-alpine | 50 MB | Ultra-minimal (musl libc - compatibility issues possible) |
| distroless/python3 | 53 MB | No shell, no package manager - maximum security |

🌐 Docker Networking

Network Drivers

| Driver | Description | Use Case |
|---|---|---|
| bridge | Default network; containers on the same bridge can communicate | Single-host container communication |
| host | Removes network isolation - container uses the host's network stack | High performance, no port mapping overhead |
| overlay | Multi-host networking for Docker Swarm | Distributed applications across multiple hosts |
| none | No networking - complete isolation | Maximum security, offline processing |
| macvlan | Assigns a MAC address - container appears as a physical device | Legacy applications expecting direct network access |

Networking Commands

$ docker network create my-app-network
$ docker network ls
$ docker network inspect my-app-network

# Run container on custom network
$ docker run -d --name web --network my-app-network nginx

# Connect running container to network
$ docker network connect my-app-network redis
graph TB
    subgraph "Docker Host"
        subgraph "Bridge Network: my-app-network"
            Web[web container<br/>nginx]
            API[api container<br/>flask]
            DB[db container<br/>postgres]
        end
        Web -->|internal DNS<br/>http://api:5000| API
        API -->|postgres://db:5432| DB
        Bridge[Docker Bridge<br/>172.18.0.1]
        Web -.-> Bridge
        API -.-> Bridge
        DB -.-> Bridge
    end
    Bridge -->|Port mapping<br/>80:80| External[External Traffic]
    style External fill:#E74C3C,color:#2e3440

💾 Docker Storage & Volumes

Three Types of Storage

| Type | Description | Persistence | Use Case |
|---|---|---|---|
| Volumes | Managed by Docker, stored in /var/lib/docker/volumes/ | Persists beyond container lifecycle | Database data, application state |
| Bind Mounts | Mount any host path into the container | Persists on host filesystem | Development (mount source code), configs |
| tmpfs | Stored in host memory only | Lost when container stops | Sensitive data, temporary processing |

Volume Examples

# Create named volume
$ docker volume create postgres-data

# Run container with volume
$ docker run -d \
  --name postgres \
  -v postgres-data:/var/lib/postgresql/data \
  postgres:15

# Bind mount (development)
$ docker run -d \
  --name dev-app \
  -v $(pwd):/app \
  -v /app/node_modules \
  node:18

# tmpfs mount (in-memory)
$ docker run -d \
  --name cache \
  --tmpfs /tmp:rw,size=512m,mode=1777 \
  redis:7

🎼 Docker Compose

Docker Compose manages multi-container applications with a single YAML file.

Complete Example: Web Application Stack

# docker-compose.yml
version: '3.8'

services:
  # Nginx reverse proxy
  web:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - api
    networks:
      - frontend
    restart: unless-stopped

  # Python API service
  api:
    build:
      context: ./api
      dockerfile: Dockerfile
    environment:
      DATABASE_URL: postgresql://user:pass@db:5432/myapp
      REDIS_URL: redis://cache:6379
    depends_on:
      db:
        condition: service_healthy
      cache:
        condition: service_started
    networks:
      - frontend
      - backend
    restart: unless-stopped

  # PostgreSQL database
  db:
    image: postgres:15
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
      POSTGRES_DB: myapp
    volumes:
      - postgres-data:/var/lib/postgresql/data
    networks:
      - backend
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user"]
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped

  # Redis cache
  cache:
    image: redis:7-alpine
    networks:
      - backend
    restart: unless-stopped

networks:
  frontend:
  backend:

volumes:
  postgres-data:

Docker Compose Commands

# Start all services
$ docker-compose up -d

# View logs
$ docker-compose logs -f api

# Scale services
$ docker-compose up -d --scale api=3

# Stop and remove everything
$ docker-compose down

# Stop and remove including volumes
$ docker-compose down -v

🔒 Security Best Practices

1. Run as Non-Root User

FROM python:3.11-slim

# Create user with specific UID
RUN useradd -m -u 1000 appuser

WORKDIR /app
COPY --chown=appuser:appuser . .

# Switch to non-root user
USER appuser

CMD ["python", "app.py"]

2. Use Official Images from Trusted Sources

Prefer official or verified-publisher images, pin specific version tags (or digests) rather than latest, and review the upstream Dockerfile when practical.

3. Scan Images for Vulnerabilities

# Docker Scout (built into Docker Desktop)
$ docker scout cves my-app:latest

# Trivy (open-source scanner)
$ trivy image my-app:latest

# Snyk
$ snyk container test my-app:latest

4. Minimize Attack Surface

Use minimal base images (slim, alpine, distroless), install only what the application needs, and drop unneeded kernel capabilities (--cap-drop ALL, then --cap-add only what is required).

5. Secret Management

NEVER: Put secrets in Dockerfile or commit to version control!
# ❌ BAD: Secret in Dockerfile
ENV DATABASE_PASSWORD=supersecret

# ✅ GOOD: Pass at runtime
docker run -e DATABASE_PASSWORD="${DB_PASS}" my-app

# ✅ BETTER: Use Docker secrets (Swarm) or Kubernetes secrets
docker secret create db_password ./password.txt
docker service create --secret db_password my-app

# ✅ BEST: External secret manager (AWS Secrets Manager, HashiCorp Vault)

6. Resource Limits

# Prevent container from consuming all resources
$ docker run -d \
  --memory="512m" \
  --cpus="1.5" \
  --pids-limit=100 \
  my-app

7. Read-Only Filesystem

# Run with read-only root filesystem
docker run -d \
  --read-only \
  --tmpfs /tmp \
  --tmpfs /var/run \
  nginx

🚀 Image Optimization Strategies

Size Comparison Example

| Technique | Image Size | Reduction |
|---|---|---|
| Original (ubuntu base, all deps) | 1.2 GB | - |
| + Multi-stage build | 450 MB | 62% smaller |
| + Alpine base | 85 MB | 81% smaller than multi-stage |
| + Distroless | 52 MB | 39% smaller than Alpine |

Layer Caching Best Practices

  1. Order matters: Put least-changing layers first
  2. Separate dependencies from code: COPY package files → RUN install → COPY code
  3. Combine related commands: Use && to chain RUN commands
  4. Clean up in same layer: Install and clean in one RUN statement
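The rule in point 2 can be sketched without Docker: a layer's cache key is, roughly, the instruction plus a checksum of the files it copies, so an unchanged requirements.txt yields the same key and a cache hit. A toy illustration of that idea (hypothetical file contents; not Docker's actual hashing scheme):

```shell
# Toy model of layer-cache keys: hash the copied file. Same content means
# the same key and a cache hit; changed content means a new key, so that
# layer and everything after it rebuilds.
tmp=$(mktemp -d)
echo "flask==3.0" > "$tmp/requirements.txt"

key1=$(sha256sum "$tmp/requirements.txt" | cut -d' ' -f1)
key2=$(sha256sum "$tmp/requirements.txt" | cut -d' ' -f1)   # file unchanged
echo "requests==2.31" >> "$tmp/requirements.txt"            # file edited
key3=$(sha256sum "$tmp/requirements.txt" | cut -d' ' -f1)

[ "$key1" = "$key2" ] && echo "cache hit"    # prints: cache hit
[ "$key1" != "$key3" ] && echo "cache miss"  # prints: cache miss
rm -rf "$tmp"
```

This is why copying dependency manifests before the application code keeps the expensive install layer cached across ordinary code changes.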

🔄 Container Runtime Comparison

| Runtime | Description | OCI Compliant | Use Case |
|---|---|---|---|
| Docker | Full platform with daemon, CLI, build tools | Yes | Development, general use |
| containerd | Industry-standard container runtime (Docker uses it) | Yes | Kubernetes default, production |
| CRI-O | Lightweight runtime designed for Kubernetes | Yes | Kubernetes-only environments |
| Podman | Daemonless alternative to Docker | Yes | Rootless containers, no daemon |
| runc | Low-level runtime that actually runs containers | Yes (reference implementation) | Used by other runtimes |
Kubernetes Deprecation Note: Kubernetes deprecated Docker as a container runtime in v1.20 (removed in v1.24). This doesn't affect Docker images - it means K8s uses containerd directly instead of Docker Engine. Docker-built images still work perfectly.

💻 Essential Docker Commands

Image Commands

# Build image
$ docker build -t my-app:1.0 .
$ docker build -t my-app:1.0 -f Dockerfile.prod .

# List images
$ docker images

# Remove image
$ docker rmi my-app:1.0

# Tag image
$ docker tag my-app:1.0 registry.example.com/my-app:1.0

# Push to registry
$ docker push registry.example.com/my-app:1.0

# Pull from registry
$ docker pull nginx:1.24

# Inspect image layers
$ docker history my-app:1.0

Container Commands

# Run container
$ docker run -d --name web -p 80:80 nginx

# Run with environment variables
$ docker run -e "ENV=production" -e "DEBUG=false" my-app

# List running containers
$ docker ps

# List all containers (including stopped)
$ docker ps -a

# Stop container
$ docker stop web

# Start stopped container
$ docker start web

# Restart container
$ docker restart web

# Remove container
$ docker rm web

# Force remove running container
$ docker rm -f web

# Execute command in running container
$ docker exec -it web bash
$ docker exec web ls /var/log

# View logs
$ docker logs web
$ docker logs -f --tail 100 web

# View container resource usage
$ docker stats
$ docker stats web

# Inspect container details
$ docker inspect web

# Copy files to/from container
$ docker cp web:/var/log/nginx/access.log ./logs/
$ docker cp config.json web:/etc/app/

System Commands

# Remove unused resources
$ docker system prune

# Remove everything (including volumes!)
$ docker system prune -a --volumes

# Show disk usage
$ docker system df

# Remove dangling images
$ docker image prune

🎯 When to Use Containers vs VMs

Use Containers When:

  • Building microservices or stateless, horizontally scaled services
  • You need fast startup, high density, and efficient resource usage
  • Reproducing identical development, test, and production environments
  • Running CI/CD pipelines and short-lived jobs

Use VMs When:

  • Workloads need strong, kernel-level isolation (multi-tenant or untrusted code)
  • The application requires a different OS or kernel than the host
  • Running legacy applications that expect a full machine

Hybrid Approach:

Many production systems use both: VMs for strong isolation, with containers inside those VMs for efficient resource usage.

🎓 Interview Questions & Answers

1. What's the difference between CMD and ENTRYPOINT?

CMD: Default command that can be overridden

ENTRYPOINT: Always executed, CMD becomes arguments to ENTRYPOINT

# CMD only - can be overridden
FROM ubuntu
CMD ["echo", "hello"]
# docker run myimage          → "hello"
# docker run myimage echo bye → "bye"

# ENTRYPOINT + CMD
FROM ubuntu
ENTRYPOINT ["echo"]
CMD ["hello"]
# docker run myimage     → "hello"
# docker run myimage bye → "bye"

2. How does Docker layer caching work?

Docker caches each layer. If a layer hasn't changed, Docker reuses the cached version. Cache is invalidated if:

  • The Dockerfile instruction changes
  • Files referenced by COPY/ADD change
  • Any parent layer changes (invalidates all subsequent layers)

Strategy: Put least-changing instructions first, most-changing last.

3. What happens when a container stops?

  • Process is sent SIGTERM (graceful shutdown)
  • After 10s (default), SIGKILL is sent (force kill)
  • Container layer (writable) still exists until removed
  • Volumes persist
  • Network connections are released

4. How do you debug a crashed container?

# View logs from stopped container
docker logs container-name

# Inspect exit code and state
docker inspect container-name | grep -A 10 State

# Start container with different command
docker run -it --entrypoint /bin/sh image-name

# Common exit codes:
# 0   - Success
# 1   - Application error
# 137 - SIGKILL (OOM killed, or manual kill -9)
# 139 - Segmentation fault
# 143 - SIGTERM (graceful shutdown)

5. What is the Docker overlay network?

Overlay networks enable containers on different Docker hosts to communicate. Used in Docker Swarm and can be used with standalone containers.

  • Uses VXLAN encapsulation
  • Requires key-value store (etcd, Consul) or Swarm mode
  • Provides service discovery and load balancing
  • Encrypts traffic between nodes (--opt encrypted)

6. Explain Docker's copy-on-write strategy

All image layers are read-only. When a container modifies a file:

  1. Docker searches for file in layers (top to bottom)
  2. File is copied to container's writable layer
  3. Modification happens in the copy
  4. Original in image layer remains unchanged

Benefit: Multiple containers can share same image layers, saving disk space.
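The copy-up sequence above can be mimicked with plain directories. This is only a toy sketch of the lookup and copy-up idea; real OverlayFS does this in the kernel via a mount of lower, upper, and work directories:

```shell
# lower/ stands in for a read-only image layer, upper/ for the
# container's writable layer.
tmp=$(mktemp -d)
mkdir -p "$tmp/lower" "$tmp/upper"
echo "base config" > "$tmp/lower/app.conf"

# "Modify" the file: copy it up to the writable layer, then edit the copy
cp "$tmp/lower/app.conf" "$tmp/upper/app.conf"
echo "container override" >> "$tmp/upper/app.conf"

# Lookup order: prefer the writable layer, fall back to the image layer
for f in "$tmp/upper/app.conf" "$tmp/lower/app.conf"; do
    [ -f "$f" ] && { cat "$f"; break; }
done

# The image layer is untouched, so other containers can keep sharing it
cat "$tmp/lower/app.conf"   # prints: base config
rm -rf "$tmp"
```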

🔗 Related Technologies

Container Orchestration

  • Kubernetes - industry-standard orchestrator for containerized workloads
  • Docker Swarm - Docker's built-in clustering and orchestration
  • HashiCorp Nomad - lightweight orchestrator for container and non-container workloads

Build Tools

  • BuildKit - Docker's modern build backend (parallel builds, cache mounts)
  • Buildah - daemonless image builds, pairs with Podman
  • Kaniko - builds images inside Kubernetes without a Docker daemon

Registries

  • Docker Hub - default public registry
  • AWS ECR, Google Artifact Registry, Azure ACR - managed cloud registries
  • Harbor - self-hosted registry with scanning and RBAC

Key Takeaway: Docker revolutionized software deployment by providing consistent, portable, and efficient application packaging. Understanding containers is essential for modern cloud-native development and system design.