Distinguished Engineer Study Guide - From Boot to Production
Beyond Basic Commands: Distinguished engineers need to understand Linux at the systems level - not just running commands, but understanding what happens under the hood.
Interview Focus: What happens between power-on and a login prompt?
Power On → BIOS/UEFI Firmware
1. POST (Power-On Self-Test)
- Check CPU, RAM, peripherals
- Initialize hardware
2. Find boot device (hard drive, USB, network)
3. Read MBR (Master Boot Record) - first 512 bytes of disk
- Or ESP (EFI System Partition) for UEFI
4. Load bootloader into memory
5. Transfer control to bootloader
GRUB (Grand Unified Bootloader)
1. Display boot menu (or auto-boot after timeout)
2. Load kernel image (/boot/vmlinuz-*)
3. Load initial RAM disk (initrd/initramfs)
4. Pass kernel parameters (from /etc/default/grub)
5. Jump to kernel entry point
Example GRUB entry:
linux /boot/vmlinuz-5.15.0 root=UUID=xxx ro quiet
initrd /boot/initramfs-5.15.0.img
Parameters:
- root=: Root filesystem device
- ro: Mount root read-only initially (fsck can run)
- quiet: Suppress verbose messages
- single: Boot to single-user mode (rescue)
Kernel Boot Process:
1. Decompress kernel (if compressed)
2. Initialize kernel subsystems:
- Memory management (MMU, paging)
- Process scheduler
- Device drivers
3. Mount initramfs (initial RAM filesystem)
- Temporary root filesystem in RAM
- Contains drivers needed to mount real root
- Loads modules for disk controllers, RAID, LVM
4. Execute /init script in initramfs
5. Mount real root filesystem (from root= parameter)
6. Pivot to real root (switch_root)
7. Execute init system (/sbin/init → systemd)
View kernel boot messages:
$ dmesg | less
$ journalctl -k # Kernel messages via the systemd journal
systemd becomes PID 1
1. Read configuration from /etc/systemd/
2. Determine target (runlevel equivalent)
- multi-user.target (CLI, no GUI)
- graphical.target (GUI)
3. Start services in dependency order
- Parallel initialization where possible
4. Mount filesystems from /etc/fstab
5. Start getty (login prompts)
Check boot time:
$ systemd-analyze
Startup finished in 2.5s (kernel) + 8.3s (userspace) = 10.8s
Check service startup times:
$ systemd-analyze blame
systemd targets replace the old SysV runlevels:
| Target | Old Runlevel | Description |
|---|---|---|
| poweroff.target | 0 | Shutdown system |
| rescue.target | 1, s, single | Single-user mode (root only, minimal services) |
| multi-user.target | 3 | Multi-user text mode (no GUI) |
| graphical.target | 5 | Multi-user with GUI |
| reboot.target | 6 | Reboot system |
Check current target:
$ systemctl get-default
graphical.target
Change default target:
$ sudo systemctl set-default multi-user.target
Boot to rescue mode (from GRUB):
- Edit GRUB entry (press 'e')
- Add "systemd.unit=rescue.target" to kernel line
- Boot with Ctrl+X
Process States:
┌─────────┐ fork() ┌─────────┐ exec() ┌─────────┐
│ Parent │ ──────────> │ Child │ ──────────> │ Running │
│ Process │ │ (copy) │ │ Program │
└─────────┘ └─────────┘ └─────────┘
│
├──> Running (on CPU)
├──> Runnable (waiting for CPU)
├──> Sleeping (waiting for I/O)
├──> Stopped (Ctrl+Z)
└──> Zombie (exited, parent hasn't reaped)
Process Creation:
1. fork() - Create copy of current process
- Child gets copy of parent's memory, file descriptors
- Child has new PID
- Both processes continue from fork() return point
2. exec() - Replace process image with new program
- Load new executable
- Replace memory space
- Keep PID, file descriptors (unless close-on-exec)
View processes:
$ ps aux # All processes, user-oriented
$ ps -ef # All processes, full format
$ pstree # Process tree
Key fields:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.1 169104 13140 ? Ss Jan15 0:12 /sbin/init
user 1234 2.5 10.2 2456780 1024000 ? Sl 10:30 5:23 /usr/bin/app
VSZ: Virtual memory size (total addressable memory)
RSS: Resident Set Size (physical RAM actually used)
STAT: Process state
S: Sleeping (interruptible)
R: Running
D: Sleeping (uninterruptible - usually I/O wait)
T: Stopped
Z: Zombie
<: High priority
N: Low priority
s: Session leader
View specific process:
$ ps -p 1234 -f
$ cat /proc/1234/status # Detailed process info
$ cat /proc/1234/cmdline # Command line
$ ls -l /proc/1234/fd # Open file descriptors
Process Priority:
- Linux uses priority range: 0-139
- Real-time: 0-99 (higher = higher priority)
- Normal: 100-139 (lower = higher priority)
- Nice values: -20 to +19 (user-adjustable)
- Nice -20 = highest priority (100)
- Nice 0 = default (120)
- Nice 19 = lowest priority (139)
Adjust priority:
$ nice -n 10 ./my-program # Start with lower priority
$ renice -n 5 -p 1234 # Change running process
$ renice -n -5 -u username # Requires root for negative nice
View priorities:
$ ps -eo pid,ni,pri,comm
PID NI PRI COMMAND
1234 0 19 my-program
5678 -10 9 important-task
Definition: A system call is the interface between user-space programs and the kernel. It's how applications request services from the operating system.
Security & Isolation: User programs can't directly access hardware or kernel memory. System calls provide a controlled, safe way to perform privileged operations.
User Space vs Kernel Space:
┌─────────────────────────────────────┐
│ User Space (Ring 3) │ ← Applications run here
│ - Limited privileges │
│ - Cannot access hardware directly │
│ - Cannot access kernel memory │
├─────────────────────────────────────┤
│ System Call Interface │ ← Syscall boundary
├─────────────────────────────────────┤
│ Kernel Space (Ring 0) │ ← Kernel runs here
│ - Full privileges │
│ - Direct hardware access │
│ - Manage all resources │
└─────────────────────────────────────┘
Example: Reading a file with read()
1. Application calls read(fd, buffer, size)
↓
2. C library (libc) prepares syscall
- Put syscall number in register (RAX on x86-64)
- Put arguments in registers (RDI, RSI, RDX, ...)
↓
3. Execute syscall instruction (formerly int 0x80)
- CPU switches from Ring 3 (user) to Ring 0 (kernel)
- Save user context (registers, stack)
- Jump to kernel syscall handler
↓
4. Kernel executes syscall
- Validate arguments (is fd valid? is buffer in user space?)
- Perform operation (read from filesystem)
- Prepare return value
↓
5. Return to user space
- Restore user context
- Switch from Ring 0 to Ring 3
- Return value in RAX register
↓
6. C library returns to application
Cost: Context switch + validation ~100-300 ns (expensive!)
| Category | System Calls | Purpose |
|---|---|---|
| Process Control | fork, exec, exit, wait, kill | Create, terminate, manage processes |
| File Operations | open, close, read, write, lseek, stat | File I/O and metadata |
| Directory Operations | mkdir, rmdir, chdir, opendir, readdir | Directory management |
| Device Management | ioctl, read, write | Device I/O control |
| Memory Management | brk, mmap, munmap, mprotect | Allocate, map, protect memory |
| Communication | pipe, socket, send, recv, shmget | IPC (inter-process communication) |
| Protection | chmod, chown, setuid, setgid | Permissions and ownership |
| Time | time, gettimeofday, clock_gettime | Get system time |
strace - Trace system calls and signals
$ strace ls
execve("/bin/ls", ["ls"], 0x7ffd...) = 0
brk(NULL) = 0x55555576f000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, ...) = 0x7f1234567000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY...) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=123456, ...}) = 0
mmap(NULL, 123456, PROT_READ, ...) = 0x7f1234560000
close(3) = 0
...
write(1, "file1.txt\nfile2.txt\n", 20) = 20
close(1) = 0
close(2) = 0
exit_group(0) = ?
Useful strace options:
$ strace -c ls # Count syscalls
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
25.00 0.000100 5 20 write
20.00 0.000080 4 20 read
15.00 0.000060 3 20 open
...
$ strace -e open,read ls # Trace only specific syscalls
$ strace -p 1234 # Attach to running process
$ strace -f ./program # Follow forks
$ strace -T ls # Show time spent in each syscall
$ strace -o output.txt ls # Save to file
Performance analysis:
$ strace -c -p 1234 # Count syscalls for running process
Problem: System calls are expensive (context switch overhead)
Bad: Many small syscalls (pseudocode):
for i in range(1000000):
    write(fd, "x", 1)    # 1 million syscalls!
                         # ~100ns each = 100ms total in syscall overhead
Good: Batch operations
buffer = "x" * 1000000
write(fd, buffer, 1000000) # 1 syscall!
# ~100ns syscall overhead
Optimization strategies:
- Buffered I/O (fwrite, fread vs write, read)
- Memory mapping (mmap) instead of read/write
- Batch syscalls where possible
- Use async I/O (io_uring on modern Linux)
Example - Reading file:
Syscall per byte: 1MB / 1 byte = 1,000,000 syscalls → SLOW
Syscall per 4KB: 1MB / 4KB = 256 syscalls → Better
mmap entire file: 1 syscall → Fast
vDSO (virtual Dynamic Shared Object):
- Kernel maps read-only page into every process
- Contains fast-path syscalls (no context switch!)
- Used for: gettimeofday(), clock_gettime(), getcpu()
Without vDSO:
$ time ./get-time-1million # 1M calls to gettimeofday()
real 0m5.432s # 5.4 µs per call (context switch)
With vDSO:
$ time ./get-time-1million
real 0m0.045s # 45 ns per call (no context switch!)
120x faster! No kernel transition needed.
| Signal | Number | Default Action | Description |
|---|---|---|---|
| SIGHUP | 1 | Terminate | Hangup - terminal disconnected (often: reload config) |
| SIGINT | 2 | Terminate | Interrupt (Ctrl+C) |
| SIGQUIT | 3 | Core dump | Quit (Ctrl+\) |
| SIGKILL | 9 | Terminate | Kill immediately (cannot be caught/ignored) |
| SIGTERM | 15 | Terminate | Graceful termination (default kill signal) |
| SIGSTOP | 19 | Stop | Pause process (cannot be caught) |
| SIGCONT | 18 | Continue | Resume stopped process |
| SIGUSR1/2 | 10/12 | Terminate | User-defined signals |
Send signals:
$ kill 1234 # Send SIGTERM (15) - graceful
$ kill -9 1234 # Send SIGKILL - immediate
$ kill -HUP 1234 # Send SIGHUP - reload config
$ killall nginx # Kill all nginx processes
$ pkill -f "python.*app" # Kill by pattern
Background jobs:
$ ./long-task & # Run in background
[1] 5678
$ jobs # List background jobs
[1]+ Running ./long-task &
$ fg %1 # Bring to foreground
$ bg %1 # Continue in background
Ctrl+Z # Suspend current job (sends SIGTSTP, which - unlike SIGSTOP - can be caught)
$ disown %1 # Detach from shell (won't die on logout)
Keep running after logout:
$ nohup ./long-task & # Ignore SIGHUP, redirect output to nohup.out
$ screen ./long-task # Use screen session
$ tmux # Use tmux session
Virtual Memory Abstraction:
┌─────────────────────────────┐
│ Process Virtual Address │ (e.g., 4GB on 32-bit, 128TB on 64-bit)
│ Space (per process) │
├─────────────────────────────┤
│ 0xFFFFFFFF Kernel Space │ ← Shared by all processes
│ (1GB typical) │
├─────────────────────────────┤
│ Stack │ ← Grows down
│ ↓ │
│ │
│ Heap │ ← Grows up (malloc/new)
│ ↑ │
│ BSS (uninit) │
│ Data (init) │
│ 0x00000000 Text (code) │
└─────────────────────────────┘
│
│ Page Table (MMU translation)
↓
┌─────────────────────────────┐
│ Physical RAM │ (e.g., 16GB actual RAM)
│ │
│ ┌──────┬──────┬──────┐ │
│ │ Page │ Page │ Page │ ... │ (4KB pages)
│ └──────┴──────┴──────┘ │
└─────────────────────────────┘
Why Virtual Memory?
1. Isolation: Processes can't access each other's memory
2. Simplicity: Each process sees full address space
3. Flexibility: Process can use more memory than physical RAM (swap)
4. Sharing: Multiple processes can share code pages (libc.so)
Page Size: Typically 4KB (configurable: 2MB "huge pages")
Page Table Entry (PTE):
┌──────────────────────────────────────┐
│ Physical Frame # │ Flags │
│ (20 bits) │ Present/Valid │
│ │ Read/Write │
│ │ User/Supervisor │
│ │ Accessed │
│ │ Dirty (modified) │
└──────────────────────────────────────┘
Page Fault:
1. Process accesses virtual address
2. MMU checks page table
3. Page not present → Page Fault (trap to kernel)
4. Types:
- Minor: Page in memory but not mapped (just update table)
- Major: Page on disk (swap) - must read from disk
- Segfault: Invalid access (kill process)
View page faults:
$ ps -o min_flt,maj_flt,cmd 1234
MINFL MAJFL CMD
12345 234 /usr/bin/myapp
Linux allows overcommit by default:
- Processes can allocate more memory than physically available
- Kernel assumes not all allocated memory will be used
- On actual use (page fault), allocate physical page
Overcommit modes (/proc/sys/vm/overcommit_memory):
0: Heuristic (default) - reasonable overcommit allowed
1: Always - no limit (dangerous!)
2: Never - strict accounting (total commits < swap + RAM * ratio)
$ cat /proc/sys/vm/overcommit_memory
0
$ cat /proc/sys/vm/overcommit_ratio # Percentage of RAM
50
Check committed memory:
$ cat /proc/meminfo | grep Commit
CommitLimit: 16000000 kB
Committed_AS: 12000000 kB
Scenario: System runs out of memory. Kernel can't allocate pages. What happens?
OOM Killer: Kernel selects a process to kill to free memory. Prevents total system freeze.
OOM Killer Process:
1. System exhausts physical RAM + swap
2. Kernel can't satisfy memory allocation
3. OOM killer activated
4. Calculate OOM score for each process:
- Based on: memory usage, runtime, process priority
- Root/system processes have lower scores
5. Kill process with highest score
6. Log to /var/log/messages or dmesg
OOM Score:
$ cat /proc/1234/oom_score # Current score (0-1000)
$ cat /proc/1234/oom_score_adj # Adjustment (-1000 to 1000)
Adjust OOM behavior:
$ echo -1000 > /proc/1234/oom_score_adj # Never kill (reserved for critical)
$ echo 1000 > /proc/1234/oom_score_adj # Kill first
Panic instead of invoking the OOM killer (dangerous!):
$ echo 2 > /proc/sys/vm/panic_on_oom # Kernel panic instead
View OOM events:
$ dmesg | grep -i oom
$ journalctl -k | grep -i "killed process"
$ free -h
total used free shared buff/cache available
Mem: 15G 8.0G 2.0G 100M 5.0G 6.5G
Swap: 8.0G 1.0G 7.0G
Fields explained:
- total: Total installed RAM
- used: RAM used by processes
- free: Completely unused RAM
- shared: Memory used by tmpfs (e.g., /dev/shm)
- buff/cache: Disk cache (will be freed if needed)
- available: Memory available for new processes (free + reclaimable cache)
Important: Linux uses "free" memory for caching!
"buff/cache" will be freed automatically if processes need it.
Look at "available" not "free" for actual free memory.
Detailed memory info:
$ cat /proc/meminfo
MemTotal: 16384000 kB
MemFree: 2048000 kB
Buffers: 512000 kB
Cached: 4096000 kB
SwapTotal: 8192000 kB
SwapFree: 7168000 kB
Dirty: 12000 kB ← Dirty pages waiting to flush to disk
Writeback: 0 kB ← Pages actively being written
AnonPages: 6144000 kB ← Private process memory
Shmem: 102400 kB ← Shared memory
Per-process memory:
$ pmap 1234 # Memory map of process
$ cat /proc/1234/smaps # Detailed memory map
$ cat /proc/1234/status | grep -i mem
VmSize: 2456780 kB ← Virtual memory
VmRSS: 1024000 kB ← Resident (physical RAM)
VmData: 512000 kB ← Data segment
VmStk: 136 kB ← Stack
VmExe: 4096 kB ← Code
What is Swap?
- Disk space used as "overflow" for RAM
- Inactive pages moved to swap when RAM pressure high
- Much slower than RAM (100x-1000x slower)
View swap:
$ swapon --show
NAME TYPE SIZE USED PRIO
/swapfile file 8G 1.2G -2
Add swap:
$ sudo dd if=/dev/zero of=/swapfile bs=1M count=8192 # 8GB
$ sudo chmod 600 /swapfile
$ sudo mkswap /swapfile
$ sudo swapon /swapfile
Swappiness (0-100):
- Controls how aggressively kernel swaps
- 0: Swap only to avoid OOM
- 60: Default (balanced)
- 100: Aggressive swapping
$ cat /proc/sys/vm/swappiness
60
$ sudo sysctl vm.swappiness=10 # Less aggressive
| Feature | MBR (Master Boot Record) | GPT (GUID Partition Table) |
|---|---|---|
| Max Disk Size | 2 TB | 9.4 ZB (zettabytes) |
| Max Partitions | 4 primary (or 3 + extended with logical) | 128 (default, can be more) |
| Boot Mode | BIOS | UEFI (also BIOS with protective MBR) |
| Redundancy | None (single point of failure) | Backup partition table at end of disk |
List disks and partitions:
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 465.8G 0 disk
├─sda1 8:1 0 512M 0 part /boot/efi
├─sda2 8:2 0 100G 0 part /
└─sda3 8:3 0 365.2G 0 part /home
$ fdisk -l /dev/sda
Disk /dev/sda: 500 GB
Disk identifier: 0x12345678
Device Boot Start End Sectors Size Id Type
/dev/sda1 * 2048 1050623 1048576 512M ef EFI System
/dev/sda2 1050624 210765823 209715200 100G 83 Linux
/dev/sda3 210765824 976773167 765997344 365G 83 Linux
Partition tools:
- fdisk: MBR partitions (interactive)
- gdisk: GPT partitions (interactive)
- parted: Both MBR/GPT (scriptable)
Create partition with parted:
$ sudo parted /dev/sdb
(parted) mklabel gpt
(parted) mkpart primary ext4 0% 50%
(parted) mkpart primary ext4 50% 100%
(parted) print
(parted) quit
Format partition:
$ sudo mkfs.ext4 /dev/sdb1
$ sudo mkfs.xfs /dev/sdb2
LVM Layers:
┌─────────────────────────────────────────┐
│ Filesystems (ext4, xfs, etc.) │
├─────────────────────────────────────────┤
│ Logical Volumes (LVs) │
│ /dev/vg01/lv_root /dev/vg01/lv_home │ ← Flexible, resizable
├─────────────────────────────────────────┤
│ Volume Group (VG) │
│ vg01 (pool of storage) │ ← Aggregates PVs
├─────────────────────────────────────────┤
│ Physical Volumes (PVs) │
│ /dev/sda2 /dev/sdb1 /dev/sdc1 │ ← Partitions or disks
└─────────────────────────────────────────┘
Why LVM?
- Resize volumes without downtime
- Snapshots (for backups)
- Combine multiple disks into one volume
- Move data between disks while online
Create LVM setup:
1. Create Physical Volumes:
$ sudo pvcreate /dev/sdb1 /dev/sdc1
$ sudo pvdisplay
2. Create Volume Group:
$ sudo vgcreate vg01 /dev/sdb1 /dev/sdc1
$ sudo vgdisplay vg01
3. Create Logical Volumes:
$ sudo lvcreate -L 50G -n lv_root vg01 # Fixed size
$ sudo lvcreate -l 100%FREE -n lv_home vg01 # Use remaining space
$ sudo lvdisplay
4. Format and mount:
$ sudo mkfs.ext4 /dev/vg01/lv_root
$ sudo mount /dev/vg01/lv_root /mnt
Resize LV (extend):
$ sudo lvextend -L +20G /dev/vg01/lv_root # Add 20GB
$ sudo resize2fs /dev/vg01/lv_root # Extend filesystem (ext4)
$ sudo xfs_growfs /mnt # Extend filesystem (xfs)
Resize LV (shrink - ext4 only, requires unmount):
$ sudo umount /dev/vg01/lv_root
$ sudo e2fsck -f /dev/vg01/lv_root
$ sudo resize2fs /dev/vg01/lv_root 30G
$ sudo lvreduce -L 30G /dev/vg01/lv_root
LVM Snapshots:
$ sudo lvcreate -L 10G -s -n lv_root_snap /dev/vg01/lv_root
$ sudo mount /dev/vg01/lv_root_snap /mnt/snapshot
# Make changes to original...
# Restore from snapshot if needed:
$ sudo lvconvert --merge /dev/vg01/lv_root_snap
| Level | Min Disks | Usable Space | Fault Tolerance | Use Case |
|---|---|---|---|---|
| RAID 0 | 2 | 100% (all disks) | None - any disk fails = data loss | Performance (striping), not production |
| RAID 1 | 2 | 50% (mirror) | 1 disk can fail | Redundancy, small datasets |
| RAID 5 | 3 | (N-1) disks | 1 disk can fail | Balanced (performance + redundancy) |
| RAID 6 | 4 | (N-2) disks | 2 disks can fail | High redundancy, large arrays |
| RAID 10 | 4 | 50% | 1 disk per mirror pair | High performance + redundancy |
Linux Software RAID (mdadm):
Create RAID 1:
$ sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1
$ cat /proc/mdstat
md0 : active raid1 sdc1[1] sdb1[0]
524224 blocks super 1.2 [2/2] [UU]
Create RAID 5:
$ sudo mdadm --create /dev/md1 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
Check status:
$ sudo mdadm --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Wed Jan 15 10:00:00 2025
Raid Level : raid1
Array Size : 524224 (511.88 MiB 536.74 MB)
Device Size : 524224 (511.88 MiB 536.74 MB)
Raid Devices : 2
Total Devices : 2
State : clean
Replace failed disk:
$ sudo mdadm --manage /dev/md0 --fail /dev/sdb1
$ sudo mdadm --manage /dev/md0 --remove /dev/sdb1
$ sudo mdadm --manage /dev/md0 --add /dev/sde1 # New disk
# Watch rebuild:
$ watch cat /proc/mdstat
List interfaces:
$ ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
3: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
link/ether 00:1a:2b:3c:4d:5e brd ff:ff:ff:ff:ff:ff
Show IP addresses:
$ ip addr show
$ ip addr show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
inet 192.168.1.100/24 brd 192.168.1.255 scope global eth0
inet6 fe80::5054:ff:fe12:3456/64 scope link
Configure interface:
$ sudo ip addr add 192.168.1.100/24 dev eth0
$ sudo ip link set eth0 up
$ sudo ip link set eth0 down
Persistent configuration (Ubuntu/Debian - netplan):
$ sudo vi /etc/netplan/01-netcfg.yaml
network:
  version: 2
  ethernets:
    eth0:
      addresses:
        - 192.168.1.100/24
      routes:
        - to: default
          via: 192.168.1.1
      nameservers:
        addresses: [8.8.8.8, 1.1.1.1]
$ sudo netplan apply
Persistent (RHEL/CentOS - NetworkManager):
$ sudo nmcli con add type ethernet ifname eth0 con-name eth0
$ sudo nmcli con mod eth0 ipv4.addresses 192.168.1.100/24
$ sudo nmcli con mod eth0 ipv4.gateway 192.168.1.1
$ sudo nmcli con mod eth0 ipv4.dns "8.8.8.8 1.1.1.1"
$ sudo nmcli con mod eth0 ipv4.method manual
$ sudo nmcli con up eth0
View routing table:
$ ip route show
default via 192.168.1.1 dev eth0 proto static
192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.100
$ route -n # Legacy equivalent
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 192.168.1.1 0.0.0.0 UG 0 0 0 eth0
192.168.1.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
Add route:
$ sudo ip route add 10.0.0.0/8 via 192.168.1.254
$ sudo ip route add default via 192.168.1.1
Delete route:
$ sudo ip route del 10.0.0.0/8
Trace route:
$ traceroute google.com
$ mtr google.com # Continuous traceroute
Tables and Chains:
filter table (default):
- INPUT: Packets destined for local system
- OUTPUT: Packets originating from local system
- FORWARD: Packets routed through system
nat table:
- PREROUTING: Alter packets before routing
- POSTROUTING: Alter packets after routing (SNAT/MASQUERADE)
- OUTPUT: NAT for locally generated packets
List rules:
$ sudo iptables -L -n -v
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
123 9876 ACCEPT tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:22
Common rules:
# Allow SSH
$ sudo iptables -A INPUT -p tcp --dport 22 -j ACCEPT
# Allow HTTP/HTTPS
$ sudo iptables -A INPUT -p tcp --dport 80 -j ACCEPT
$ sudo iptables -A INPUT -p tcp --dport 443 -j ACCEPT
# Allow established connections
$ sudo iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
# Drop everything else
$ sudo iptables -P INPUT DROP
# Allow loopback
$ sudo iptables -A INPUT -i lo -j ACCEPT
# NAT/Masquerade (for router/gateway)
$ sudo iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
$ sudo sysctl -w net.ipv4.ip_forward=1
Delete rule:
$ sudo iptables -D INPUT 3 # Delete rule #3
$ sudo iptables -F # Flush all rules
Save/restore:
$ sudo iptables-save > /etc/iptables/rules.v4
$ sudo iptables-restore < /etc/iptables/rules.v4
Modern wrapper around iptables/nftables:
$ sudo firewall-cmd --list-all
public (active)
target: default
services: ssh dhcpv6-client
ports: 80/tcp 443/tcp
Add service:
$ sudo firewall-cmd --add-service=http --permanent
$ sudo firewall-cmd --reload
Add port:
$ sudo firewall-cmd --add-port=8080/tcp --permanent
$ sudo firewall-cmd --reload
Basic commands:
$ sudo systemctl start nginx # Start service
$ sudo systemctl stop nginx # Stop service
$ sudo systemctl restart nginx # Restart service
$ sudo systemctl reload nginx # Reload config (no downtime)
$ sudo systemctl status nginx # Check status
Enable/disable (start on boot):
$ sudo systemctl enable nginx # Create symlink
$ sudo systemctl disable nginx # Remove symlink
$ sudo systemctl is-enabled nginx
List services:
$ systemctl list-units --type=service
$ systemctl list-units --type=service --state=running
$ systemctl list-unit-files --type=service
View logs:
$ sudo journalctl -u nginx # All logs for nginx
$ sudo journalctl -u nginx -f # Follow logs
$ sudo journalctl -u nginx --since today
$ sudo journalctl -u nginx --since "2025-01-15 10:00"
Service file: /etc/systemd/system/myapp.service
[Unit]
Description=My Application
After=network.target
[Service]
Type=simple
User=myuser
Group=mygroup
WorkingDirectory=/opt/myapp
ExecStart=/opt/myapp/bin/start.sh
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
RestartSec=10s
# Environment
Environment="PORT=8080"
EnvironmentFile=/etc/myapp/config
# Limits
LimitNOFILE=65536
MemoryMax=2G
[Install]
WantedBy=multi-user.target
Service types:
- simple: Process doesn't fork (default)
- forking: Process forks (daemon)
- oneshot: Process exits (scripts)
- notify: Process notifies systemd when ready
Reload systemd after changes:
$ sudo systemctl daemon-reload
$ sudo systemctl start myapp
$ sudo systemctl enable myapp
Timer file: /etc/systemd/system/backup.timer
[Unit]
Description=Daily Backup Timer
[Timer]
# Daily at 2 AM (OnCalendar=daily would fire at midnight instead)
OnCalendar=*-*-* 02:00:00
Persistent=true
[Install]
WantedBy=timers.target
Service file: /etc/systemd/system/backup.service
[Unit]
Description=Backup Service
[Service]
Type=oneshot
ExecStart=/usr/local/bin/backup.sh
Enable timer:
$ sudo systemctl daemon-reload
$ sudo systemctl enable backup.timer
$ sudo systemctl start backup.timer
List timers:
$ systemctl list-timers
NEXT LEFT LAST PASSED UNIT
Wed 2025-01-15 02:00:00 EST 6h left Tue 2025-01-14 02:00:00 EST 18h ago backup.timer
$ top
top - 10:30:15 up 5 days, 3:25, 2 users, load average: 1.50, 1.25, 0.95
Tasks: 234 total, 1 running, 233 sleeping, 0 stopped, 0 zombie
%Cpu(s): 5.2 us, 2.1 sy, 0.0 ni, 92.5 id, 0.2 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 16000.0 total, 2000.0 free, 8000.0 used, 6000.0 buff/cache
MiB Swap: 8192.0 total, 7168.0 free, 1024.0 used. 6500.0 avail Mem
Load average: (1 min, 5 min, 15 min)
- Average number of processes runnable or in uninterruptible I/O wait (Linux counts D state too)
- < 1.0 per core = system not saturated
- > 1.0 per core = processes waiting
CPU states:
- us (user): Time in user space
- sy (system): Time in kernel space
- ni (nice): Time in nice'd processes
- id (idle): Idle time
- wa (iowait): Waiting for I/O (disk, network)
- hi (hardware interrupts): Servicing hardware interrupts
- si (software interrupts): Servicing software interrupts
- st (steal): Time stolen by hypervisor (VMs)
High wa% = I/O bottleneck (slow disk/network)
High sy% = Kernel bottleneck (lots of syscalls, context switches)
$ iostat -x 1 # Extended stats, 1 second interval
avg-cpu: %user %nice %system %iowait %steal %idle
5.20 0.00 2.10 0.20 0.00 92.50
Device          r/s     w/s    rkB/s    wkB/s   %util
sda           100.0   200.0   4096.0   8192.0    45.2
Key metrics:
- r/s, w/s: Reads/writes per second
- rkB/s, wkB/s: KB read/written per second
- await: Average time (ms) for I/O requests
- %util: Percentage of time device was busy
%util > 80% = Disk saturated (bottleneck)
await > 10ms (SSD) or > 20ms (HDD) = Slow disk
$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 102400 204800 51200 614400 0 0 100 200 1000 2000 5 2 92 1 0
Columns:
procs:
r: Processes waiting for CPU
b: Processes blocked (uninterruptible sleep - I/O wait)
memory:
swpd: Swap used
free: Free memory
buff/cache: Buffer/cache
swap:
si: Swap in (from disk)
so: Swap out (to disk)
io:
bi: Blocks in (read from disk)
bo: Blocks out (written to disk)
system:
in: Interrupts per second
cs: Context switches per second
High r value = CPU bottleneck
High b value = I/O bottleneck
si/so > 0 consistently = Memory pressure (swapping)
Historical performance data (requires sysstat package):
CPU usage:
$ sar -u 1 5 # CPU usage, 1 sec interval, 5 iterations
Memory:
$ sar -r 1 5 # Memory usage
Disk:
$ sar -d 1 5 # Disk I/O
Network:
$ sar -n DEV 1 5 # Network interface stats
Historical data (from /var/log/sa/):
$ sar -u -f /var/log/sa/sa15 # CPU usage from 15th
Network connections:
$ ss -tulpn # TCP/UDP listening ports
$ ss -tnp # TCP connections with process names
$ netstat -tulpn # Legacy equivalent
Bandwidth monitoring:
$ iftop # Real-time bandwidth by connection
$ nethogs # Bandwidth by process
$ nload # Total bandwidth per interface
Test network speed:
$ iperf3 -s # Server
$ iperf3 -c server_ip # Client
DNS lookup:
$ dig google.com
$ nslookup google.com
$ host google.com
Test connectivity:
$ ping -c 4 google.com
$ traceroute google.com
$ mtr google.com # Combined ping + traceroute
TCP connection test:
$ telnet google.com 80
$ nc -zv google.com 80 # Netcat
Check open files/sockets:
$ lsof -i :80 # What's using port 80?
$ lsof -i TCP:80
$ lsof -p 1234 # Files opened by PID 1234
Systematic Approach:
1. uptime or top - High load?
2. top - %idle low? Which process?
3. free -h - Swapping? vmstat 1 - si/so > 0?
4. iostat -x 1 - %util high? iotop - Which process?
5. iftop or nethogs - Saturated link?
Quick diagnosis commands:
$ uptime # Load average
$ top -bn1 | head -20 # Snapshot of top processes
$ free -h # Memory status
$ df -h # Disk space
$ iostat -x 1 3 # I/O stats
$ vmstat 1 5 # VM stats
$ ps aux --sort=-%cpu | head # Top CPU users
$ ps aux --sort=-%mem | head # Top memory users
Find what's using space:
$ df -h # Which filesystem is full?
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       100G   95G     0 100% /
$ du -sh /* # Size of top-level directories
$ du -sh /var/* # Drill down
$ du -ah /var/log | sort -hr | head -20 # Largest files/dirs in /var/log
Find large files:
$ find / -type f -size +1G -exec ls -lh {} \; 2>/dev/null
$ find /var/log -type f -size +100M -ls
Find deleted but open files (space not freed):
$ lsof | grep deleted
$ lsof +L1 # Files with link count 0 (deleted but open)
Clean up:
$ sudo journalctl --vacuum-size=100M # Limit journal size
$ sudo journalctl --vacuum-time=7d # Keep 7 days
$ sudo apt clean # Ubuntu/Debian package cache
$ sudo yum clean all # RHEL/CentOS package cache
Process stuck in uninterruptible sleep (D state):
$ ps aux | awk '$8 ~ /^D/' # Match the STAT column, not the whole line
user 1234 0.0 0.0 0 0 ? D 10:00 0:00 [process]
Cause: Waiting for I/O (usually NFS, broken hardware)
Cannot be killed (even with -9)!
Solutions:
1. Fix underlying I/O issue (unmount NFS, fix disk)
2. Wait for I/O timeout
3. Reboot (last resort)
Zombie process (Z state):
$ ps aux | awk '$8 ~ /^Z/' # Match the STAT column, not the whole line
user 1234 0.0 0.0 0 0 ? Z 10:00 0:00 [defunct]
Cause: Process exited but parent hasn't reaped it
Solution: Kill parent process or wait for parent to exit
Systematic checks:
1. Is server reachable?
$ ping server
2. Is SSH port open?
$ telnet server 22
$ nc -zv server 22
3. Is sshd running?
$ systemctl status sshd
4. Firewall blocking?
$ sudo iptables -L -n | grep 22
$ sudo firewall-cmd --list-all
5. Check SSH logs:
$ sudo journalctl -u sshd -f
$ sudo tail -f /var/log/auth.log # Debian
$ sudo tail -f /var/log/secure # RHEL
6. Too many authentication failures?
$ sudo grep "Failed password" /var/log/auth.log
7. sshd_config issues?
$ sudo sshd -t # Test config
$ sudo cat /etc/ssh/sshd_config | grep -v "^#"
Load average high, but CPU idle - Why?
$ uptime
load average: 5.00, 4.50, 4.00
$ top
%Cpu(s): 2.0 us, 1.0 sy, 37.0 id, 60.0 wa
Diagnosis: Processes waiting for I/O (not CPU)
Check:
$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 5 0 20000 50000 600000 0 0 5000 10000 1000 2000 2 1 37 60 0
b = 5 → 5 processes blocked (I/O wait)
$ iostat -x 1
Device r/s w/s await %util
sda 1000 500 250 99.5
await = 250ms (very slow!)
%util = 99.5% (disk saturated)
Solutions:
- Identify I/O heavy process (iotop)
- Check disk health (smartctl)
- Consider faster storage (SSD)
- Optimize application I/O patterns
1. Boot from installation media, or:
2. Interrupt boot (GRUB menu)
- Press 'e' to edit
- Find linux/vmlinuz line
- Add: systemd.unit=rescue.target
- Or add: init=/bin/bash (emergency shell, skip systemd)
- Boot with Ctrl+X
3. Root filesystem read-only in emergency mode:
$ mount -o remount,rw /
$ mount -a # Mount all from /etc/fstab
4. Fix issue (bad fstab, reset password, etc.)
5. Reboot:
$ systemctl reboot
Or emergency mode:
$ sync; reboot -f
1. Boot to emergency shell (init=/bin/bash)
2. Remount root read-write:
$ mount -o remount,rw /
3. Change password:
$ passwd root
4. SELinux systems (RHEL/CentOS):
$ touch /.autorelabel # Relabel on next boot
5. Reboot:
$ sync
$ reboot -f
System won't boot due to bad /etc/fstab entry:
1. Boot to rescue mode
2. Mount root:
$ mount -o remount,rw /
3. Edit fstab:
$ vi /etc/fstab
# Comment out or fix bad entry
4. Test:
$ mount -a # Should succeed without errors
5. Reboot