Distinguished Engineer Study Guide - From Boot to Production
Beyond Basic Commands: Distinguished engineers need to understand Linux at the systems level - not just running commands, but understanding what happens under the hood.
Interview Focus: What happens between power-on and a login prompt?
Power On → BIOS/UEFI Firmware
1. POST (Power-On Self-Test)
- Check CPU, RAM, peripherals
- Initialize hardware
2. Find boot device (hard drive, USB, network)
3. Read MBR (Master Boot Record) - first 512 bytes of disk
- Or ESP (EFI System Partition) for UEFI
4. Load bootloader into memory
5. Transfer control to bootloader
GRUB (Grand Unified Bootloader)
1. Display boot menu (or auto-boot after timeout)
2. Load kernel image (/boot/vmlinuz-*)
3. Load initial RAM disk (initrd/initramfs)
4. Pass kernel parameters (from /etc/default/grub)
5. Jump to kernel entry point
Example GRUB entry:
linux /boot/vmlinuz-5.15.0 root=UUID=xxx ro quiet
initrd /boot/initramfs-5.15.0.img
Parameters:
- root=: Root filesystem device
- ro: Mount root read-only initially (fsck can run)
- quiet: Suppress verbose messages
- single: Boot to single-user mode (rescue)
Kernel Boot Process:
1. Decompress kernel (if compressed)
2. Initialize kernel subsystems:
- Memory management (MMU, paging)
- Process scheduler
- Device drivers
3. Mount initramfs (initial RAM filesystem)
- Temporary root filesystem in RAM
- Contains drivers needed to mount real root
- Loads modules for disk controllers, RAID, LVM
4. Execute /init script in initramfs
5. Mount real root filesystem (from root= parameter)
6. Pivot to real root (switch_root)
7. Execute init system (/sbin/init → systemd)
View kernel boot messages:
$ dmesg | less
$ journalctl -k # Kernel messages via the systemd journal
systemd becomes PID 1
1. Read configuration from /etc/systemd/
2. Determine target (runlevel equivalent)
- multi-user.target (CLI, no GUI)
- graphical.target (GUI)
3. Start services in dependency order
- Parallel initialization where possible
4. Mount filesystems from /etc/fstab
5. Start getty (login prompts)
Check boot time:
$ systemd-analyze
Startup finished in 2.5s (kernel) + 8.3s (userspace) = 10.8s
Check service startup times:
$ systemd-analyze blame
systemd targets replace the old SysV runlevels:
| Target | Old Runlevel | Description |
|---|---|---|
| poweroff.target | 0 | Shutdown system |
| rescue.target | 1, s, single | Single-user mode (root only, minimal services) |
| multi-user.target | 3 | Multi-user text mode (no GUI) |
| graphical.target | 5 | Multi-user with GUI |
| reboot.target | 6 | Reboot system |
Check current target:
$ systemctl get-default
graphical.target
Change default target:
$ sudo systemctl set-default multi-user.target
Boot to rescue mode (from GRUB):
- Edit GRUB entry (press 'e')
- Add "systemd.unit=rescue.target" to kernel line
- Boot with Ctrl+X
Process States:
┌─────────┐ fork() ┌─────────┐ exec() ┌─────────┐
│ Parent │ ──────────> │ Child │ ──────────> │ Running │
│ Process │ │ (copy) │ │ Program │
└─────────┘ └─────────┘ └─────────┘
│
├──> Running (on CPU)
├──> Runnable (waiting for CPU)
├──> Sleeping (waiting for I/O)
├──> Stopped (Ctrl+Z)
└──> Zombie (exited, parent hasn't reaped)
Process Creation:
1. fork() - Create copy of current process
- Child gets copy of parent's memory, file descriptors
- Child has new PID
- Both processes continue from fork() return point
2. exec() - Replace process image with new program
- Load new executable
- Replace memory space
- Keep PID, file descriptors (unless close-on-exec)
View processes:
$ ps aux # All processes, user-oriented
$ ps -ef # All processes, full format
$ pstree # Process tree
Key fields:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.1 169104 13140 ? Ss Jan15 0:12 /sbin/init
user 1234 2.5 10.2 2456780 1024000 ? Sl 10:30 5:23 /usr/bin/app
VSZ: Virtual memory size (total addressable memory)
RSS: Resident Set Size (physical RAM actually used)
STAT: Process state
S: Sleeping (interruptible)
R: Running
D: Sleeping (uninterruptible - usually I/O wait)
T: Stopped
Z: Zombie
<: High priority
N: Low priority
s: Session leader
View specific process:
$ ps -p 1234 -f
$ cat /proc/1234/status # Detailed process info
$ cat /proc/1234/cmdline # Command line
$ ls -l /proc/1234/fd # Open file descriptors
Process Priority:
- Linux uses priority range: 0-139
- Real-time: 0-99 (higher = higher priority)
- Normal: 100-139 (lower = higher priority)
- Nice values: -20 to +19 (user-adjustable)
- Nice -20 = highest priority (100)
- Nice 0 = default (120)
- Nice 19 = lowest priority (139)
Adjust priority:
$ nice -n 10 ./my-program # Start with lower priority
$ renice -n 5 -p 1234 # Change running process
$ renice -n -5 -u username # Requires root for negative nice
View priorities:
$ ps -eo pid,ni,pri,comm
PID NI PRI COMMAND
1234 0 19 my-program
5678 -10 9 important-task
Definition: A system call is the interface between user-space programs and the kernel. It's how applications request services from the operating system.
Security & Isolation: User programs can't directly access hardware or kernel memory. System calls provide a controlled, safe way to perform privileged operations.
User Space vs Kernel Space:
┌─────────────────────────────────────┐
│ User Space (Ring 3) │ ← Applications run here
│ - Limited privileges │
│ - Cannot access hardware directly │
│ - Cannot access kernel memory │
├─────────────────────────────────────┤
│ System Call Interface │ ← Syscall boundary
├─────────────────────────────────────┤
│ Kernel Space (Ring 0) │ ← Kernel runs here
│ - Full privileges │
│ - Direct hardware access │
│ - Manage all resources │
└─────────────────────────────────────┘
Example: Reading a file with read()
1. Application calls read(fd, buffer, size)
↓
2. C library (libc) prepares syscall
- Put syscall number in register (RAX on x86-64)
- Put arguments in registers (RDI, RSI, RDX, ...)
↓
3. Execute syscall instruction (formerly int 0x80)
- CPU switches from Ring 3 (user) to Ring 0 (kernel)
- Save user context (registers, stack)
- Jump to kernel syscall handler
↓
4. Kernel executes syscall
- Validate arguments (is fd valid? is buffer in user space?)
- Perform operation (read from filesystem)
- Prepare return value
↓
5. Return to user space
- Restore user context
- Switch from Ring 0 to Ring 3
- Return value in RAX register
↓
6. C library returns to application
Cost: Context switch + validation ~100-300 ns (expensive!)
| Category | System Calls | Purpose |
|---|---|---|
| Process Control | fork, exec, exit, wait, kill | Create, terminate, manage processes |
| File Operations | open, close, read, write, lseek, stat | File I/O and metadata |
| Directory Operations | mkdir, rmdir, chdir, opendir, readdir | Directory management |
| Device Management | ioctl, read, write | Device I/O control |
| Memory Management | brk, mmap, munmap, mprotect | Allocate, map, protect memory |
| Communication | pipe, socket, send, recv, shmget | IPC (inter-process communication) |
| Protection | chmod, chown, setuid, setgid | Permissions and ownership |
| Time | time, gettimeofday, clock_gettime | Get system time |
strace - Trace system calls and signals
$ strace ls
execve("/bin/ls", ["ls"], 0x7ffd...) = 0
brk(NULL) = 0x55555576f000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, ...) = 0x7f1234567000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY...) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=123456, ...}) = 0
mmap(NULL, 123456, PROT_READ, ...) = 0x7f1234560000
close(3) = 0
...
write(1, "file1.txt\nfile2.txt\n", 20) = 20
close(1) = 0
close(2) = 0
exit_group(0) = ?
Useful strace options:
$ strace -c ls # Count syscalls
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
25.00 0.000100 5 20 write
20.00 0.000080 4 20 read
15.00 0.000060 3 20 open
...
$ strace -e open,read ls # Trace only specific syscalls
$ strace -p 1234 # Attach to running process
$ strace -f ./program # Follow forks
$ strace -T ls # Show time spent in each syscall
$ strace -o output.txt ls # Save to file
Performance analysis:
$ strace -c -p 1234 # Count syscalls for running process
Problem: System calls are expensive (context switch overhead)
Bad: Many small syscalls (pseudocode):
for i in range(1000000):
    write(fd, "x", 1)    # 1 million syscalls!
                         # ~100ns each = 100ms total in syscall overhead
Good: Batch operations
buffer = "x" * 1000000
write(fd, buffer, 1000000) # 1 syscall!
# ~100ns syscall overhead
Optimization strategies:
- Buffered I/O (fwrite, fread vs write, read)
- Memory mapping (mmap) instead of read/write
- Batch syscalls where possible
- Use async I/O (io_uring on modern Linux)
Example - Reading file:
Syscall per byte: 1MB / 1 byte = 1,000,000 syscalls → SLOW
Syscall per 4KB: 1MB / 4KB = 256 syscalls → Better
mmap entire file: 1 syscall → Fast
vDSO (virtual Dynamic Shared Object):
- Kernel maps read-only page into every process
- Contains fast-path syscalls (no context switch!)
- Used for: gettimeofday(), clock_gettime(), getcpu()
Without vDSO:
$ time ./get-time-1million # 1M calls to gettimeofday()
real 0m5.432s # 5.4 µs per call (context switch)
With vDSO:
$ time ./get-time-1million
real 0m0.045s # 45 ns per call (no context switch!)
120x faster! No kernel transition needed.
| Signal | Number | Default Action | Description |
|---|---|---|---|
| SIGHUP | 1 | Terminate | Hangup - terminal disconnected (often: reload config) |
| SIGINT | 2 | Terminate | Interrupt (Ctrl+C) |
| SIGQUIT | 3 | Core dump | Quit (Ctrl+\) |
| SIGKILL | 9 | Terminate | Kill immediately (cannot be caught/ignored) |
| SIGTERM | 15 | Terminate | Graceful termination (default kill signal) |
| SIGSTOP | 19 | Stop | Pause process (cannot be caught) |
| SIGCONT | 18 | Continue | Resume stopped process |
| SIGUSR1/2 | 10/12 | Terminate | User-defined signals |
Send signals:
$ kill 1234 # Send SIGTERM (15) - graceful
$ kill -9 1234 # Send SIGKILL - immediate
$ kill -HUP 1234 # Send SIGHUP - reload config
$ killall nginx # Kill all nginx processes
$ pkill -f "python.*app" # Kill by pattern
Background jobs:
$ ./long-task & # Run in background
[1] 5678
$ jobs # List background jobs
[1]+ Running ./long-task &
$ fg %1 # Bring to foreground
$ bg %1 # Continue in background
Ctrl+Z # Suspend current job (sends SIGTSTP, which - unlike SIGSTOP - can be caught)
$ disown %1 # Detach from shell (won't die on logout)
Keep running after logout:
$ nohup ./long-task & # Ignore SIGHUP, redirect output to nohup.out
$ screen ./long-task # Use screen session
$ tmux # Use tmux session
Virtual Memory Abstraction:
┌─────────────────────────────┐
│ Process Virtual Address │ (e.g., 4GB on 32-bit, 128TB on 64-bit)
│ Space (per process) │
├─────────────────────────────┤
│ 0xFFFFFFFF Kernel Space │ ← Shared by all processes
│ (1GB typical) │
├─────────────────────────────┤
│ Stack │ ← Grows down
│ ↓ │
│ │
│ Heap │ ← Grows up (malloc/new)
│ ↑ │
│ BSS (uninit) │
│ Data (init) │
│ 0x00000000 Text (code) │
└─────────────────────────────┘
│
│ Page Table (MMU translation)
↓
┌─────────────────────────────┐
│ Physical RAM │ (e.g., 16GB actual RAM)
│ │
│ ┌──────┬──────┬──────┐ │
│ │ Page │ Page │ Page │ ... │ (4KB pages)
│ └──────┴──────┴──────┘ │
└─────────────────────────────┘
Why Virtual Memory?
1. Isolation: Processes can't access each other's memory
2. Simplicity: Each process sees full address space
3. Flexibility: Process can use more memory than physical RAM (swap)
4. Sharing: Multiple processes can share code pages (libc.so)
Page Size: Typically 4KB (configurable: 2MB "huge pages")
Page Table Entry (PTE):
┌──────────────────────────────────────┐
│ Physical Frame # │ Flags │
│ (20 bits) │ Present/Valid │
│ │ Read/Write │
│ │ User/Supervisor │
│ │ Accessed │
│ │ Dirty (modified) │
└──────────────────────────────────────┘
Page Fault:
1. Process accesses virtual address
2. MMU checks page table
3. Page not present → Page Fault (trap to kernel)
4. Types:
- Minor: Page in memory but not mapped (just update table)
- Major: Page on disk (swap) - must read from disk
- Segfault: Invalid access (kill process)
View page faults:
$ ps -o min_flt,maj_flt,cmd 1234
MINFL MAJFL CMD
12345 234 /usr/bin/myapp
Linux allows overcommit by default:
- Processes can allocate more memory than physically available
- Kernel assumes not all allocated memory will be used
- On actual use (page fault), allocate physical page
Overcommit modes (/proc/sys/vm/overcommit_memory):
0: Heuristic (default) - reasonable overcommit allowed
1: Always - no limit (dangerous!)
2: Never - strict accounting (total commits < swap + RAM * ratio)
$ cat /proc/sys/vm/overcommit_memory
0
$ cat /proc/sys/vm/overcommit_ratio # Percentage of RAM
50
Check committed memory:
$ cat /proc/meminfo | grep Commit
CommitLimit: 16000000 kB
Committed_AS: 12000000 kB
Scenario: System runs out of memory. Kernel can't allocate pages. What happens?
OOM Killer: Kernel selects a process to kill to free memory. Prevents total system freeze.
OOM Killer Process:
1. System exhausts physical RAM + swap
2. Kernel can't satisfy memory allocation
3. OOM killer activated
4. Calculate OOM score for each process:
- Based on: memory usage, runtime, process priority
- Root/system processes have lower scores
5. Kill process with highest score
6. Log to /var/log/messages or dmesg
OOM Score:
$ cat /proc/1234/oom_score # Current score (0-1000)
$ cat /proc/1234/oom_score_adj # Adjustment (-1000 to 1000)
Adjust OOM behavior:
$ echo -1000 > /proc/1234/oom_score_adj # Never kill (reserved for critical)
$ echo 1000 > /proc/1234/oom_score_adj # Kill first
Panic instead of invoking the OOM killer (dangerous!):
$ echo 2 > /proc/sys/vm/panic_on_oom # Kernel panic instead
View OOM events:
$ dmesg | grep -i oom
$ journalctl -k | grep -i "killed process"
$ free -h
total used free shared buff/cache available
Mem: 15G 8.0G 2.0G 100M 5.0G 6.5G
Swap: 8.0G 1.0G 7.0G
Fields explained:
- total: Total installed RAM
- used: RAM used by processes
- free: Completely unused RAM
- shared: Memory used by tmpfs (e.g., /dev/shm)
- buff/cache: Disk cache (will be freed if needed)
- available: Memory available for new processes (free + reclaimable cache)
Important: Linux uses "free" memory for caching!
"buff/cache" will be freed automatically if processes need it.
Look at "available" not "free" for actual free memory.
Detailed memory info:
$ cat /proc/meminfo
MemTotal: 16384000 kB
MemFree: 2048000 kB
Buffers: 512000 kB
Cached: 4096000 kB
SwapTotal: 8192000 kB
SwapFree: 7168000 kB
Dirty: 12000 kB ← Dirty pages waiting to flush to disk
Writeback: 0 kB ← Pages actively being written
AnonPages: 6144000 kB ← Private process memory
Shmem: 102400 kB ← Shared memory
Per-process memory:
$ pmap 1234 # Memory map of process
$ cat /proc/1234/smaps # Detailed memory map
$ cat /proc/1234/status | grep -i mem
VmSize: 2456780 kB ← Virtual memory
VmRSS: 1024000 kB ← Resident (physical RAM)
VmData: 512000 kB ← Data segment
VmStk: 136 kB ← Stack
VmExe: 4096 kB ← Code
What is Swap?
- Disk space used as "overflow" for RAM
- Inactive pages moved to swap when RAM pressure high
- Much slower than RAM (100x-1000x slower)
View swap:
$ swapon --show
NAME TYPE SIZE USED PRIO
/swapfile file 8G 1.2G -2
Add swap:
$ sudo dd if=/dev/zero of=/swapfile bs=1M count=8192 # 8GB
$ sudo chmod 600 /swapfile
$ sudo mkswap /swapfile
$ sudo swapon /swapfile
Swappiness (0-100):
- Controls how aggressively kernel swaps
- 0: Swap only to avoid OOM
- 60: Default (balanced)
- 100: Aggressive swapping
$ cat /proc/sys/vm/swappiness
60
$ sudo sysctl vm.swappiness=10 # Less aggressive
| Feature | MBR (Master Boot Record) | GPT (GUID Partition Table) |
|---|---|---|
| Max Disk Size | 2 TB | 9.4 ZB (zettabytes) |
| Max Partitions | 4 primary (or 3 + extended with logical) | 128 (default, can be more) |
| Boot Mode | BIOS | UEFI (also BIOS with protective MBR) |
| Redundancy | None (single point of failure) | Backup partition table at end of disk |
List disks and partitions:
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 465.8G 0 disk
├─sda1 8:1 0 512M 0 part /boot/efi
├─sda2 8:2 0 100G 0 part /
└─sda3 8:3 0 365.2G 0 part /home
$ fdisk -l /dev/sda
Disk /dev/sda: 500 GB
Disk identifier: 0x12345678
Device Boot Start End Sectors Size Id Type
/dev/sda1 * 2048 1050623 1048576 512M ef EFI System
/dev/sda2 1050624 210765823 209715200 100G 83 Linux
/dev/sda3 210765824 976773167 765997344 365G 83 Linux
Partition tools:
- fdisk: MBR partitions (interactive)
- gdisk: GPT partitions (interactive)
- parted: Both MBR/GPT (scriptable)
Create partition with parted:
$ sudo parted /dev/sdb
(parted) mklabel gpt
(parted) mkpart primary ext4 0% 50%
(parted) mkpart primary ext4 50% 100%
(parted) print
(parted) quit
Format partition:
$ sudo mkfs.ext4 /dev/sdb1
$ sudo mkfs.xfs /dev/sdb2
LVM Layers:
┌─────────────────────────────────────────┐
│ Filesystems (ext4, xfs, etc.) │
├─────────────────────────────────────────┤
│ Logical Volumes (LVs) │
│ /dev/vg01/lv_root /dev/vg01/lv_home │ ← Flexible, resizable
├─────────────────────────────────────────┤
│ Volume Group (VG) │
│ vg01 (pool of storage) │ ← Aggregates PVs
├─────────────────────────────────────────┤
│ Physical Volumes (PVs) │
│ /dev/sda2 /dev/sdb1 /dev/sdc1 │ ← Partitions or disks
└─────────────────────────────────────────┘
Why LVM?
- Resize volumes without downtime
- Snapshots (for backups)
- Combine multiple disks into one volume
- Move data between disks while online
Create LVM setup:
1. Create Physical Volumes:
$ sudo pvcreate /dev/sdb1 /dev/sdc1
$ sudo pvdisplay
2. Create Volume Group:
$ sudo vgcreate vg01 /dev/sdb1 /dev/sdc1
$ sudo vgdisplay vg01
3. Create Logical Volumes:
$ sudo lvcreate -L 50G -n lv_root vg01 # Fixed size
$ sudo lvcreate -l 100%FREE -n lv_home vg01 # Use remaining space
$ sudo lvdisplay
4. Format and mount:
$ sudo mkfs.ext4 /dev/vg01/lv_root
$ sudo mount /dev/vg01/lv_root /mnt
Resize LV (extend):
$ sudo lvextend -L +20G /dev/vg01/lv_root # Add 20GB
$ sudo resize2fs /dev/vg01/lv_root # Extend filesystem (ext4)
$ sudo xfs_growfs /mnt # Extend filesystem (xfs)
Resize LV (shrink - ext4 only, requires unmount):
$ sudo umount /dev/vg01/lv_root
$ sudo e2fsck -f /dev/vg01/lv_root
$ sudo resize2fs /dev/vg01/lv_root 30G
$ sudo lvreduce -L 30G /dev/vg01/lv_root
LVM Snapshots:
$ sudo lvcreate -L 10G -s -n lv_root_snap /dev/vg01/lv_root
$ sudo mount /dev/vg01/lv_root_snap /mnt/snapshot
# Make changes to original...
# Restore from snapshot if needed:
$ sudo lvconvert --merge /dev/vg01/lv_root_snap
| Level | Min Disks | Usable Space | Fault Tolerance | Use Case |
|---|---|---|---|---|
| RAID 0 | 2 | 100% (all disks) | None - any disk fails = data loss | Performance (striping), not production |
| RAID 1 | 2 | 50% (mirror) | 1 disk can fail | Redundancy, small datasets |
| RAID 5 | 3 | (N-1) disks | 1 disk can fail | Balanced (performance + redundancy) |
| RAID 6 | 4 | (N-2) disks | 2 disks can fail | High redundancy, large arrays |
| RAID 10 | 4 | 50% | 1 disk per mirror pair | High performance + redundancy |
Linux Software RAID (mdadm):
Create RAID 1:
$ sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1
$ cat /proc/mdstat
md0 : active raid1 sdc1[1] sdb1[0]
524224 blocks super 1.2 [2/2] [UU]
Create RAID 5:
$ sudo mdadm --create /dev/md1 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
Check status:
$ sudo mdadm --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Wed Jan 15 10:00:00 2025
Raid Level : raid1
Array Size : 524224 (511.88 MiB 536.74 MB)
Device Size : 524224 (511.88 MiB 536.74 MB)
Raid Devices : 2
Total Devices : 2
State : clean
Replace failed disk:
$ sudo mdadm --manage /dev/md0 --fail /dev/sdb1
$ sudo mdadm --manage /dev/md0 --remove /dev/sdb1
$ sudo mdadm --manage /dev/md0 --add /dev/sde1 # New disk
# Watch rebuild:
$ watch cat /proc/mdstat
List interfaces:
$ ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
3: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
link/ether 00:1a:2b:3c:4d:5e brd ff:ff:ff:ff:ff:ff
Show IP addresses:
$ ip addr show
$ ip addr show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
inet 192.168.1.100/24 brd 192.168.1.255 scope global eth0
inet6 fe80::5054:ff:fe12:3456/64 scope link
Configure interface:
$ sudo ip addr add 192.168.1.100/24 dev eth0
$ sudo ip link set eth0 up
$ sudo ip link set eth0 down
Persistent configuration (Ubuntu/Debian - netplan):
$ sudo vi /etc/netplan/01-netcfg.yaml
network:
  version: 2
  ethernets:
    eth0:
      addresses:
        - 192.168.1.100/24
      routes:
        - to: default
          via: 192.168.1.1
      nameservers:
        addresses: [8.8.8.8, 1.1.1.1]
$ sudo netplan apply
Persistent (RHEL/CentOS - NetworkManager):
$ sudo nmcli con add type ethernet ifname eth0 con-name eth0
$ sudo nmcli con mod eth0 ipv4.addresses 192.168.1.100/24
$ sudo nmcli con mod eth0 ipv4.gateway 192.168.1.1
$ sudo nmcli con mod eth0 ipv4.dns "8.8.8.8 1.1.1.1"
$ sudo nmcli con mod eth0 ipv4.method manual
$ sudo nmcli con up eth0
View routing table:
$ ip route show
default via 192.168.1.1 dev eth0 proto static
192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.100
$ route -n # Legacy equivalent
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 192.168.1.1 0.0.0.0 UG 0 0 0 eth0
192.168.1.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
Add route:
$ sudo ip route add 10.0.0.0/8 via 192.168.1.254
$ sudo ip route add default via 192.168.1.1
Delete route:
$ sudo ip route del 10.0.0.0/8
Trace route:
$ traceroute google.com
$ mtr google.com # Continuous traceroute
Tables and Chains:
filter table (default):
- INPUT: Packets destined for local system
- OUTPUT: Packets originating from local system
- FORWARD: Packets routed through system
nat table:
- PREROUTING: Alter packets before routing
- POSTROUTING: Alter packets after routing (SNAT/MASQUERADE)
- OUTPUT: NAT for locally generated packets
List rules:
$ sudo iptables -L -n -v
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
123 9876 ACCEPT tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:22
Common rules:
# Allow SSH
$ sudo iptables -A INPUT -p tcp --dport 22 -j ACCEPT
# Allow HTTP/HTTPS
$ sudo iptables -A INPUT -p tcp --dport 80 -j ACCEPT
$ sudo iptables -A INPUT -p tcp --dport 443 -j ACCEPT
# Allow established connections
$ sudo iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
# Drop everything else
$ sudo iptables -P INPUT DROP
# Allow loopback
$ sudo iptables -A INPUT -i lo -j ACCEPT
# NAT/Masquerade (for router/gateway)
$ sudo iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
$ sudo sysctl -w net.ipv4.ip_forward=1
Delete rule:
$ sudo iptables -D INPUT 3 # Delete rule #3
$ sudo iptables -F # Flush all rules
Save/restore:
$ sudo iptables-save > /etc/iptables/rules.v4
$ sudo iptables-restore < /etc/iptables/rules.v4
Modern wrapper around iptables/nftables:
$ sudo firewall-cmd --list-all
public (active)
target: default
services: ssh dhcpv6-client
ports: 80/tcp 443/tcp
Add service:
$ sudo firewall-cmd --add-service=http --permanent
$ sudo firewall-cmd --reload
Add port:
$ sudo firewall-cmd --add-port=8080/tcp --permanent
$ sudo firewall-cmd --reload
Basic commands:
$ sudo systemctl start nginx # Start service
$ sudo systemctl stop nginx # Stop service
$ sudo systemctl restart nginx # Restart service
$ sudo systemctl reload nginx # Reload config (no downtime)
$ sudo systemctl status nginx # Check status
Enable/disable (start on boot):
$ sudo systemctl enable nginx # Create symlink
$ sudo systemctl disable nginx # Remove symlink
$ sudo systemctl is-enabled nginx
List services:
$ systemctl list-units --type=service
$ systemctl list-units --type=service --state=running
$ systemctl list-unit-files --type=service
View logs:
$ sudo journalctl -u nginx # All logs for nginx
$ sudo journalctl -u nginx -f # Follow logs
$ sudo journalctl -u nginx --since today
$ sudo journalctl -u nginx --since "2025-01-15 10:00"
Service file: /etc/systemd/system/myapp.service
[Unit]
Description=My Application
After=network.target
[Service]
Type=simple
User=myuser
Group=mygroup
WorkingDirectory=/opt/myapp
ExecStart=/opt/myapp/bin/start.sh
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
RestartSec=10s
# Environment
Environment="PORT=8080"
EnvironmentFile=/etc/myapp/config
# Limits
LimitNOFILE=65536
MemoryMax=2G
[Install]
WantedBy=multi-user.target
Service types:
- simple: Process doesn't fork (default)
- forking: Process forks (daemon)
- oneshot: Process exits (scripts)
- notify: Process notifies systemd when ready
Reload systemd after changes:
$ sudo systemctl daemon-reload
$ sudo systemctl start myapp
$ sudo systemctl enable myapp
Timer file: /etc/systemd/system/backup.timer
[Unit]
Description=Daily Backup Timer
[Timer]
# Daily at 2 AM (OnCalendar=daily would fire at midnight instead)
OnCalendar=*-*-* 02:00:00
Persistent=true
[Install]
WantedBy=timers.target
Service file: /etc/systemd/system/backup.service
[Unit]
Description=Backup Service
[Service]
Type=oneshot
ExecStart=/usr/local/bin/backup.sh
Enable timer:
$ sudo systemctl daemon-reload
$ sudo systemctl enable backup.timer
$ sudo systemctl start backup.timer
List timers:
$ systemctl list-timers
NEXT LEFT LAST PASSED UNIT
Wed 2025-01-15 02:00:00 EST 6h left Tue 2025-01-14 02:00:00 EST 18h ago backup.timer
$ top
top - 10:30:15 up 5 days, 3:25, 2 users, load average: 1.50, 1.25, 0.95
Tasks: 234 total, 1 running, 233 sleeping, 0 stopped, 0 zombie
%Cpu(s): 5.2 us, 2.1 sy, 0.0 ni, 92.5 id, 0.2 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 16000.0 total, 2000.0 free, 8000.0 used, 6000.0 buff/cache
MiB Swap: 8192.0 total, 7168.0 free, 1024.0 used. 6500.0 avail Mem
Load average: (1 min, 5 min, 15 min)
- Average number of processes runnable or in uninterruptible I/O wait (Linux counts D state too)
- < 1.0 per core = system not saturated
- > 1.0 per core = processes waiting
CPU states:
- us (user): Time in user space
- sy (system): Time in kernel space
- ni (nice): Time in nice'd processes
- id (idle): Idle time
- wa (iowait): Waiting for I/O (disk, network)
- hi (hardware interrupts): Servicing hardware interrupts
- si (software interrupts): Servicing software interrupts
- st (steal): Time stolen by hypervisor (VMs)
High wa% = I/O bottleneck (slow disk/network)
High sy% = Kernel bottleneck (lots of syscalls, context switches)
$ iostat -x 1 # Extended stats, 1 second interval
avg-cpu: %user %nice %system %iowait %steal %idle
5.20 0.00 2.10 0.20 0.00 92.50
Device          r/s     w/s    rkB/s    wkB/s   %util
sda           100.0   200.0   4096.0   8192.0    45.2
Key metrics:
- r/s, w/s: Reads/writes per second
- rkB/s, wkB/s: KB read/written per second
- await: Average time (ms) for I/O requests
- %util: Percentage of time device was busy
%util > 80% = Disk saturated (bottleneck)
await > 10ms (SSD) or > 20ms (HDD) = Slow disk
$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 102400 204800 51200 614400 0 0 100 200 1000 2000 5 2 92 1 0
Columns:
procs:
r: Processes waiting for CPU
b: Processes blocked (uninterruptible sleep - I/O wait)
memory:
swpd: Swap used
free: Free memory
buff/cache: Buffer/cache
swap:
si: Swap in (from disk)
so: Swap out (to disk)
io:
bi: Blocks in (read from disk)
bo: Blocks out (written to disk)
system:
in: Interrupts per second
cs: Context switches per second
High r value = CPU bottleneck
High b value = I/O bottleneck
si/so > 0 consistently = Memory pressure (swapping)
Historical performance data (requires sysstat package):
CPU usage:
$ sar -u 1 5 # CPU usage, 1 sec interval, 5 iterations
Memory:
$ sar -r 1 5 # Memory usage
Disk:
$ sar -d 1 5 # Disk I/O
Network:
$ sar -n DEV 1 5 # Network interface stats
Historical data (from /var/log/sa/):
$ sar -u -f /var/log/sa/sa15 # CPU usage from 15th
Network connections:
$ ss -tulpn # TCP/UDP listening ports
$ ss -tnp # TCP connections with process names
$ netstat -tulpn # Legacy equivalent
Bandwidth monitoring:
$ iftop # Real-time bandwidth by connection
$ nethogs # Bandwidth by process
$ nload # Total bandwidth per interface
Test network speed:
$ iperf3 -s # Server
$ iperf3 -c server_ip # Client
DNS lookup:
$ dig google.com
$ nslookup google.com
$ host google.com
Test connectivity:
$ ping -c 4 google.com
$ traceroute google.com
$ mtr google.com # Combined ping + traceroute
TCP connection test:
$ telnet google.com 80
$ nc -zv google.com 80 # Netcat
Check open files/sockets:
$ lsof -i :80 # What's using port 80?
$ lsof -i TCP:80
$ lsof -p 1234 # Files opened by PID 1234
Systematic Approach:
1. uptime or top - High load?
2. top - %idle low? Which process?
3. free -h - Swapping? vmstat 1 - si/so > 0?
4. iostat -x 1 - %util high? iotop - Which process?
5. iftop or nethogs - Saturated link?
Quick diagnosis commands:
$ uptime # Load average
$ top -bn1 | head -20 # Snapshot of top processes
$ free -h # Memory status
$ df -h # Disk space
$ iostat -x 1 3 # I/O stats
$ vmstat 1 5 # VM stats
$ ps aux --sort=-%cpu | head # Top CPU users
$ ps aux --sort=-%mem | head # Top memory users
Find what's using space:
$ df -h # Which filesystem is full?
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       100G   95G     0 100% /
$ du -sh /* # Size of top-level directories
$ du -sh /var/* # Drill down
$ du -ah /var/log | sort -hr | head -20 # Largest files/dirs in /var/log
Find large files:
$ find / -type f -size +1G -exec ls -lh {} \; 2>/dev/null
$ find /var/log -type f -size +100M -ls
Find deleted but open files (space not freed):
$ lsof | grep deleted
$ lsof +L1 # Files with link count 0 (deleted but open)
Clean up:
$ sudo journalctl --vacuum-size=100M # Limit journal size
$ sudo journalctl --vacuum-time=7d # Keep 7 days
$ sudo apt clean # Ubuntu/Debian package cache
$ sudo yum clean all # RHEL/CentOS package cache
Process stuck in uninterruptible sleep (D state):
$ ps aux | awk '$8 ~ /^D/' # Match the STAT column, not the whole line
user 1234 0.0 0.0 0 0 ? D 10:00 0:00 [process]
Cause: Waiting for I/O (usually NFS, broken hardware)
Cannot be killed (even with -9)!
Solutions:
1. Fix underlying I/O issue (unmount NFS, fix disk)
2. Wait for I/O timeout
3. Reboot (last resort)
Zombie process (Z state):
$ ps aux | awk '$8 ~ /^Z/' # Match the STAT column, not the whole line
user 1234 0.0 0.0 0 0 ? Z 10:00 0:00 [defunct]
Cause: Process exited but parent hasn't reaped it
Solution: Kill parent process or wait for parent to exit
Systematic checks:
1. Is server reachable?
$ ping server
2. Is SSH port open?
$ telnet server 22
$ nc -zv server 22
3. Is sshd running?
$ systemctl status sshd
4. Firewall blocking?
$ sudo iptables -L -n | grep 22
$ sudo firewall-cmd --list-all
5. Check SSH logs:
$ sudo journalctl -u sshd -f
$ sudo tail -f /var/log/auth.log # Debian
$ sudo tail -f /var/log/secure # RHEL
6. Too many authentication failures?
$ sudo grep "Failed password" /var/log/auth.log
7. sshd_config issues?
$ sudo sshd -t # Test config
$ sudo cat /etc/ssh/sshd_config | grep -v "^#"
Load average high, but CPU idle - Why?
$ uptime
load average: 5.00, 4.50, 4.00
$ top
%Cpu(s): 2.0 us, 1.0 sy, 37.0 id, 60.0 wa
Diagnosis: Processes waiting for I/O (not CPU)
Check:
$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 5 0 20000 50000 600000 0 0 5000 10000 1000 2000 2 1 37 60 0
b = 5 → 5 processes blocked (I/O wait)
$ iostat -x 1
Device r/s w/s await %util
sda 1000 500 250 99.5
await = 250ms (very slow!)
%util = 99.5% (disk saturated)
Solutions:
- Identify I/O heavy process (iotop)
- Check disk health (smartctl)
- Consider faster storage (SSD)
- Optimize application I/O patterns
1. Boot from installation media, or:
2. Interrupt boot (GRUB menu)
- Press 'e' to edit
- Find linux/vmlinuz line
- Add: systemd.unit=rescue.target
- Or add: init=/bin/bash (emergency shell, skip systemd)
- Boot with Ctrl+X
3. Root filesystem read-only in emergency mode:
$ mount -o remount,rw /
$ mount -a # Mount all from /etc/fstab
4. Fix issue (bad fstab, reset password, etc.)
5. Reboot:
$ systemctl reboot
Or emergency mode:
$ sync; reboot -f
1. Boot to emergency shell (init=/bin/bash)
2. Remount root read-write:
$ mount -o remount,rw /
3. Change password:
$ passwd root
4. SELinux systems (RHEL/CentOS):
$ touch /.autorelabel # Relabel on next boot
5. Reboot:
$ sync
$ reboot -f
System won't boot due to bad /etc/fstab entry:
1. Boot to rescue mode
2. Mount root:
$ mount -o remount,rw /
3. Edit fstab:
$ vi /etc/fstab
# Comment out or fix bad entry
4. Test:
$ mount -a # Should succeed without errors
5. Reboot