Threading Models: Python GIL vs Go Routines
Rob Pike (Go co-creator): "Concurrency is about dealing with lots of things at once. Parallelism is about doing lots of things at once."
Definition: Multiple tasks making progress by interleaving execution. Tasks may NOT run simultaneously.
Single CPU Core - Concurrency (Time Slicing):
Time →
Task A: ████░░░░░░░░████░░░░░░░░████
Task B: ░░░░████░░░░░░░░████░░░░░░░░
Task C: ░░░░░░░░████░░░░░░░░████░░░░
Tasks take turns, switching rapidly (context switching).
From user's perspective, all tasks appear to run "at the same time."
Reality: Only one task executes at any instant.
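The interleaving picture above can be sketched in a few lines of CPython (list.append needs no lock here because the GIL makes it atomic; the exact ordering is up to the scheduler):

```python
import threading

log = []  # list.append is atomic under the GIL, so no lock needed here

def task(name, steps):
    for i in range(steps):
        log.append((name, i))  # record one "time slice" of progress

threads = [threading.Thread(target=task, args=(name, 3)) for name in "ABC"]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(log)  # all 9 entries present; their order depends on the scheduler
```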
Definition: Multiple tasks running simultaneously on different CPU cores. True simultaneous execution.
4 CPU Cores - Parallelism (Simultaneous Execution):
Time →
Core 1: ████████████████████████████
Core 2: ████████████████████████████
Core 3: ████████████████████████████
Core 4: ████████████████████████████
All tasks execute at the exact same instant.
Requires multiple physical CPU cores.
Concurrency is about structure (how you design your program).
Parallelism is about execution (how it runs on hardware).
You can have:
- Concurrency without parallelism (one core, time slicing)
- Parallelism without concurrency (e.g., SIMD: one task, many data lanes)
- Both (more tasks than cores, spread across multiple cores)
- Neither (a purely sequential program)
1:1 Model (OS Threads). Examples: Java threads, C++ std::thread, Python threads (with limitations)
Application Thread 1 ←→ OS Thread 1 (Kernel scheduled)
Application Thread 2 ←→ OS Thread 2 (Kernel scheduled)
Application Thread 3 ←→ OS Thread 3 (Kernel scheduled)
- Each application thread maps to one OS thread
- OS kernel schedules threads across CPU cores
- Heavy: ~1-2MB stack per thread
- Context switch: ~1-2 microseconds (kernel involvement)
- Limit: ~thousands of threads before performance degrades
N:1 Model (Green Threads). Examples: early Java green threads, Ruby 1.8 threads, Ruby fibers
App Thread 1 ┐
App Thread 2 ├→ Single OS Thread (Runtime scheduled)
App Thread 3 ┘
- Many application threads, one OS thread
- Runtime/VM schedules threads (not OS kernel)
- Lightweight but can't use multiple cores
- Mostly obsolete (superseded by M:N model)
M:N Model (Hybrid). Examples: Go goroutines, Erlang processes, Tokio tasks (Rust)
Goroutine 1 ┐
Goroutine 2 ├→ OS Thread 1 ┐
Goroutine 3 ┘ ├→ CPU Core 1
Goroutine 4 ┐ │
Goroutine 5 ├→ OS Thread 2 ─┤
Goroutine 6 ┘ ├→ CPU Core 2
Goroutine 7 ┐ │
Goroutine 8 ├→ OS Thread 3 ┘
Goroutine 9 ┘
M goroutines multiplexed onto N OS threads
- Runtime scheduler manages goroutines
- OS kernel schedules OS threads onto cores
- Best of both worlds: lightweight + multi-core
The GIL is a mutex (lock) that allows only ONE thread to execute Python bytecode at a time, even on multi-core systems.
What this means:
- Only one thread executes Python bytecode at any instant, even with many threads on many cores
- Threads still provide concurrency (interleaving), just not CPU parallelism
- I/O can overlap, because the GIL is released while a thread waits

Historical Context (1991):
- CPython's memory management uses reference counting, which is not thread-safe
- When threads were added in the early 1990s, one global lock was the simplest protection and kept single-threaded code fast

Attempts to Remove GIL:
- Greg Stein's free-threading patch (1999): rejected, roughly 2x slowdown for single-threaded code
- Gilectomy (Larry Hastings, ~2016): stalled for similar performance reasons
- PEP 703 (accepted 2023): makes the GIL optional; ships as the experimental free-threaded build in Python 3.13

GIL Release Conditions:
- Blocking I/O (sockets, files) and time.sleep()
- C extensions that explicitly release it (Py_BEGIN_ALLOW_THREADS), e.g. NumPy
- A forced handoff every 5 ms by default (see sys.setswitchinterval)
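One of these conditions is directly inspectable: CPython forcibly drops the GIL every "switch interval" so other runnable threads get a turn, even with no I/O. The interval is readable and tunable via sys:

```python
import sys

# CPython hands off the GIL every "switch interval" so other
# runnable threads get a turn, even in pure CPU-bound code
print(sys.getswitchinterval())  # 0.005 (5 ms) by default

# Tunable: shorter = more responsive threads, more switching overhead
sys.setswitchinterval(0.001)
print(sys.getswitchinterval())

sys.setswitchinterval(0.005)  # restore the default
```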
import threading
import time

def cpu_bound_task(n):
    """CPU-intensive: calculate sum (holds GIL entire time)"""
    count = 0
    for i in range(n):
        count += i * i
    return count

# Sequential execution
start = time.time()
cpu_bound_task(10_000_000)
cpu_bound_task(10_000_000)
print(f"Sequential: {time.time() - start:.2f}s")  # ~1.5s

# Multi-threaded (doesn't help due to GIL!)
start = time.time()
t1 = threading.Thread(target=cpu_bound_task, args=(10_000_000,))
t2 = threading.Thread(target=cpu_bound_task, args=(10_000_000,))
t1.start()
t2.start()
t1.join()
t2.join()
print(f"Multi-threaded: {time.time() - start:.2f}s")  # ~1.5s (same or slower!)

# Threads fight for the GIL, adding context-switching overhead:
# no speedup, possibly slower due to lock contention
import threading
import time

import requests

def fetch_url(url):
    """I/O-bound: network request (releases GIL during I/O)"""
    response = requests.get(url)
    return len(response.content)

urls = [
    "https://www.google.com",
    "https://www.github.com",
    "https://www.stackoverflow.com",
    "https://www.reddit.com",
]

# Sequential execution
start = time.time()
for url in urls:
    fetch_url(url)
print(f"Sequential: {time.time() - start:.2f}s")  # ~4s (1s per request)

# Multi-threaded (big speedup!)
start = time.time()
threads = [threading.Thread(target=fetch_url, args=(url,)) for url in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"Multi-threaded: {time.time() - start:.2f}s")  # ~1s (parallel I/O)

# While a thread waits on network I/O, the GIL is released,
# so the other threads can run - concurrency achieved
1. Multiprocessing (Separate Python Interpreters, No Shared GIL)
from multiprocessing import Pool
import time

def cpu_bound_task(n):
    count = 0
    for i in range(n):
        count += i * i
    return count

if __name__ == "__main__":
    start = time.time()
    # Use a process pool (each process has its own GIL)
    with Pool(processes=2) as pool:
        results = pool.map(cpu_bound_task, [10_000_000, 10_000_000])
    print(f"Multiprocessing: {time.time() - start:.2f}s")  # ~0.8s (2x speedup!)

# Each process runs on a separate CPU core:
# no GIL contention, true parallelism
2. Async/Await (Cooperative Concurrency, Single Thread)
import asyncio

import aiohttp

async def fetch_url(session, url):
    """Async I/O - single thread, many concurrent requests"""
    async with session.get(url) as response:
        return await response.text()

async def main():
    urls = [
        "https://www.google.com",
        "https://www.github.com",
        "https://www.stackoverflow.com",
        "https://www.reddit.com",
    ]
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_url(session, url) for url in urls]
        results = await asyncio.gather(*tasks)
        print(f"Fetched {len(results)} pages")

# Single thread, cooperative multitasking:
# while one task waits for I/O, the event loop runs the others -
# more efficient than threads for I/O (no context-switching overhead)
asyncio.run(main())
3. C Extensions / Cython (Release GIL)
# NumPy releases the GIL for computations
import threading

import numpy as np

def compute_matrix():
    # GIL released during NumPy operations
    a = np.random.rand(1000, 1000)
    b = np.random.rand(1000, 1000)
    c = np.dot(a, b)  # matrix multiplication runs in C (no GIL held)
    return c

# These threads achieve parallelism (NumPy releases the GIL)
threads = [threading.Thread(target=compute_matrix) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
| Use Case | Why Threading Works |
|---|---|
| Network I/O | GIL released during socket operations (requests, aiohttp) |
| Disk I/O | GIL released during file read/write |
| Database queries | GIL released while waiting for DB response |
| Sleep/Delays | time.sleep() releases GIL |
| NumPy/SciPy | C extensions release GIL during computation |
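In practice, the I/O cases in this table are usually written with concurrent.futures rather than raw Thread objects. A sketch using time.sleep() as a stand-in for an I/O wait (it releases the GIL just like a socket read would):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(i):
    time.sleep(0.2)  # stand-in for a network/disk wait; the GIL is released here
    return i * 2

start = time.time()
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(fake_io, range(5)))
elapsed = time.time() - start

print(results)            # [0, 2, 4, 6, 8]
print(f"{elapsed:.2f}s")  # ~0.2s: the five waits overlap instead of summing to 1s
```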
Goroutine = Lightweight thread managed by Go runtime, not OS.
Go Scheduler Components:
G = Goroutine (lightweight thread)
M = Machine (OS thread)
P = Processor (logical CPU, schedules goroutines)
Architecture:
G1 G2 G3 G4 G5 G6 G7 G8 G9
↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓ ↓
[P1 Queue] [P2 Queue] [P3 Queue]
↓ ↓ ↓
M1 M2 M3
↓ ↓ ↓
CPU Core 1 CPU Core 2 CPU Core 3
GOMAXPROCS = Number of P's (default: number of CPU cores)
How it works:
1. Each P has a run queue of goroutines
2. Each P is bound to an M (OS thread)
3. M executes goroutines from P's queue
4. When goroutine blocks (I/O, syscall), M detaches from P
5. P finds/creates another M to keep running goroutines
6. Work stealing: Idle P steals goroutines from busy P's queue
package main

import (
    "fmt"
    "time"
)

func task(id int) {
    for i := 0; i < 5; i++ {
        fmt.Printf("Task %d: %d\n", id, i)
        time.Sleep(100 * time.Millisecond)
    }
}

func main() {
    // Launch 3 goroutines (just add the "go" keyword!)
    go task(1)
    go task(2)
    go task(3)

    // Wait for goroutines to finish
    time.Sleep(1 * time.Second)
    fmt.Println("Main done")
}

// Output (interleaved):
// Task 1: 0
// Task 3: 0
// Task 2: 0
// Task 1: 1
// Task 2: 1
// Task 3: 1
// ...

// Creating goroutines is EXTREMELY cheap:
// you can create millions without issue
package main

import (
    "fmt"
    "runtime"
    "sync"
    "time"
)

func cpuBoundTask(n int, wg *sync.WaitGroup) {
    defer wg.Done()
    count := 0
    for i := 0; i < n; i++ {
        count += i * i
    }
}

func main() {
    fmt.Printf("CPU cores: %d\n", runtime.NumCPU())
    runtime.GOMAXPROCS(runtime.NumCPU()) // use all cores (default in modern Go)

    n := 100_000_000

    // Sequential execution
    start := time.Now()
    var wg sync.WaitGroup
    wg.Add(1)
    cpuBoundTask(n, &wg)
    wg.Wait()
    fmt.Printf("Sequential: %v\n", time.Since(start)) // ~100ms

    // Parallel execution: same total work, split across 4 goroutines
    start = time.Now()
    wg.Add(4)
    go cpuBoundTask(n/4, &wg)
    go cpuBoundTask(n/4, &wg)
    go cpuBoundTask(n/4, &wg)
    go cpuBoundTask(n/4, &wg)
    wg.Wait()
    fmt.Printf("Parallel (4 goroutines): %v\n", time.Since(start)) // ~25ms (4x speedup!)

    // TRUE parallelism - no GIL!
    // All 4 goroutines run simultaneously on different CPU cores
}
"Don't communicate by sharing memory; share memory by communicating." - Go proverb
package main

import (
    "fmt"
    "time"
)

func producer(ch chan int) {
    for i := 0; i < 5; i++ {
        fmt.Printf("Producing: %d\n", i)
        ch <- i // send to channel (blocks if channel full)
        time.Sleep(100 * time.Millisecond)
    }
    close(ch) // signal no more data
}

func consumer(ch chan int) {
    for value := range ch { // receive until channel closed
        fmt.Printf("Consumed: %d\n", value)
    }
}

func main() {
    ch := make(chan int, 2) // buffered channel (capacity 2)
    go producer(ch)
    go consumer(ch)
    time.Sleep(1 * time.Second)
}

// Channels are typed, thread-safe queues.
// They avoid shared memory + locks (a common source of bugs),
// and channel misuse is often caught as a type error at compile time.
package main

import (
    "fmt"
    "time"
)

func main() {
    ch1 := make(chan string)
    ch2 := make(chan string)

    go func() {
        time.Sleep(100 * time.Millisecond)
        ch1 <- "Message from channel 1"
    }()

    go func() {
        time.Sleep(200 * time.Millisecond)
        ch2 <- "Message from channel 2"
    }()

    // select waits on multiple channels
    for i := 0; i < 2; i++ {
        select {
        case msg1 := <-ch1:
            fmt.Println(msg1)
        case msg2 := <-ch2:
            fmt.Println(msg2)
        case <-time.After(300 * time.Millisecond):
            fmt.Println("Timeout")
        }
    }
}

// select is like switch for channels:
// it blocks until one channel is ready;
// if multiple are ready, it picks one at random (fair)
Work Stealing (Load Balancing):
P1 Queue: [G1, G2, G3, G4, G5, G6] (6 goroutines, busy)
P2 Queue: [G7] (1 goroutine, mostly idle)
When P2 finishes G7:
1. P2 checks own queue (empty)
2. P2 "steals" half of P1's queue
3. P2 Queue: [G4, G5, G6] (stolen from P1)
P1 Queue: [G1, G2, G3]
Result: Balanced load across all cores
Preemption (No Infinite Loops):
Go 1.14+ has asynchronous preemption
Goroutine running CPU-bound loop can be preempted
Prevents one goroutine from hogging P forever
| Feature | Python (GIL) | Go (Goroutines) |
|---|---|---|
| Threading Model | 1:1 (OS threads) with GIL lock | M:N (goroutines multiplexed on OS threads) |
| True Parallelism | ❌ No (GIL prevents concurrent Python code) | ✅ Yes (no GIL, uses all CPU cores) |
| CPU-Bound Tasks | Threading doesn't help. Use multiprocessing. | Goroutines achieve linear speedup (N cores = Nx faster) |
| I/O-Bound Tasks | ✅ Threading works (GIL released during I/O) | ✅ Goroutines work great |
| Memory per Thread | ~1-2 MB (OS thread stack) | ~2 KB (goroutine stack, grows dynamically) |
| Context Switch Cost | ~1-2 µs (kernel mode switch) | ~10-100 ns (userspace switch) |
| Max Practical Threads | ~1,000-10,000 threads | Millions of goroutines |
| Creation Syntax | threading.Thread(target=func).start() | go func() (single keyword!) |
| Communication | Shared memory + locks (error-prone) | Channels (type-safe, compiler-checked) |
| Workarounds | Multiprocessing (separate processes, IPC overhead) | Not needed (goroutines just work) |
| Garbage Collection | Reference counting + cycle detector (GIL simplifies this) | Concurrent mark-and-sweep GC (sub-ms pause times) |
| Async Alternative | asyncio (single thread, cooperative) | Goroutines ARE async (runtime-managed) |
Task: Calculate sum of squares (100M iterations)
Sequential: 1.5s
Threading (2 threads): 1.5s ❌ No speedup
Threading (4 threads): 1.6s ❌ Slower (contention)
Threads fight for GIL, context switching overhead
Only one thread executes at a time
Task: Calculate sum of squares (100M iterations)
Sequential: 1.5s
Goroutines (2): 0.75s ✅ 2x speedup
Goroutines (4): 0.38s ✅ 4x speedup
Linear scaling with CPU cores
True parallel execution
| Scenario | Python Approach | Go Approach |
|---|---|---|
| Web scraping (I/O-bound) | ✅ Threading or asyncio | ✅ Goroutines |
| Web server (I/O-bound) | ✅ async (FastAPI, aiohttp) | ✅ Goroutines (net/http) |
| Image processing (CPU-bound) | ⚠️ Multiprocessing (process overhead) | ✅ Goroutines (near-linear scaling) |
| Data pipeline (mixed I/O + CPU) | ⚠️ Mix of threading + multiprocessing (complex) | ✅ Goroutines (handles both naturally) |
| Machine learning (CPU-heavy) | ✅ NumPy/PyTorch (release GIL in C/CUDA) | ⚠️ Use Python ecosystem (more mature) |
| Microservices (many concurrent connections) | ⚠️ asyncio (complex) or gunicorn workers | ✅ Goroutines (designed for this) |
When multiple threads access shared data, the result depends on the timing of thread execution. This is unpredictable and causes bugs.
# Example: Race condition
import threading

counter = 0

def increment():
    global counter
    for _ in range(1_000_000):
        counter += 1  # Not atomic! Read, add, write

# Two threads
t1 = threading.Thread(target=increment)
t2 = threading.Thread(target=increment)
t1.start(); t2.start()
t1.join(); t2.join()

print(counter)  # Expected: 2,000,000
                # Actual: often less, e.g. ~1,234,567 (varies each run!)

# Why? counter += 1 is three operations:
#   1. Read counter value
#   2. Add 1
#   3. Write back
# Threads interleave these operations:
#
# Thread 1: Read (0) → Add → [INTERRUPTED]
# Thread 2: Read (0) → Add → Write (1)
# Thread 1: Write (1) ← Lost Thread 2's increment!
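You can verify that `counter += 1` is several interpreter steps, not one, by disassembling it with the standard-library dis module (instruction names vary across CPython versions, so the exact listing differs per interpreter):

```python
import dis

counter = 0

def increment():
    global counter
    counter += 1

# The += compiles to separate load / add / store instructions;
# a thread switch can land between any two of them.
instructions = list(dis.get_instructions(increment))
print(len(instructions))  # several instructions, not one
dis.dis(increment)
```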
Mutex = Mutual Exclusion. Only one thread can hold the lock at a time.
import threading

counter = 0
lock = threading.Lock()

def increment():
    global counter
    for _ in range(1_000_000):
        with lock:  # acquire lock
            counter += 1
        # lock automatically released

# Now safe!
t1 = threading.Thread(target=increment)
t2 = threading.Thread(target=increment)
t1.start(); t2.start()
t1.join(); t2.join()
print(counter)  # Always: 2,000,000 ✓

# Alternative syntax:
# lock.acquire()
# try:
#     counter += 1
# finally:
#     lock.release()
package main

import (
    "fmt"
    "sync"
)

func main() {
    var counter int
    var mu sync.Mutex
    var wg sync.WaitGroup

    for i := 0; i < 2; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for j := 0; j < 1_000_000; j++ {
                mu.Lock()
                counter++
                mu.Unlock()
            }
        }()
    }
    wg.Wait()
    fmt.Println(counter) // Always: 2,000,000
}
Locks have overhead! Fine-grained locking (lock per operation) can be slower than no concurrency.
# Sequential (no lock): 0.1s
# Two threads with fine-grained locks: 2.5s (slower!)
# Coarse-grained locks (lock larger chunks): 0.8s (better)
# Lesson: Lock granularity matters
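The granularity lesson can be isolated in a single-threaded sketch: the work below is identical, only the number of lock acquisitions differs (timings are machine-dependent and illustrative):

```python
import threading
import time

N = 200_000
lock = threading.Lock()

def fine_grained():
    total = 0
    for _ in range(N):
        with lock:       # one acquire/release per operation
            total += 1
    return total

def coarse_grained():
    total = 0
    with lock:           # one acquire/release for the whole chunk
        for _ in range(N):
            total += 1
    return total

for fn in (fine_grained, coarse_grained):
    start = time.time()
    fn()
    print(f"{fn.__name__}: {time.time() - start:.3f}s")
# coarse_grained is typically several times faster:
# same work, far fewer lock operations
```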
A lock that can be acquired multiple times by the same thread.
import threading

lock = threading.RLock()  # Reentrant Lock

def outer():
    with lock:
        print("Outer acquired lock")
        inner()  # can acquire the same lock again!

def inner():
    with lock:  # same thread, same lock - OK!
        print("Inner acquired lock")

outer()

# A regular Lock would deadlock here;
# RLock allows the same thread to re-acquire
Like a lock, but allows N threads to access resource simultaneously. Think of it as a counter.
import threading
import time

# Allow max 3 concurrent database connections
db_semaphore = threading.Semaphore(3)

def query_database(query_id):
    print(f"Query {query_id}: Waiting for connection...")
    with db_semaphore:  # acquire (counter--)
        print(f"Query {query_id}: Connected! Executing...")
        time.sleep(2)  # simulate query
        print(f"Query {query_id}: Done")
    # release (counter++)

# Launch 10 queries
threads = [threading.Thread(target=query_database, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Only 3 execute at once (waves of 3, 3, 3, 1)

# Semaphore mechanics:
# - internal counter starts at N (3 in this case)
# - acquire() decrements the counter, blocks if counter == 0
# - release() increments the counter, wakes a waiting thread

# Binary Semaphore (count=1) ≈ Lock
sem = threading.Semaphore(1)
# Difference: a Semaphore can be released by a different thread;
# a Lock must be released by the thread that acquired it
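A related detail worth knowing: threading.BoundedSemaphore behaves like Semaphore but raises if released more times than acquired, turning a silent bookkeeping bug into an immediate error:

```python
import threading

sem = threading.BoundedSemaphore(2)

sem.acquire()
sem.release()      # fine: matches the acquire

caught = False
try:
    sem.release()  # one release too many: counter would exceed its initial value
except ValueError:
    caught = True
    print("Caught the extra release")
```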
Allow threads to wait for a condition to become true. Used for thread communication.
import threading
import time
import random

queue = []
MAX_SIZE = 5
condition = threading.Condition()

def producer():
    for i in range(10):
        time.sleep(random.uniform(0.1, 0.5))
        with condition:
            # Wait if queue is full
            while len(queue) >= MAX_SIZE:
                print("Producer: Queue full, waiting...")
                condition.wait()  # release lock, sleep, reacquire when notified
            item = f"Item-{i}"
            queue.append(item)
            print(f"Producer: Produced {item}, queue size: {len(queue)}")
            condition.notify()  # wake up one waiting consumer

def consumer():
    for i in range(10):
        time.sleep(random.uniform(0.2, 0.8))
        with condition:
            # Wait if queue is empty
            while len(queue) == 0:
                print("Consumer: Queue empty, waiting...")
                condition.wait()
            item = queue.pop(0)
            print(f"Consumer: Consumed {item}, queue size: {len(queue)}")
            condition.notify()  # wake up a waiting producer

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()

# Condition variables = Lock + wait/notify mechanism
# wait(): atomically release the lock and sleep
# notify(): wake one waiting thread
# notify_all(): wake all waiting threads
Optimize for read-heavy workloads. Multiple readers OR one writer.
import threading

# Python doesn't have a built-in RWLock, but here's the concept.
# A single condition variable guards both counters so they stay consistent:
class ReadWriteLock:
    def __init__(self):
        self._readers = 0
        self._writer = False
        self._cond = threading.Condition()

    def acquire_read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()

# Usage: many readers can read simultaneously;
# a writer gets exclusive access
package main

import (
    "fmt"
    "sync"
    "time"
)

func main() {
    var rwMu sync.RWMutex
    data := make(map[string]int)

    // Multiple readers (10 concurrent readers)
    for i := 0; i < 10; i++ {
        go func(id int) {
            rwMu.RLock() // read lock (shared)
            defer rwMu.RUnlock()
            fmt.Printf("Reader %d: %v\n", id, data)
            time.Sleep(100 * time.Millisecond)
        }(i)
    }

    // Single writer (exclusive)
    go func() {
        rwMu.Lock() // write lock (exclusive)
        defer rwMu.Unlock()
        data["key"] = 42
        fmt.Println("Writer: Updated data")
    }()

    time.Sleep(2 * time.Second)
}
Lock-free operations guaranteed to execute atomically (no interruption).
# Python's GIL makes some operations accidentally atomic:
#   x = 1   (atomic - single bytecode instruction)
#   x += 1  (NOT atomic - multiple bytecode instructions)
# For true atomics, use multiprocessing.Value or ctypes
from multiprocessing import Value
from threading import Thread

counter = Value('i', 0)  # shared integer with an associated lock

def increment():
    for _ in range(100_000):
        with counter.get_lock():
            counter.value += 1

# More common: just use threading.Lock
package main

import (
    "fmt"
    "sync"
    "sync/atomic"
)

func main() {
    var counter int64
    var wg sync.WaitGroup

    // Lock-free atomic increment
    for i := 0; i < 1000; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            atomic.AddInt64(&counter, 1) // atomic operation
        }()
    }
    wg.Wait()
    fmt.Println(counter) // Always: 1000

    // Other atomic operations:
    //   atomic.LoadInt64(&counter)                     // atomic read
    //   atomic.StoreInt64(&counter, 5)                 // atomic write
    //   atomic.CompareAndSwapInt64(&counter, old, new) // CAS
}

// Atomics are faster than locks (no kernel involvement),
// but limited to simple operations (add, load, store, CAS)
Synchronization point where all threads must wait until everyone arrives.
import time
import random
from threading import Thread, Barrier

# 3 threads must reach the barrier before any continue
barrier = Barrier(3)

def worker(worker_id):
    print(f"Worker {worker_id}: Starting phase 1")
    time.sleep(random.uniform(1, 3))  # simulate work
    print(f"Worker {worker_id}: Finished phase 1, waiting at barrier")
    barrier.wait()  # block until all 3 threads reach here
    print(f"Worker {worker_id}: All workers ready, starting phase 2")
    time.sleep(random.uniform(1, 2))
    print(f"Worker {worker_id}: Finished phase 2")

threads = [Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Output shows all workers wait at the barrier before proceeding.
# Useful for phased algorithms (e.g., parallel sorting)
Two threads waiting for each other forever.
import threading
import time

lock1 = threading.Lock()
lock2 = threading.Lock()

def thread1():
    with lock1:
        time.sleep(0.1)  # give thread2 time to acquire lock2
        with lock2:      # waits forever - thread2 holds lock2
            print("Thread 1")

def thread2():
    with lock2:
        time.sleep(0.1)
        with lock1:      # waits forever - thread1 holds lock1
            print("Thread 2")

# DEADLOCK! Both threads wait forever.

# Prevention:
#   1. Lock ordering: always acquire locks in the same order
#   2. Timeout: lock.acquire(timeout=1)
#   3. Lock hierarchy: assign levels to locks, acquire lower → higher
#   4. Avoid nested locks when possible
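Prevention technique 1 (lock ordering) applied to the example above: both threads now take lock1 before lock2, so the circular wait can never form:

```python
import threading
import time

lock1 = threading.Lock()
lock2 = threading.Lock()

def worker(name):
    with lock1:            # every thread acquires lock1 first...
        time.sleep(0.1)
        with lock2:        # ...then lock2: no circular wait is possible
            print(name)

t1 = threading.Thread(target=worker, args=("Thread 1",))
t2 = threading.Thread(target=worker, args=("Thread 2",))
t1.start(); t2.start()
t1.join(); t2.join()       # both complete - no deadlock
print("Done")
```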
Threads keep changing state in response to each other but make no progress.
# Two people in hallway, both stepping same direction
# Both politely step aside... in same direction again!
# Not blocked, but not progressing
# Prevention: Randomized backoff, priority schemes
A thread never gets access to a resource (unfair scheduling).
# Low-priority thread never runs because high-priority threads
# constantly acquire lock
# Prevention: Fair locks, priority inheritance
| Principle | Explanation |
|---|---|
| Minimize shared state | Less shared data = fewer synchronization points. Use thread-local storage, immutability. |
| Coarse-grained locks | Lock larger chunks of work, not every tiny operation. Reduces lock overhead. |
| Lock ordering | Always acquire multiple locks in same order to prevent deadlock. |
| Short critical sections | Hold locks for minimum time needed. Don't do I/O while holding lock. |
| Use higher-level primitives | Queue, ThreadPoolExecutor (Python), channels (Go) instead of raw locks. |
| Prefer message passing | Go channels, actor model - communicate by sharing, don't share memory. |
| Test with race detectors | ThreadSanitizer (TSan) for C/C++ and Go's built-in race detector (go test -race) catch data races at runtime. |
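The "higher-level primitives" row in action: queue.Queue is a thread-safe producer/consumer queue, so the manual locking and waiting from the Condition example earlier disappears (the None sentinel is just this sketch's end-of-stream convention):

```python
import queue
import threading

q = queue.Queue(maxsize=5)  # blocks producers when full, consumers when empty
results = []

def producer():
    for i in range(10):
        q.put(i)            # locking and waiting handled internally
    q.put(None)             # sentinel: no more items

def consumer():
    while True:
        item = q.get()
        if item is None:
            break
        results.append(item * 2)

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()
print(results)  # [0, 2, 4, ..., 18]
```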
| Language | Model | True Parallelism | Notes |
|---|---|---|---|
| Java | 1:1 OS threads | ✅ Yes | No GIL, heavy threads, good for CPU-bound |
| C# (.NET) | 1:1 + async/await | ✅ Yes | No GIL, threadpool + task-based async |
| Rust | 1:1 OS threads + async (Tokio) | ✅ Yes | No GIL, fearless concurrency (compiler enforces safety) |
| JavaScript (Node.js) | Single-threaded + event loop | ❌ No (but Worker threads available) | async/await, non-blocking I/O, callbacks |
| Ruby (MRI) | 1:1 OS threads with GIL | ❌ No (same as Python) | GIL prevents parallel CPU execution |
| Erlang/Elixir | M:N (lightweight processes) | ✅ Yes | Similar to Go, millions of processes, message passing |
| C/C++ | 1:1 OS threads (pthreads, std::thread) | ✅ Yes | Manual memory management, race conditions are your problem |