Collections Framework - Java for Architects

Understanding the right data structure for the job - and why it matters at scale

The Collections Hierarchy

Iterable<E>
    └── Collection<E>
            ├── List<E>          // Ordered, allows duplicates
            │     ├── ArrayList
            │     ├── LinkedList
            │     └── Vector (legacy)
            │
            ├── Set<E>           // No duplicates
            │     ├── HashSet
            │     ├── LinkedHashSet
            │     └── TreeSet (SortedSet)
            │
            └── Queue<E>         // FIFO (usually)
                  ├── LinkedList
                  ├── PriorityQueue
                  └── Deque
                        ├── ArrayDeque
                        └── LinkedList

Map<K,V>                         // Key-Value pairs (not Collection!)
    ├── HashMap
    ├── LinkedHashMap
    ├── TreeMap (SortedMap)
    ├── Hashtable (legacy)
    └── ConcurrentHashMap

Lists: Ordered Collections

# Python list - dynamic array, like ArrayList
names = ["Alice", "Bob", "Charlie"]

names.append("Diana")       # Add at end: O(1) amortized
names.insert(0, "Eve")      # Add at start: O(n)
names[2]                     # Access by index: O(1)
names.pop(0)                 # Remove first: O(n)
"Alice" in names            # Search: O(n)

# Python list is always mutable

// ArrayList - dynamic array (most common)
// Best for: random access, iteration, add at end

List<String> names = new ArrayList<>();

names.add("Diana");              // Add at end: O(1) amortized
names.add(0, "Eve");            // Add at start: O(n) - shifts all elements
names.get(2);                    // Access by index: O(1)
names.remove(0);                // Remove first: O(n)
names.contains("Alice");        // Search: O(n)

// Immutable list (Java 9+)
List<String> immutable = List.of("Alice", "Bob");
immutable.add("Charlie");  // UnsupportedOperationException!

// Unmodifiable wrapper
List<String> readOnly = Collections.unmodifiableList(names);

// LinkedList - doubly-linked list
// Best for: frequent add/remove at beginning/end, queue operations

LinkedList<String> names = new LinkedList<>();

names.addFirst("Eve");           // Add at start: O(1)
names.addLast("Diana");          // Add at end: O(1)
names.get(2);                    // Access by index: O(n) - must traverse!
names.removeFirst();             // Remove first: O(1)
names.removeLast();              // Remove last: O(1)

// Also implements Deque (double-ended queue)
names.push("stack item");        // Stack push
names.pop();                     // Stack pop
names.offer("queue item");       // Queue offer
names.poll();                    // Queue poll

ArrayList vs LinkedList - When to Use Which

Operation	ArrayList	LinkedList	Winner
Get by index	O(1)	O(n)	ArrayList
Add at end	O(1)*	O(1)	Tie
Add at beginning	O(n)	O(1)	LinkedList
Add in middle	O(n)	O(n)**	Depends
Memory overhead	Low	High (node objects)	ArrayList
Cache locality	Excellent	Poor	ArrayList

* Amortized - occasionally O(n) for resize. ** O(1) for insert, but O(n) to find position.

Sets: No Duplicates

# Python set - unordered, unique elements
names = {"Alice", "Bob", "Alice"}  # {"Alice", "Bob"}

names.add("Charlie")         # Add: O(1)
"Alice" in names             # Contains: O(1)
names.remove("Bob")          # Remove: O(1)

# Set operations
a = {1, 2, 3}
b = {2, 3, 4}
a | b  # Union: {1, 2, 3, 4}
a & b  # Intersection: {2, 3}
a - b  # Difference: {1}

// HashSet - unordered, backed by HashMap
// Best for: fast membership testing, deduplication

Set<String> names = new HashSet<>();
names.add("Alice");
names.add("Bob");
names.add("Alice");  // Ignored - already exists
// names = {"Alice", "Bob"} - order not guaranteed

names.add("Charlie");         // Add: O(1)
names.contains("Alice");      // Contains: O(1)
names.remove("Bob");          // Remove: O(1)

// Set operations (from Guava or manual)
Set<Integer> a = Set.of(1, 2, 3);
Set<Integer> b = Set.of(2, 3, 4);

// Union
Set<Integer> union = new HashSet<>(a);
union.addAll(b);  // {1, 2, 3, 4}

// Intersection
Set<Integer> intersection = new HashSet<>(a);
intersection.retainAll(b);  // {2, 3}

// Difference
Set<Integer> difference = new HashSet<>(a);
difference.removeAll(b);  // {1}

// TreeSet - sorted, backed by Red-Black tree
// Best for: sorted data, range queries

Set<String> names = new TreeSet<>();
names.add("Charlie");
names.add("Alice");
names.add("Bob");
// Iteration order: Alice, Bob, Charlie (sorted!)

// O(log n) for all operations (balanced tree)
names.add("Diana");           // O(log n)
names.contains("Alice");      // O(log n)

// NavigableSet operations (TreeSet specific)
TreeSet<Integer> numbers = new TreeSet<>(List.of(1, 5, 10, 15, 20));

numbers.first();              // 1 (smallest)
numbers.last();               // 20 (largest)
numbers.floor(12);            // 10 (largest <= 12)
numbers.ceiling(12);          // 15 (smallest >= 12)
numbers.subSet(5, 15);        // [5, 10) - range view
numbers.headSet(10);          // [1, 5) - elements < 10
numbers.tailSet(10);          // [10, 15, 20] - elements >= 10

LinkedHashSet - Insertion Order

Maps: Key-Value Storage

# Python dict - ordered (3.7+), unique keys
ages = {"Alice": 30, "Bob": 25}

ages["Charlie"] = 35          # Add/update: O(1)
ages.get("Diana", 0)          # Get with default: O(1)
ages["Alice"]                 # Get (KeyError if missing): O(1)
del ages["Bob"]               # Remove: O(1)
"Alice" in ages              # Contains key: O(1)

# Iteration
for key in ages:
    print(key, ages[key])

for key, value in ages.items():
    print(key, value)

// HashMap - unordered, null keys/values allowed
Map<String, Integer> ages = new HashMap<>();

ages.put("Alice", 30);          // Add: O(1)
ages.put("Bob", 25);
ages.put("Alice", 31);          // Update: replaces value

ages.get("Alice");              // Get (null if missing): O(1)
ages.getOrDefault("Diana", 0);  // Get with default: O(1)
ages.remove("Bob");             // Remove: O(1)
ages.containsKey("Alice");      // Contains key: O(1)
ages.containsValue(30);         // Contains value: O(n)!

// Iteration
for (String key : ages.keySet()) {
    System.out.println(key + ": " + ages.get(key));
}

for (Map.Entry<String, Integer> entry : ages.entrySet()) {
    System.out.println(entry.getKey() + ": " + entry.getValue());
}

// Java 8+ forEach
ages.forEach((key, value) -> System.out.println(key + ": " + value));

// Compute methods (atomic update)
ages.compute("Alice", (k, v) -> v == null ? 1 : v + 1);  // Increment
ages.computeIfAbsent("Diana", k -> 0);                   // Default if missing
ages.merge("Alice", 1, Integer::sum);                    // Merge with existing

Map Variants

Queues and Deques

Type	Order	Null Keys	Thread-Safe	Use Case
`HashMap`	None	Yes (1)	No	General purpose
`LinkedHashMap`	Insertion	Yes (1)	No	Predictable iteration
`TreeMap`	Sorted	No	No	Sorted keys, range queries
`ConcurrentHashMap`	None	No	Yes	Multi-threaded access
`EnumMap`	Enum order	No	No	Enum keys (very fast)

// Queue - FIFO (First In, First Out)
Queue<String> queue = new LinkedList<>();
queue.offer("first");   // Add to tail
queue.offer("second");
queue.peek();           // View head (null if empty)
queue.poll();           // Remove head (null if empty)

// Deque - Double-ended queue (also a Stack!)
Deque<String> deque = new ArrayDeque<>();  // Preferred over LinkedList

// Queue operations
deque.offerLast("tail");     // Add to tail
deque.pollFirst();           // Remove from head

// Stack operations (LIFO - Last In, First Out)
deque.push("top");           // Add to head (same as offerFirst)
deque.pop();                 // Remove from head
deque.peek();                // View head

// DON'T use java.util.Stack (legacy, synchronized)
// Use ArrayDeque for stack operations

// PriorityQueue - elements ordered by priority
// Backed by heap - not sorted, but min/max always at head

// Natural ordering (min-heap)
PriorityQueue<Integer> minHeap = new PriorityQueue<>();
minHeap.offer(5);
minHeap.offer(1);
minHeap.offer(3);
minHeap.poll();  // 1 (smallest)
minHeap.poll();  // 3
minHeap.poll();  // 5

// Max-heap using Comparator
PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Comparator.reverseOrder());
maxHeap.offer(5);
maxHeap.offer(1);
maxHeap.offer(3);
maxHeap.poll();  // 5 (largest)

// Custom priority
PriorityQueue<Task> taskQueue = new PriorityQueue<>(
    Comparator.comparing(Task::getPriority)
              .thenComparing(Task::getCreatedAt)
);

// Time complexity:
// offer(): O(log n)
// poll(): O(log n)
// peek(): O(1)

Immutable Collections (Java 9+)

// Immutable collections - cannot be modified after creation

// List.of() - immutable list
List<String> list = List.of("a", "b", "c");
list.add("d");  // UnsupportedOperationException!

// Set.of() - immutable set
Set<String> set = Set.of("a", "b", "c");

// Map.of() - immutable map (up to 10 entries)
Map<String, Integer> map = Map.of(
    "one", 1,
    "two", 2,
    "three", 3
);

// Map.ofEntries() - for more entries
Map<String, Integer> bigMap = Map.ofEntries(
    Map.entry("one", 1),
    Map.entry("two", 2),
    // ... many more
);

// Creating immutable copy of existing collection
List<String> mutable = new ArrayList<>(List.of("a", "b"));
List<String> immutable = List.copyOf(mutable);  // Immutable copy
mutable.add("c");  // OK
// immutable is unchanged

Streams: Functional Collection Processing

numbers = [1, 2, 3, 4, 5]

# Filter, map, reduce
result = [x * 2 for x in numbers if x % 2 == 0]  # [4, 8]

# Or with functional style
from functools import reduce
evens = filter(lambda x: x % 2 == 0, numbers)
doubled = map(lambda x: x * 2, evens)
result = list(doubled)  # [4, 8]

# Sum
total = sum(numbers)

# Group by
from itertools import groupby
grouped = {k: list(v) for k, v in groupby(sorted(items, key=get_key), get_key)}

List<Integer> numbers = List.of(1, 2, 3, 4, 5);

// Filter, map, collect
List<Integer> result = numbers.stream()
    .filter(x -> x % 2 == 0)      // Keep evens
    .map(x -> x * 2)              // Double them
    .toList();                   // [4, 8]

// Sum
int total = numbers.stream()
    .mapToInt(Integer::intValue)
    .sum();  // 15

// Find first matching
Optional<Integer> first = numbers.stream()
    .filter(x -> x > 3)
    .findFirst();  // Optional[4]

// Group by
Map<Boolean, List<Integer>> partitioned = numbers.stream()
    .collect(Collectors.partitioningBy(x -> x % 2 == 0));
// {false=[1, 3, 5], true=[2, 4]}

// Group by custom key
Map<String, List<User>> byDepartment = users.stream()
    .collect(Collectors.groupingBy(User::getDepartment));

// Parallel stream (automatic parallelization)
long count = numbers.parallelStream()
    .filter(x -> isPrime(x))
    .count();

Need	Use	Why
Ordered list, random access	`ArrayList`	O(1) access, cache-friendly
Frequent add/remove at ends	`ArrayDeque`	O(1) at both ends
Fast contains check	`HashSet`	O(1) lookup
Sorted unique elements	`TreeSet`	O(log n), sorted iteration
Key-value pairs	`HashMap`	O(1) get/put
Sorted key-value pairs	`TreeMap`	O(log n), sorted by key
Insertion-ordered map	`LinkedHashMap`	Predictable iteration order
Thread-safe map	`ConcurrentHashMap`	High concurrent throughput
Priority-based processing	`PriorityQueue`	O(log n) insert, O(1) min/max access

Java Collections Framework