Java Collections Framework

Understanding the right data structure for the job - and why it matters at scale

The Collections Hierarchy

Iterable<E>
    └── Collection<E>
            ├── List<E>          // Ordered, allows duplicates
            │     ├── ArrayList
            │     ├── LinkedList
            │     └── Vector (legacy)
            │
            ├── Set<E>           // No duplicates
            │     ├── HashSet
            │     ├── LinkedHashSet
            │     └── TreeSet (SortedSet)
            │
            └── Queue<E>         // FIFO (usually)
                  ├── LinkedList
                  ├── PriorityQueue
                  └── Deque
                        ├── ArrayDeque
                        └── LinkedList

Map<K,V>                         // Key-Value pairs (not Collection!)
    ├── HashMap
    ├── LinkedHashMap
    ├── TreeMap (SortedMap)
    ├── Hashtable (legacy)
    └── ConcurrentHashMap

Lists: Ordered Collections

# Python list - dynamic array, like ArrayList
names = ["Alice", "Bob", "Charlie"]

names.append("Diana")       # Add at end: O(1) amortized
names.insert(0, "Eve")      # Add at start: O(n)
names[2]                     # Access by index: O(1)
names.pop(0)                 # Remove first: O(n)
"Alice" in names            # Search: O(n)

# Python list is always mutable
// ArrayList - dynamic array (most common)
// Best for: random access, iteration, add at end

List<String> names = new ArrayList<>();

names.add("Diana");              // Add at end: O(1) amortized
names.add(0, "Eve");            // Add at start: O(n) - shifts all elements
names.get(2);                    // Access by index: O(1)
names.remove(0);                // Remove first: O(n)
names.contains("Alice");        // Search: O(n)

// Immutable list (Java 9+)
List<String> immutable = List.of("Alice", "Bob");
immutable.add("Charlie");  // UnsupportedOperationException!

// Unmodifiable wrapper
List<String> readOnly = Collections.unmodifiableList(names);
// LinkedList - doubly-linked list
// Best for: frequent add/remove at beginning/end, queue operations

LinkedList<String> names = new LinkedList<>();

names.addFirst("Eve");           // Add at start: O(1)
names.addLast("Diana");          // Add at end: O(1)
names.get(2);                    // Access by index: O(n) - must traverse!
names.removeFirst();             // Remove first: O(1)
names.removeLast();              // Remove last: O(1)

// Also implements Deque (double-ended queue)
names.push("stack item");        // Stack push
names.pop();                     // Stack pop
names.offer("queue item");       // Queue offer
names.poll();                    // Queue poll

ArrayList vs LinkedList - When to Use Which

OperationArrayListLinkedListWinner
Get by indexO(1)O(n)ArrayList
Add at endO(1)*O(1)Tie
Add at beginningO(n)O(1)LinkedList
Add in middleO(n)O(n)**Depends
Memory overheadLowHigh (node objects)ArrayList
Cache localityExcellentPoorArrayList

* Amortized - occasionally O(n) for resize. ** O(1) for insert, but O(n) to find position.

Practical Advice

Use ArrayList by default. LinkedList's theoretical advantages rarely matter in practice due to cache effects and memory overhead. Only use LinkedList when you need Deque operations or genuinely add/remove frequently at the head.

Sets: No Duplicates

# Python set - unordered, unique elements
names = {"Alice", "Bob", "Alice"}  # {"Alice", "Bob"}

names.add("Charlie")         # Add: O(1)
"Alice" in names             # Contains: O(1)
names.remove("Bob")          # Remove: O(1)

# Set operations
a = {1, 2, 3}
b = {2, 3, 4}
a | b  # Union: {1, 2, 3, 4}
a & b  # Intersection: {2, 3}
a - b  # Difference: {1}
// HashSet - unordered, backed by HashMap
// Best for: fast membership testing, deduplication

Set<String> names = new HashSet<>();
names.add("Alice");
names.add("Bob");
names.add("Alice");  // Ignored - already exists
// names = {"Alice", "Bob"} - order not guaranteed

names.add("Charlie");         // Add: O(1)
names.contains("Alice");      // Contains: O(1)
names.remove("Bob");          // Remove: O(1)

// Set operations (from Guava or manual)
Set<Integer> a = Set.of(1, 2, 3);
Set<Integer> b = Set.of(2, 3, 4);

// Union
Set<Integer> union = new HashSet<>(a);
union.addAll(b);  // {1, 2, 3, 4}

// Intersection
Set<Integer> intersection = new HashSet<>(a);
intersection.retainAll(b);  // {2, 3}

// Difference
Set<Integer> difference = new HashSet<>(a);
difference.removeAll(b);  // {1}
// TreeSet - sorted, backed by Red-Black tree
// Best for: sorted data, range queries

Set<String> names = new TreeSet<>();
names.add("Charlie");
names.add("Alice");
names.add("Bob");
// Iteration order: Alice, Bob, Charlie (sorted!)

// O(log n) for all operations (balanced tree)
names.add("Diana");           // O(log n)
names.contains("Alice");      // O(log n)

// NavigableSet operations (TreeSet specific)
TreeSet<Integer> numbers = new TreeSet<>(List.of(1, 5, 10, 15, 20));

numbers.first();              // 1 (smallest)
numbers.last();               // 20 (largest)
numbers.floor(12);            // 10 (largest <= 12)
numbers.ceiling(12);          // 15 (smallest >= 12)
numbers.subSet(5, 15);        // [5, 10) - range view
numbers.headSet(10);          // [1, 5) - elements < 10
numbers.tailSet(10);          // [10, 15, 20] - elements >= 10

LinkedHashSet - Insertion Order

// Maintains insertion order (unlike HashSet)
Set<String> names = new LinkedHashSet<>();
names.add("Charlie");
names.add("Alice");
names.add("Bob");
// Iteration order: Charlie, Alice, Bob (insertion order preserved)

Maps: Key-Value Storage

# Python dict - ordered (3.7+), unique keys
ages = {"Alice": 30, "Bob": 25}

ages["Charlie"] = 35          # Add/update: O(1)
ages.get("Diana", 0)          # Get with default: O(1)
ages["Alice"]                 # Get (KeyError if missing): O(1)
del ages["Bob"]               # Remove: O(1)
"Alice" in ages              # Contains key: O(1)

# Iteration
for key in ages:
    print(key, ages[key])

for key, value in ages.items():
    print(key, value)
// HashMap - unordered, null keys/values allowed
Map<String, Integer> ages = new HashMap<>();

ages.put("Alice", 30);          // Add: O(1)
ages.put("Bob", 25);
ages.put("Alice", 31);          // Update: replaces value

ages.get("Alice");              // Get (null if missing): O(1)
ages.getOrDefault("Diana", 0);  // Get with default: O(1)
ages.remove("Bob");             // Remove: O(1)
ages.containsKey("Alice");      // Contains key: O(1)
ages.containsValue(30);         // Contains value: O(n)!

// Iteration
for (String key : ages.keySet()) {
    System.out.println(key + ": " + ages.get(key));
}

for (Map.Entry<String, Integer> entry : ages.entrySet()) {
    System.out.println(entry.getKey() + ": " + entry.getValue());
}

// Java 8+ forEach
ages.forEach((key, value) -> System.out.println(key + ": " + value));

// Compute methods (atomic update)
ages.compute("Alice", (k, v) -> v == null ? 1 : v + 1);  // Increment
ages.computeIfAbsent("Diana", k -> 0);                   // Default if missing
ages.merge("Alice", 1, Integer::sum);                    // Merge with existing

Map Variants

TypeOrderNull KeysThread-SafeUse Case
HashMapNoneYes (1)NoGeneral purpose
LinkedHashMapInsertionYes (1)NoPredictable iteration
TreeMapSortedNoNoSorted keys, range queries
ConcurrentHashMapNoneNoYesMulti-threaded access
EnumMapEnum orderNoNoEnum keys (very fast)

Queues and Deques

// Queue - FIFO (First In, First Out)
Queue<String> queue = new LinkedList<>();
queue.offer("first");   // Add to tail
queue.offer("second");
queue.peek();           // View head (null if empty)
queue.poll();           // Remove head (null if empty)

// Deque - Double-ended queue (also a Stack!)
Deque<String> deque = new ArrayDeque<>();  // Preferred over LinkedList

// Queue operations
deque.offerLast("tail");     // Add to tail
deque.pollFirst();           // Remove from head

// Stack operations (LIFO - Last In, First Out)
deque.push("top");           // Add to head (same as offerFirst)
deque.pop();                 // Remove from head
deque.peek();                // View head

// DON'T use java.util.Stack (legacy, synchronized)
// Use ArrayDeque for stack operations
// PriorityQueue - elements ordered by priority
// Backed by heap - not sorted, but min/max always at head

// Natural ordering (min-heap)
PriorityQueue<Integer> minHeap = new PriorityQueue<>();
minHeap.offer(5);
minHeap.offer(1);
minHeap.offer(3);
minHeap.poll();  // 1 (smallest)
minHeap.poll();  // 3
minHeap.poll();  // 5

// Max-heap using Comparator
PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Comparator.reverseOrder());
maxHeap.offer(5);
maxHeap.offer(1);
maxHeap.offer(3);
maxHeap.poll();  // 5 (largest)

// Custom priority
PriorityQueue<Task> taskQueue = new PriorityQueue<>(
    Comparator.comparing(Task::getPriority)
              .thenComparing(Task::getCreatedAt)
);

// Time complexity:
// offer(): O(log n)
// poll(): O(log n)
// peek(): O(1)

Immutable Collections (Java 9+)

// Immutable collections - cannot be modified after creation

// List.of() - immutable list
List<String> list = List.of("a", "b", "c");
list.add("d");  // UnsupportedOperationException!

// Set.of() - immutable set
Set<String> set = Set.of("a", "b", "c");

// Map.of() - immutable map (up to 10 entries)
Map<String, Integer> map = Map.of(
    "one", 1,
    "two", 2,
    "three", 3
);

// Map.ofEntries() - for more entries
Map<String, Integer> bigMap = Map.ofEntries(
    Map.entry("one", 1),
    Map.entry("two", 2),
    // ... many more
);

// Creating immutable copy of existing collection
List<String> mutable = new ArrayList<>(List.of("a", "b"));
List<String> immutable = List.copyOf(mutable);  // Immutable copy
mutable.add("c");  // OK
// immutable is unchanged

Why Immutable Collections Matter

Streams: Functional Collection Processing

numbers = [1, 2, 3, 4, 5]

# Filter, map, reduce
result = [x * 2 for x in numbers if x % 2 == 0]  # [4, 8]

# Or with functional style
from functools import reduce
evens = filter(lambda x: x % 2 == 0, numbers)
doubled = map(lambda x: x * 2, evens)
result = list(doubled)  # [4, 8]

# Sum
total = sum(numbers)

# Group by
from itertools import groupby
grouped = {k: list(v) for k, v in groupby(sorted(items, key=get_key), get_key)}
List<Integer> numbers = List.of(1, 2, 3, 4, 5);

// Filter, map, collect
List<Integer> result = numbers.stream()
    .filter(x -> x % 2 == 0)      // Keep evens
    .map(x -> x * 2)              // Double them
    .toList();                   // [4, 8]

// Sum
int total = numbers.stream()
    .mapToInt(Integer::intValue)
    .sum();  // 15

// Find first matching
Optional<Integer> first = numbers.stream()
    .filter(x -> x > 3)
    .findFirst();  // Optional[4]

// Group by
Map<Boolean, List<Integer>> partitioned = numbers.stream()
    .collect(Collectors.partitioningBy(x -> x % 2 == 0));
// {false=[1, 3, 5], true=[2, 4]}

// Group by custom key
Map<String, List<User>> byDepartment = users.stream()
    .collect(Collectors.groupingBy(User::getDepartment));

// Parallel stream (automatic parallelization)
long count = numbers.parallelStream()
    .filter(x -> isPrime(x))
    .count();

Choosing the Right Collection

NeedUseWhy
Ordered list, random accessArrayListO(1) access, cache-friendly
Frequent add/remove at endsArrayDequeO(1) at both ends
Fast contains checkHashSetO(1) lookup
Sorted unique elementsTreeSetO(log n), sorted iteration
Key-value pairsHashMapO(1) get/put
Sorted key-value pairsTreeMapO(log n), sorted by key
Insertion-ordered mapLinkedHashMapPredictable iteration order
Thread-safe mapConcurrentHashMapHigh concurrent throughput
Priority-based processingPriorityQueueO(log n) insert, O(1) min/max access