Tries & Disjoint Set Union

Lesson 11 · OKSTEM College · AS Computer Science

Trie (Prefix Tree)

A trie stores strings character by character. Each path from root to a marked node spells a word. Operations: insert, search, starts_with — all O(L) where L = word length.

class TrieNode: def __init__(self): self.children = {}; self.end = False class Trie: def __init__(self): self.root = TrieNode() def insert(self, word): node = self.root for ch in word: node = node.children.setdefault(ch, TrieNode()) node.end = True def search(self, word): node = self.root for ch in word: if ch not in node.children: return False node = node.children[ch] return node.end def starts_with(self, prefix): node = self.root for ch in prefix: if ch not in node.children: return False node = node.children[ch] return True

Real uses: autocomplete (phone keyboard, search bar), spell checkers, IP routing tables, word games (Boggle, Scrabble AI).

Disjoint Set Union (Union-Find)

DSU tracks which elements are in the same set. Supports find(x) (which set is x in?) and union(x, y) (merge sets). Used in Kruskal’s MST and connected components.

class DSU: def __init__(self, n): self.parent = list(range(n)) self.rank = [0] * n def find(self, x): # with path compression if self.parent[x] != x: self.parent[x] = self.find(self.parent[x]) return self.parent[x] def union(self, x, y): # by rank rx, ry = self.find(x), self.find(y) if rx == ry: return False # already same set if self.rank[rx] < self.rank[ry]: rx, ry = ry, rx self.parent[ry] = rx if self.rank[rx] == self.rank[ry]: self.rank[rx] += 1 return True

With path compression + union by rank: nearly O(1) per operation — technically O(α(n)) where α is the inverse Ackermann function.

Practice

1. Why is a trie faster than a hash set for prefix queries?

Hash set: to find all words with prefix "app", you'd iterate all stored words — O(n * L). Trie: navigate to the "app" node in O(3) = O(L) steps, then traverse the subtree. starts_with("app") is O(L) regardless of dictionary size.

2. How does DSU detect if adding an edge creates a cycle?

Before adding edge (u, v): call find(u) and find(v). If find(u) == find(v), they're already in the same component — adding the edge would create a cycle. If different, call union(u, v) to merge components. This is how Kruskal's MST works.

Knowledge Check

Trie search for a word of length L takes

You traverse L nodes — one per character.

Correct — follow one edge per character; total L edges regardless of dictionary size.

Trie search is independent of the dictionary size.

No binary search; follow child pointers directly.

Recap: trie operations are O(L) — proportional only to the string length, not the number of stored strings. This beats hash table for prefix queries.

A trie is most advantageous over a hash set when

Hash set gives O(1) exact lookup; trie gives O(L).

Correct — trie: starts_with('pre') in O(3). Hash set: must scan all keys.

Tries are designed for strings/sequences; hash sets handle integers well.

Tries use more memory than hash sets due to pointer-per-character overhead.

Recap: hash set wins for exact lookup. Trie wins for prefix queries, autocomplete, and spell checking.

DSU find() with path compression

find() returns the root/representative of the set, not the minimum.

Correct — each find() points nodes directly to the root, making future finds near-O(1).

Merging is union().

DSU doesn't maintain sorted order.

Recap: path compression during find(): after finding root, make every node on the path point directly to it. Flattens the tree so future finds are O(1) amortised.

Union by rank prevents

DSU doesn't deal with duplicates.

Correct — always attach the shorter tree under the taller, keeping depth O(log n).

They work together; rank prevents tall trees, compression flattens them further.

DSU supports any number of sets.

Recap: without rank, union could degrade to a linked list (O(n) find). Rank (or size) heuristic keeps tree height O(log n).

Kruskal's MST uses DSU to

Sorting uses a standard sort algorithm.

Correct — if both endpoints have the same root (find returns same), adding the edge creates a cycle.

A sorted edge list or min-heap handles that.

DSU stores component memberships, not vertices directly.

Recap: Kruskal's: sort edges, for each edge (u,v) — if find(u) != find(v), add edge and union(u,v). Otherwise skip (would form cycle). Runs in O(E log E).

← Previous Next →