A trie stores strings character by character. Each path from root to a marked node spells a word. Operations: insert, search, starts_with — all O(L) where L = word length.
classTrieNode:
def__init__(self):
self.children = {}; self.end = FalseclassTrie:
def__init__(self): self.root = TrieNode()
definsert(self, word):
node = self.root
for ch in word:
node = node.children.setdefault(ch, TrieNode())
node.end = Truedefsearch(self, word):
node = self.root
for ch in word:
if ch not in node.children: returnFalse
node = node.children[ch]
return node.end
defstarts_with(self, prefix):
node = self.root
for ch in prefix:
if ch not in node.children: returnFalse
node = node.children[ch]
returnTrue
Real uses: autocomplete (phone keyboard, search bar), spell checkers, IP routing tables, word games (Boggle, Scrabble AI).
Disjoint Set Union (Union-Find)
DSU tracks which elements are in the same set. Supports find(x) (which set is x in?) and union(x, y) (merge sets). Used in Kruskal’s MST and connected components.
classDSU:
def__init__(self, n):
self.parent = list(range(n))
self.rank = [0] * n
deffind(self, x): # with path compressionifself.parent[x] != x:
self.parent[x] = self.find(self.parent[x])
returnself.parent[x]
defunion(self, x, y): # by rank
rx, ry = self.find(x), self.find(y)
if rx == ry: returnFalse# already same setifself.rank[rx] < self.rank[ry]: rx, ry = ry, rx
self.parent[ry] = rx
ifself.rank[rx] == self.rank[ry]: self.rank[rx] += 1returnTrue
With path compression + union by rank: nearly O(1) per operation — technically O(α(n)) where α is the inverse Ackermann function.
Practice
1. Why is a trie faster than a hash set for prefix queries?
Hash set: to find all words with prefix "app", you'd iterate all stored words — O(n * L).
Trie: navigate to the "app" node in O(3) = O(L) steps, then traverse the subtree.
starts_with("app") is O(L) regardless of dictionary size.
2. How does DSU detect if adding an edge creates a cycle?
Before adding edge (u, v): call find(u) and find(v).
If find(u) == find(v), they're already in the same component — adding the edge would create a cycle.
If different, call union(u, v) to merge components. This is how Kruskal's MST works.
Knowledge Check
Trie search for a word of length L takes
You traverse L nodes — one per character.
Correct — follow one edge per character; total L edges regardless of dictionary size.
Trie search is independent of the dictionary size.
No binary search; follow child pointers directly.
Recap: trie operations are O(L) — proportional only to the string length, not the number of stored strings. This beats hash table for prefix queries.
A trie is most advantageous over a hash set when
Hash set gives O(1) exact lookup; trie gives O(L).
Correct — trie: starts_with('pre') in O(3). Hash set: must scan all keys.
Tries are designed for strings/sequences; hash sets handle integers well.
Tries use more memory than hash sets due to pointer-per-character overhead.
Recap: hash set wins for exact lookup. Trie wins for prefix queries, autocomplete, and spell checking.
DSU find() with path compression
find() returns the root/representative of the set, not the minimum.
Correct — each find() points nodes directly to the root, making future finds near-O(1).
Merging is union().
DSU doesn't maintain sorted order.
Recap: path compression during find(): after finding root, make every node on the path point directly to it. Flattens the tree so future finds are O(1) amortised.
Union by rank prevents
DSU doesn't deal with duplicates.
Correct — always attach the shorter tree under the taller, keeping depth O(log n).
They work together; rank prevents tall trees, compression flattens them further.
DSU supports any number of sets.
Recap: without rank, union could degrade to a linked list (O(n) find). Rank (or size) heuristic keeps tree height O(log n).
Kruskal's MST uses DSU to
Sorting uses a standard sort algorithm.
Correct — if both endpoints have the same root (find returns same), adding the edge creates a cycle.
A sorted edge list or min-heap handles that.
DSU stores component memberships, not vertices directly.
Recap: Kruskal's: sort edges, for each edge (u,v) — if find(u) != find(v), add edge and union(u,v). Otherwise skip (would form cycle). Runs in O(E log E).