CS 201 › Lesson 4 of 12

Hash Tables

Lesson 4 · OKSTEM College · AS Computer Science

How Hash Tables Work

A hash table maps keys to values in O(1) average time. The hash function converts a key to an integer index: index = hash(key) % table_size. Python’s dict is a hash table.
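The index computation can be sketched directly. This is a minimal illustration, assuming a small table of 8 buckets (the size is an arbitrary choice):

```python
# Sketch: mapping keys to bucket indices via hash-then-modulo.
table_size = 8

def bucket_index(key):
    """Compute the bucket index for a key."""
    return hash(key) % table_size

# Small integers hash to themselves in CPython, so these are predictable:
print(bucket_index(10))  # 10 % 8 == 2
print(bucket_index(42))  # 42 % 8 == 2, so 42 collides with 10
```

Two different keys landing on the same index (as 10 and 42 do here) is a collision, which the next section addresses.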

Operation    Average    Worst
Insert       O(1)       O(n)
Lookup       O(1)       O(n)
Delete       O(1)       O(n)

Worst case O(n) occurs when all keys hash to the same bucket (all collisions). A good hash function and low load factor prevent this.
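One way to see the worst case is to force it with a deliberately bad hash function. The BadKey class below is a hypothetical example (you would never write this in practice): every key hashes to 0, so all entries land in one bucket and each lookup degrades to a linear scan.

```python
# Sketch: forcing worst-case O(n) behaviour with a constant hash.
class BadKey:
    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return 0  # constant hash: every key collides

    def __eq__(self, other):
        return isinstance(other, BadKey) and self.name == other.name

# All 1000 entries pile into a single bucket.
d = {BadKey(str(i)): i for i in range(1000)}
print(d[BadKey("500")])  # still correct, but each lookup scans colliding keys
```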

Collision Resolution

Chaining

Each bucket holds a list. On collision, append to the list. Lookup scans the bucket’s list.

Open Addressing (linear probing)

On collision, probe the next slot: (index + 1) % size. Lookup follows the same probe sequence. Deletion requires a tombstone marker.
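A minimal open-addressing table might look like the sketch below. The EMPTY and TOMBSTONE sentinels, the fixed probe step of 1, and the lack of resizing are all simplifying choices for illustration:

```python
# Toy open-addressing hash map with linear probing and tombstones.
EMPTY = object()      # slot never used
TOMBSTONE = object()  # slot deleted; probing must continue past it

class ProbingMap:
    def __init__(self, size=8):
        self.size = size
        self.slots = [EMPTY] * size

    def _probe(self, key):
        # Visit every slot once, starting at the home bucket.
        i = hash(key) % self.size
        for _ in range(self.size):
            yield i
            i = (i + 1) % self.size

    def put(self, key, val):
        first_free = None
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is TOMBSTONE:
                if first_free is None:
                    first_free = i      # reusable, but keep probing for the key
            elif slot is EMPTY:
                if first_free is None:
                    first_free = i
                break                   # key is definitely absent past here
            elif slot[0] == key:
                self.slots[i] = (key, val)   # overwrite existing key
                return
        if first_free is None:
            raise RuntimeError("table full")
        self.slots[first_free] = (key, val)

    def get(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:
                break                   # never-used slot ends the probe chain
            if slot is not TOMBSTONE and slot[0] == key:
                return slot[1]
        raise KeyError(key)

    def delete(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:
                break
            if slot is not TOMBSTONE and slot[0] == key:
                self.slots[i] = TOMBSTONE    # preserve the probe chain
                return
        raise KeyError(key)
```

The tombstone is the design point worth noticing: simply emptying a deleted slot would break the probe chain, making later keys in the same chain unreachable.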

Load factor α = n/m (items/buckets). Python dict resizes when α > 2/3, keeping average O(1) through amortised resizing.
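The resize decision reduces to a one-line check; the sketch below assumes CPython's 2/3 threshold:

```python
# Sketch: deciding when to resize, given the load factor alpha = n/m.
def needs_resize(n_items, n_buckets, threshold=2/3):
    """Return True when the load factor exceeds the threshold."""
    return n_items / n_buckets > threshold

print(needs_resize(10, 16))  # alpha = 0.625  -> False
print(needs_resize(11, 16))  # alpha = 0.6875 -> True
```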

Building a Simple Hash Map

class HashMap:
    def __init__(self, size=16):
        self.size = size
        self.buckets = [[] for _ in range(size)]  # one chain per bucket

    def _idx(self, key):
        return hash(key) % self.size

    def put(self, key, val):
        bucket = self.buckets[self._idx(key)]
        for i, (k, v) in enumerate(bucket):
            if k == key:                  # key already present: overwrite
                bucket[i] = (key, val)
                return
        bucket.append((key, val))         # new key: append to the chain

    def get(self, key):
        for k, v in self.buckets[self._idx(key)]:
            if k == key:
                return v
        raise KeyError(key)

Practice

1. Use a hash map to find the first non-repeating character in a string.

from collections import Counter

def first_unique(s):
    counts = Counter(s)      # O(n): count every character
    for ch in s:             # O(n): first character with count 1 wins
        if counts[ch] == 1:
            return ch
    return None

# first_unique("aabbcd") -> 'c'

2. What makes a good hash function?

A good hash function:

1. Is deterministic: the same input always gives the same output. (Python randomises string hashes per process for security, but they stay deterministic within a run.)
2. Distributes keys uniformly to minimise clustering.
3. Is fast to compute: O(1) or O(key length).
4. Has the avalanche effect: small changes to the key give very different hashes.

Python's built-in hash() satisfies all of these for common types.
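Uniform distribution is easy to spot-check empirically. In this sketch the key format and bucket count are arbitrary choices; exact counts vary between runs because string hashes are randomised per process, but the spread stays tight:

```python
# Sketch: checking how evenly hash() spreads string keys over 8 buckets.
from collections import Counter

keys = [f"user_{i}" for i in range(8000)]
buckets = Counter(hash(k) % 8 for k in keys)
print(sorted(buckets.values()))  # eight counts, each near 1000
```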

Knowledge Check

Average-case lookup in a hash table is

- O(1). Correct: the hash function computes the bucket directly; no search is needed.
- O(log n) is binary search in a sorted array or a BST.
- O(n) is the worst case (all collisions), not the average.
- O(n log n) is a sorting bound.

Recap: a hash table's average lookup is O(1) because each step is O(1): compute the index, access the bucket, and scan a short bucket when the load factor is low.

Chaining resolves collisions by

- Probing the next slot is open addressing / linear probing, not chaining.
- Correct: multiple keys can hash to the same bucket; they are stored in a chain (a list).
- Resizing is a response to load factor, not a collision-resolution strategy.
- Applying a second hash function is double hashing, a type of open addressing.

Recap: with chaining, each bucket is a list. On a collision, append to the list; on lookup, hash to the bucket and scan its list. This stays O(1) on average while chains are short.

The load factor of a hash table is

- That is the table size m, not the load factor.
- Correct: α = n/m, the ratio of stored items to buckets. High α means more collisions.
- That is related but not the definition of load factor.
- The load factor influences the collision rate, but it is defined as n/m.

Recap: load factor α = n/m. Python's dict resizes when α exceeds 2/3, which keeps collisions rare and lookups O(1) on average.

A hash map is NOT appropriate when you need

- Hash maps excel at this.
- Correct: hash tables have no inherent order. Use a sorted dict or a tree for ordered iteration.
- Counter (a dict subclass) is perfect for this.
- Memoisation with a dict is a classic hash-map use.

Recap: hash tables are unordered by key. For keys in sorted order, use a BST or sort them explicitly (e.g. sorted(d)). For insertion order, dicts in Python 3.7+ preserve it (collections.OrderedDict also works).

Python dict is guaranteed O(1) average because

- Python dicts work with any hashable key.
- Correct: amortised resizing keeps the table sparse, maintaining O(1) average.
- Python's dict uses open addressing with pseudo-random probing, not a sorted array.
- There is no automatic lookup cache on dict.

Recap: dict grows when the load factor exceeds roughly 2/3. A resize is O(n) but happens rarely, so each operation is amortised O(1).
