A hash table maps keys to values in O(1) average time. The hash function converts a key to an integer index: index = hash(key) % table_size. Python’s dict is a hash table.
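A quick sketch of the index computation (the key and table size here are arbitrary illustrative values):

```python
key = "apple"        # hypothetical key
table_size = 8       # hypothetical bucket count
index = hash(key) % table_size
# the modulo guarantees the index always fits the table
assert 0 <= index < table_size
```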
| Operation | Average | Worst |
|-----------|---------|-------|
| Insert    | O(1)    | O(n)  |
| Lookup    | O(1)    | O(n)  |
| Delete    | O(1)    | O(n)  |
Worst case O(n) occurs when all keys hash to the same bucket (all collisions). A good hash function and low load factor prevent this.
Collision Resolution
Chaining
Each bucket holds a list. On collision, append to the list. Lookup scans the bucket's list.
Open Addressing (linear probing; CPython's dict uses a variant of open addressing)
On collision, probe the next slot: (index + 1) % size. Lookup follows the same probe sequence. Deletion requires a tombstone marker so later probes don't stop early at an empty slot.
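A minimal open-addressing sketch with linear probing and tombstones. Names like `ProbingMap` and `_TOMBSTONE` are illustrative; this version assumes the table never fills up and, for brevity, does not check for an existing key before reusing a tombstone slot:

```python
_TOMBSTONE = object()  # marks a deleted slot so probing can continue past it

class ProbingMap:
    def __init__(self, size=16):
        self.size = size
        self.slots = [None] * size  # None = never used

    def _probe(self, key):
        """Yield slot indices in linear-probe order, starting at hash(key) % size."""
        i = hash(key) % self.size
        while True:
            yield i
            i = (i + 1) % self.size

    def put(self, key, val):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is None or slot is _TOMBSTONE or slot[0] == key:
                self.slots[i] = (key, val)
                return

    def get(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is None:            # a never-used slot ends the probe sequence
                raise KeyError(key)
            if slot is not _TOMBSTONE and slot[0] == key:
                return slot[1]

    def delete(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is None:
                raise KeyError(key)
            if slot is not _TOMBSTONE and slot[0] == key:
                self.slots[i] = _TOMBSTONE  # don't set None: later keys may have probed past here
                return
```

Note the tombstone: if `delete` simply set the slot back to `None`, a later `get` for a key that had probed past this slot would stop early and wrongly report a miss.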
Load factor α = n/m (items/buckets). Python dict resizes when α > 2/3, keeping average O(1) through amortised resizing.
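Under chaining, α is also the expected number of items per bucket, which is why keeping it low keeps lookups fast. A small illustration (the helper name is ours):

```python
def load_factor(n_items, n_buckets):
    """alpha = n / m: under chaining, the average number of items per bucket."""
    return n_items / n_buckets

# 10 items in 16 buckets: each bucket holds 0.625 items on average
assert load_factor(10, 16) == 0.625
# crossing 2/3 is the point where CPython's dict grows its table
assert load_factor(11, 16) > 2 / 3
```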
Building a Simple Hash Map
class HashMap:
    def __init__(self, size=16):
        self.size = size
        self.buckets = [[] for _ in range(size)]

    def _idx(self, key):
        return hash(key) % self.size

    def put(self, key, val):
        bucket = self.buckets[self._idx(key)]
        for i, (k, v) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, val)  # key exists: overwrite
                return
        bucket.append((key, val))       # new key: append to the chain

    def get(self, key):
        for k, v in self.buckets[self._idx(key)]:
            if k == key:
                return v
        raise KeyError(key)
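To see the O(n) worst case from the table above, consider a hypothetical variant of the same chaining scheme whose hash function sends every key to bucket 0. Every insert collides, and lookups degrade to a linear scan of one long list:

```python
class WorstCaseMap:
    """Chaining map with a pathological constant hash: every key collides."""
    def __init__(self, size=16):
        self.buckets = [[] for _ in range(size)]

    def _idx(self, key):
        return 0  # all keys land in bucket 0

    def put(self, key, val):
        bucket = self.buckets[self._idx(key)]
        for i, (k, v) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, val)
                return
        bucket.append((key, val))

m = WorstCaseMap()
for i in range(100):
    m.put(i, i)
# all 100 entries share one bucket, so each lookup scans up to 100 pairs
assert len(m.buckets[0]) == 100
assert all(len(b) == 0 for b in m.buckets[1:])
```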
Practice
1. Use a hash map to find the first non-repeating character in a string.
from collections import Counter

def first_unique(s):
    counts = Counter(s)      # O(n): count occurrences of each character
    for ch in s:             # O(n): scan in original order
        if counts[ch] == 1:
            return ch
    return None

# first_unique("aabbcd") -> 'c'
2. What makes a good hash function?
A good hash function:
1. Is deterministic (same input always gives same output).
2. Distributes keys uniformly to minimise clustering.
3. Is fast to compute (O(1) or O(key length)).
4. Has the avalanche effect — small key changes give very different hashes.
Python's built-in hash() satisfies all of these for common types (note that str and bytes hashes are randomised between interpreter runs for security, but remain deterministic within a single process).
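To illustrate these properties, here is a sketch of FNV-1a, a well-known simple non-cryptographic hash (the constants are the standard 32-bit FNV offset basis and prime; the function name is ours):

```python
def fnv1a_32(data: bytes) -> int:
    """FNV-1a: deterministic, fast (O(key length)), with good bit mixing."""
    h = 0x811C9DC5                      # 32-bit FNV offset basis
    for byte in data:
        h ^= byte                       # mix in one byte
        h = (h * 0x01000193) % 2**32    # multiply by the 32-bit FNV prime
    return h

# deterministic: same input always gives the same output
assert fnv1a_32(b"hash") == fnv1a_32(b"hash")
# small input changes produce different hashes (avalanche-like behaviour)
assert fnv1a_32(b"hash") != fnv1a_32(b"hasi")
```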
Knowledge Check
Average-case lookup in a hash table is
Correct — hash function computes the bucket directly; no search needed.
O(log n) is binary search in a sorted array/BST.
O(n) is worst case (all collisions), not average.
That's a sorting bound.
Recap: hash table average O(1) because: compute index in O(1), access bucket in O(1), scan short bucket in O(1) with low load factor.
Chaining resolves collisions by
That's open addressing / linear probing.
Correct — multiple keys can hash to the same bucket; they're stored in a chain (list).
Resizing is a response to load factor, not a collision resolution strategy.
That's double hashing, a type of open addressing.
Recap: chaining = each bucket is a list. Collision? Append to the list. Lookup? Hash to bucket, scan the list. O(1) avg when chains are short.
The load factor of a hash table is
That's table size m, not load factor.
Correct — α = n/m. High α means more collisions.
That's related but not the definition of load factor.
Load factor influences collision rate but is defined as n/m.
Recap: load factor α = n/m. Python dict resizes when α > 2/3, keeping the table sparse so lookups stay O(1) on average.
A hash map is NOT appropriate when you need
Hash maps excel at this.
Correct — hash tables have no inherent order. Use a sorted dict/tree for ordered iteration.
Counter (a hash map subclass) is perfect for this.
Memoisation with a dict is a classic hash map use.
Recap: hash tables have no inherent key ordering. For iteration in sorted key order, use a BST or iterate over sorted(d). For insertion order, Python 3.7+ dict preserves it (collections.OrderedDict also works and adds extras such as move_to_end).
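A quick demonstration of the difference between insertion order and sorted order in a plain Python 3.7+ dict:

```python
d = {}
for ch in "bca":
    d[ch] = True

# Python 3.7+ dicts iterate in insertion order, not sorted order
assert list(d) == ["b", "c", "a"]
# for sorted iteration, sort the keys explicitly
assert sorted(d) == ["a", "b", "c"]
```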