Last 30 Days
No notifications
A hash table maps keys to values using a hash function, providing O(1) average lookup, insertion, and deletion.
| Method | How | Pros | Cons | ||||
| Chaining | Linked list at each bucket | Simple, no limit | Extra memory | ||||
| Open Addressing | Probe for next empty slot | Cache-friendly | Clustering | ||||
| Linear Probing | Check next slot | Simple | Primary clustering | ||||
| Quadratic Probing | Check i² away | Less clustering | Secondary clustering | Time Complexity | Operation | Average | Worst |
| Insert | O(1) | O(n) | |||||
| Search | O(1) | O(n) | |||||
| Delete | O(1) | O(n) |
Hash tables are the single most useful data structure in everyday programming. Every \Map\, \dict\, \HashMap\, \unordered_map\, JS object property, Redis key, environment variable lookup — they're all hash tables under the hood. The whole idea: turn a key into an array index in O(1), so lookup is *also* O(1). When you internalise this, half of "how do I make this faster?" answers become "use a hash map".
A hash table is an array of buckets + a hash function that turns any key into an index into that array.
`
key "name" ──hash──▶ 417 ──% 16──▶ bucket 1 ──▶ ["name", "Ana"]
`
Three steps every time:
1. Hash the key to a (large) integer.
2. Compress that integer to a valid index: \hash % tableSize\.
3. Store (or read) at \array[index]\.
Because step 3 is a constant-time array access, the *whole* operation is O(1) — *as long as* most keys land in different buckets.
Think of a hash table as a coat-check stand at a theatre:
1. Using a non-hashable / mutable key. In Python, a list can't be a dict key. In JS, objects-as-keys with a plain \Object\ use \toString()\ (so \{a:1}\ and \{a:2}\ both become \"[object Object]"\). Use \Map\ for object keys.
2. Mutating an object after using it as a key. The hash now no longer matches where it was stored — the entry becomes "lost".
3. Iteration order assumptions. Order varies by language: Python 3.7+ dicts and JS \Map\ preserve insertion order; older Java HashMap doesn't; Go *deliberately randomises* it.
4. Counting on O(1). Hash tables are O(1) on average. Worst case is O(n) when everything collides. For adversarial inputs, this matters (HashDoS attacks).
5. \if (map[key])\ to check existence. Falsy values (0, "", false, null) make this break. Use \map.has(key)\ or \key in map\.
6. Resizing during a hot loop. First insert into a tiny map → many rehashes. If you know the size, pre-size: \new Map(entries)\, \dict.fromkeys()\, \new HashMap<>(expectedSize)\.
A "good" hash function is:
| Function | Formula / Idea | Used for |
| Division | \h(k) = k % m\ (m prime) | quick integer hash |
| Multiplication | \floor(m * (k * A % 1))\, A ≈ 0.618 | integer hash, less sensitive to m |
| Polynomial rolling | \Σ s[i] * pⁱ % m\ | string hash (Rabin-Karp) |
| FNV-1a | XOR each byte then × prime | general purpose, simple |
| MurmurHash / xxHash | bit mixing | high-performance lookups (Java, Redis) |
| SipHash | keyed, cryptographically randomised | defends against HashDoS (Python, Rust, Ruby use this) |
Why prime table sizes? \% prime\ washes out periodic patterns in keys. Java uses powers of two and re-hashes the key with bit-mixing instead.
Two keys hash to the same bucket. You have to handle it.
Chaining (separate chaining): each bucket is a linked list (or, in modern Java, a red-black tree once the chain exceeds 8). Insert appends; lookup walks the chain.
`
bucket 4 ─▶ ["dog",1] ─▶ ["god",2] ─▶ ["odg",3]
`
| Probe | Sequence | Pros | Cons |
| Linear | i, i+1, i+2, … | cache-friendly | primary clustering |
| Quadratic | i, i+1², i+2², … | breaks runs | secondary clustering |
| Double hashing | i + j * h₂(k) | nearly uniform | two hashes per probe |
dict\ and Rust's \HashMap\ use open addressing. Java's \HashMap\ uses chaining (with treeification).\α = n / m\ (entries / buckets). It controls collision frequency.
| α | Behaviour |
| 0.25 | very fast, lots of empty space |
| 0.5–0.75 | sweet spot — typical default |
| > 0.75 | collisions climb fast |
| ≥ 1 (open addressing) | impossible — must resize first |
When α crosses the threshold, resize:
1. Allocate a new array (typically 2× the size — Java HashMap, JS Map).
2. Rehash every entry into the new array (indices change because \m\ changed).
3. Replace the old array.
A single resize is O(n), but it happens rarely enough that amortised cost per insert stays O(1).
Caveat: during resize, the entire structure can be unavailable — that's why high-throughput systems pre-size or use *incremental* rehashing (Redis splits the rehash across many ops).
A hash set is just a hash map with no value (or a dummy). Use it for membership checks and deduplication.
When not to use a hash table:
TreeMap\, \std::map\, \SortedDict\).map[x] = (map[x] || 0) + 1\. Most-frequent element, anagrams, top-K, majority element.x\, ask if \target - x\ is in the map. O(n) instead of O(n²).| Language | Map | Set | Notes |
| Java | \HashMap | \HashSet | chaining + treeify ≥ 8, default α 0.75 |
| Python | \dict\ | \set\ | open addressing with perturbation, insertion-ordered since 3.7 |
| JavaScript | \Map\ / object | \Set\ | \Map\ allows any key type, insertion-ordered, has \.size\ |
| C++ | \unordered_map\ | \unordered_set\ | chaining; \.reserve(n)\ to pre-size |
| Rust | \HashMap\ | \HashSet\ | SipHash by default (DoS-resistant) |
| Go | \map[K]V\ | use \map[K]struct{}\ | iteration order is deliberately randomised |
JS-specific gotcha: \Map\ ≠ plain object. \{}\ keys are coerced to strings, has prototype keys (\__proto__\, \hasOwnProperty\) you can accidentally collide with. Use \new Map()\ for general data.
Naive sharding: \server = hash(key) % numServers\. Add or remove one server → \numServers\ changes → *every* key remaps. Cache invalidates entirely.
Consistent hashing: arrange servers on a ring at positions \hash(serverId)\. A key lives on the next server clockwise from \hash(key)\. Adding/removing a server only remaps ~1/n of keys.
Virtual nodes: place each server at *many* random ring positions to even out load. This is what Memcached, Cassandra, DynamoDB, and Redis Cluster all do.
If an attacker can predict your hash function, they can craft inputs that all collide → all of your "O(1)" operations become O(n) → server falls over.
Defences:
1. Implement a tiny hash map with chaining: array of 16 buckets, polynomial hash for strings, \set\ / \get\ / \delete\. Then add resizing when α > 0.75.
2. Solve Two Sum in O(n) using a hash map. Then solve Subarray Sum Equals K with prefix-sum + hash map.
3. Implement an LRU Cache with a hash map + doubly linked list. Verify both \get\ and \put\ are O(1).
4. Build the Group Anagrams solution with a hash map keyed on the sorted version of each string. Then re-do it with a 26-element character-count tuple as the key — confirm the same answer in O(n · m) instead of O(n · m log m).