Notifications

No notifications

/Phase 5

Advanced Structures

Advanced Data Structures

These structures solve specific problems much more efficiently than basic ones.

Disjoint Set Union (Union-Find)

Tracks elements partitioned into disjoint sets. Operations:
  • Find(x): Which set does x belong to?
  • Union(x, y): Merge sets containing x and y
Optimizations:
  • Path Compression: Make nodes point directly to root
  • Union by Rank: Attach smaller tree under larger
Time: Nearly O(1) per operation (amortized)

Segment Tree

A tree that allows efficient range queries and point updates.
  • Build: O(n)
  • Query (range sum/min/max): O(log n)
  • Update: O(log n)

Binary Indexed Tree (Fenwick Tree)

Simpler alternative to segment tree for:
  • Prefix sum queries: O(log n)
  • Point updates: O(log n)

Applications

  • Union-Find: Connected components, Kruskal's MST, cycle detection
  • Segment Tree: Range minimum query, count of inversions
  • Fenwick Tree: Running frequency, prefix sums

On this page

Detailed Theory

Once the basics (arrays, lists, hash tables, BSTs) hit their limits, specialised structures take over: segment trees for range queries, Union-Find for dynamic connectivity, LRU caches for hot-data eviction, Bloom filters for "probably-yes/definitely-no" membership, and skip lists for ordered storage with simple code. These are the data structures that show up in databases, operating systems, browsers, Redis, and almost every "design X" interview question.

What "Advanced" Means Here

These structures fix specific weaknesses of the basic ones:

Basic structureLimitationAdvanced fix
arrayO(n) range sumFenwick / segment tree
arrayO(n) range min/maxsegment tree
BSTdegenerates to O(n)AVL / Red-Black
hash tableno order, no rangeSkip list / sorted map
sethuge memory for "is x in?"Bloom filter
mapunbounded growthLRU cache
stringO(n) insert in middleRope

Think of each as the right tool for one specific job — not general-purpose like the basics.

Daily Mental Model: A Swiss Army Knife

You don't carry a segment tree around for fun. You reach for it when you have range queries + updates on millions of elements. You reach for Union-Find when "are these two things in the same group, dynamically?" comes up. The skill is pattern recognition — reading a problem and noticing "oh, that's an LRU pattern" or "that's a range-sum pattern".

Beginner Mistakes to Skip

1. Building a segment tree for read-only queries. A simple prefix-sum array does range-sum queries in O(1) if there are no updates. Don't reach for the heavy tool. 2. Using Union-Find without path compression *and* union by rank. Either alone gives O(log n); both together give O(α(n)) ≈ O(1). Skipping one is the most common Union-Find mistake. 3. Sizing the segment tree array as 2n. It must be 4n to be safe for any tree shape. Using 2n works only when n is a power of 2. 4. Treating a Bloom filter as an exact set. It can return *false positives* — "yes, x is in" sometimes lies. It never returns false negatives. Use only when you can tolerate the false-positive rate (or back it with the real source). 5. Building an LRU cache without a doubly linked list. A singly linked list makes "remove this specific node" O(n), defeating the whole point. Hash map → DLL node is the canonical layout. 6. Forgetting lazy propagation on segment trees with range updates. Without it, range update is O(n log n). With lazy propagation, it's O(log n).

Intermediate: Union-Find — the O(α(n)) Workhorse

`js class UnionFind { constructor(n) { this.parent = Array.from({length: n}, (_, i) => i); this.rank = new Array(n).fill(0); } find(x) { // path compression if (this.parent[x] !== x) { this.parent[x] = this.find(this.parent[x]); } return this.parent[x]; } union(x, y) { // union by rank const rx = this.find(x), ry = this.find(y); if (rx === ry) return false; if (this.rank[rx] < this.rank[ry]) this.parent[rx] = ry; else if (this.rank[rx] > this.rank[ry]) this.parent[ry] = rx; else { this.parent[ry] = rx; this.rank[rx]++; } return true; } connected(x, y) { return this.find(x) === this.find(y); } } `

α(n) ≤ 4 for any n ≤ 10⁸⁰. Effectively constant time. Powers Kruskal's MST, dynamic connectivity, "Number of Islands II", "Accounts Merge", "Friend Circles", percolation simulations.

Intermediate: LRU Cache — Hash Map + Doubly Linked List

The classic O(1) get/put cache. Each node has \prev\ and \next\ pointers; the hash map maps keys to nodes.

` get(key) → look up node in map → unlink → relink at head → return value. put(key, value) if exists: update + move to head. else: create new node at head; if size > capacity, evict tail. `

Why a doubly linked list? You need O(1) removal of an arbitrary middle node (given its pointer from the map) AND O(1) insertion at the head. A singly linked list makes the removal O(n) because you'd have to scan to find the node before it.

Used in: Linux page cache, Redis (\maxmemory-policy allkeys-lru\), browser cache, Memcached.

Intermediate: Segment Tree — Range Queries + Point Updates

`js class SegTree { constructor(arr) { this.n = arr.length; this.tree = new Array(4 * this.n).fill(0); this.build(arr, 1, 0, this.n - 1); } build(arr, node, l, r) { if (l === r) { this.tree[node] = arr[l]; return; } const m = (l + r) >> 1; this.build(arr, 2*node, l, m); this.build(arr, 2*node + 1, m + 1, r); this.tree[node] = this.tree[2*node] + this.tree[2*node + 1]; } query(ql, qr, node = 1, l = 0, r = this.n - 1) { if (qr < l || r < ql) return 0; if (ql <= l && r <= qr) return this.tree[node]; const m = (l + r) >> 1; return this.query(ql, qr, 2*node, l, m) + this.query(ql, qr, 2*node + 1, m + 1, r); } update(i, val, node = 1, l = 0, r = this.n - 1) { if (l === r) { this.tree[node] = val; return; } const m = (l + r) >> 1; if (i <= m) this.update(i, val, 2*node, l, m); else this.update(i, val, 2*node + 1, m + 1, r); this.tree[node] = this.tree[2*node] + this.tree[2*node + 1]; } } `

OperationComplexity
BuildO(n)
Point updateO(log n)
Range queryO(log n)
Range update + lazyO(log n)

Replace \+\ with \min\, \max\, \gcd\, XOR, etc. — anything associative works.

Advanced: Lazy Propagation — Range Updates in O(log n)

Naive range update touches every leaf — O(n). Lazy propagation stores pending operations on internal nodes and pushes them down only when needed.

` update(l, r, val): if completely inside: apply to node, mark lazy, return. push(node) // propagate any pending updates to children recurse on children combine(node) = combine(left, right) `

Now range updates are O(log n) too. This is what makes segment trees competitive-programming gold.

Advanced: Fenwick Tree (BIT) — When Segment Tree Is Overkill

Same prefix-sum + point-update problem, much simpler code, smaller constant factor.

`js class BIT { constructor(n) { this.n = n; this.t = new Array(n + 1).fill(0); } update(i, delta) { // i is 1-indexed for (; i <= this.n; i += i & -i) this.t[i] += delta; } prefix(i) { let s = 0; for (; i > 0; i -= i & -i) s += this.t[i]; return s; } range(l, r) { return this.prefix(r) - this.prefix(l - 1); } } `

The magic is \i & -i\ — extracts the lowest set bit, telling you the parent (going up) or ancestor (going down).

FeatureFenwick (BIT)Segment Tree
Code length~10 lines~40 lines
Spacen + 14n
Range updatetricky (needs two BITs)natural with lazy
Operationsinvertible only (sum, XOR)any associative (min, max, gcd)
Constant factorsmallerlarger

Rule of thumb: prefix sums + point updates → BIT. Range min/max or range updates → segment tree.

Advanced: Bloom Filter — Probabilistic Membership

` init: bit array of size m, k hash functions. insert(x): set bits h_1(x), …, h_k(x) to 1. contains(x): if any of h_1(x)..h_k(x) is 0 → DEFINITELY not in. if all are 1 → PROBABLY in (false-positive possible). `

False-positive rate ≈ \(1 − e^(−kn/m))^k\. Optimal k for given m, n: \k ≈ (m/n) ln 2\. With ~10 bits per element you get ~1% false positives.

Real uses:

  • Chrome / Firefox — checking URL against the malicious-URL set before doing the real (slow) lookup.
  • Cassandra / HBase / LevelDB — quickly test "could this row be in this SSTable?" before reading from disk.
  • Bitcoin SPV nodes — privacy-preserving balance queries.
  • CDNs — "do we have this asset cached?"
Cannot delete from a standard Bloom filter (clearing bits could affect other elements). Use Counting Bloom Filter (counters instead of bits) when deletion matters.

Advanced: Skip List — Probabilistic Ordered Set

Multiple linked lists at different levels of granularity. Each element is randomly promoted to higher levels with p = 1/2.

` Level 3: ──→ A ───────────→ E ───→ end Level 2: ──→ A → B ───────→ E ───→ end Level 1: ──→ A → B → C → D → E ───→ end Level 0: ──→ A → B → C → D → E → F (all elements) `

Search starts at the top level and drops down — average O(log n) per operation, worst case O(n) (rare). Simpler than balanced BSTs and famously easy to make lock-free for concurrent access. Used in Redis sorted sets (ZADD/ZRANGE) and LevelDB / RocksDB memtables.

Advanced: AVL vs Red-Black — How They Differ

PropertyAVLRed-Black
Balancestrict (height differ ≤ 1)relaxed (no path > 2× any other)
Lookupslightly fasterslightly slower
Insert/deletemore rotationsfewer rotations
Used inDBs, file systemsC++ \std::map\, Java \TreeMap\, Linux scheduler (CFS uses RB tree)

Rule of thumb: read-heavy → AVL. Write-heavy → Red-Black. Both are O(log n) for everything.

Advanced: Rope — A Tree of String Fragments

Standard strings are O(n) to insert/delete in the middle. A rope stores text as a balanced binary tree of small fragments. Each internal node caches the total length of its left subtree.

OperationStringRope
concatO(n)O(log n)
insert at iO(n)O(log n)
delete rangeO(n)O(log n)
index accessO(1)O(log n)

Used by: VS Code's text buffer (a piece-tree variant), Emacs gap buffer evolved variants, IntelliJ IDEA, collaborative editors (Etherpad, Google Docs CRDTs build on similar ideas).

Advanced: Comparison + When-to-Use Cheatsheet

StructureOperationTimeReach for it when…
Union-Findunion/findO(α(n))dynamic connectivity, MST, "are these in the same group?"
Segment treerange query + updateO(log n)range min/max/sum WITH updates
Fenwick (BIT)prefix sum + point updateO(log n)simpler range sums, no range updates
LRU cacheget/putO(1)bounded cache with recency eviction
Bloom filtercontainsO(k)huge set + false-positives ok
Skip listsearch/insert/deleteO(log n) avgsorted set with simple, concurrent code
AVL / Red-Blacksearch/insert/deleteO(log n)sorted map (use language standard library)
Ropeconcat/insertO(log n)giant editable text

Advanced: Interview Frequency

StructureFrequencyTypical question
LRU Cachevery highLeetCode 146 — implement
Union-Findvery highNumber of Islands II, Accounts Merge, Redundant Connection
Segment treemediumRange Sum Query - Mutable, Count of Smaller Numbers After Self
Fenwick treemediumCount of Smaller Numbers After Self (alternative)
Bloom filtersystem design"Design URL shortener / spam filter"
Skip listsystem design / rare coding"Design Redis sorted set"
Roperare"Design a text editor"
AVL / RBconceptualusually use language built-in

Practice Path

1. Implement Union-Find with path compression + union by rank. Use it to count islands after a stream of "add land at (r,c)" operations (Number of Islands II). 2. Implement LRU Cache with O(1) get/put using a hash map + doubly linked list. Verify the eviction order on a sequence of operations. 3. Implement a segment tree for range sum with build, point-update, and range-query. Compare correctness against a naive prefix-sum baseline. 4. Implement a small Bloom filter with 3 hash functions on a 1024-bit array. Insert 100 strings, check membership for 200 (100 inserted + 100 random), and measure the false-positive rate.