Notifications

No notifications

/Phase 4

Binary Search Trees

Binary Search Trees (BST)

A BST is a binary tree where for every node: left < node < right.

BST Property

  • All nodes in the left subtree have values less than the node
  • All nodes in the right subtree have values greater than the node
  • This property holds for every node in the tree

Operations

OperationAverageWorst (Skewed)
SearchO(log n)O(n)
InsertO(log n)O(n)
DeleteO(log n)O(n)
Min/MaxO(log n)O(n)

Deletion Cases

1. Leaf node: Simply remove 2. One child: Replace with child 3. Two children: Replace with inorder successor (smallest in right subtree) or inorder predecessor

Balanced BSTs

  • AVL Tree: Heights differ by at most 1, uses rotations
  • Red-Black Tree: Used in most standard libraries
  • Guaranteed O(log n) operations

On this page

Detailed Theory

A Binary Search Tree is what you get when you combine "binary tree" with "binary search" — a data structure that stays *searchable in O(log n)* but, unlike a sorted array, also lets you insert and delete in O(log n). Every dynamic ordered collection in your standard library — Java's \TreeMap\, C++'s \std::map\, Python's \SortedDict\, the Linux kernel scheduler — is a BST variant under the hood.

The BST Property — Precisely Defined

For every node \n\:

  • Every value in the left subtree is strictly less than \n.value\.
  • Every value in the right subtree is strictly greater than \n.value\.
It's not "left child < node < right child" — that's the most common beginner trap. The rule must hold for the entire subtree, recursively.

` 8 ✓ Valid: every {1,3,4,6,7} < 8 and every {10,14} > 8 / \ 3 10 / \ \ 1 6 14 / \ 4 7 `

Daily Mental Model: "Sorted Array You Can Mutate"

A sorted array gives you O(log n) lookup but O(n) insert/delete. A linked list gives you O(1) insert at known position but O(n) lookup. A BST gives you O(log n) for all three — the best of both — by spending O(log n) walks down the tree to maintain order.

Inorder traversal of any BST always gives sorted output. That's the single most useful invariant: when you're stuck, ask "what does inorder give me here?"

Beginner Mistakes to Skip

1. Validating with parent comparisons only. \node.left.val < node.val && node.right.val > node.val\ is wrong — the property must hold *for the entire subtree*. Use min/max bounds. 2. Forgetting balancing exists. Insert sorted data into a plain BST → degenerate skewed "linked list" → all operations become O(n). A 100k-node skewed tree blows the recursion stack and runs in seconds instead of microseconds. 3. Botching delete. Three cases — leaf, one child, two children — and the two-child case needs the inorder successor. People mix up "swap value, delete successor" with "rebuild whole subtree". 4. Using equality with a tolerance. BSTs assume keys have a strict comparator. Float keys with NaN or epsilon comparisons silently break the property. 5. Confusing "balanced" with "perfect". Balanced means heights differ by ≤ 1 at every node. Perfect means every level full. AVL keeps balanced, not perfect.

Intermediate: The Three Operations Step by Step

Search — O(h)

`js function search(node, key) { while (node) { if (key === node.val) return node; node = key < node.val ? node.left : node.right; } return null; } `

Each comparison eliminates ~half the remaining tree (in a balanced BST).

Insert — O(h): search until you hit a \null\ slot, hang the new node there. The structure is preserved automatically.

Delete — O(h), the case that interviewers love:

CaseConditionAction
1LeafJust unlink it
2One childReplace the node with that child
3Two childrenFind inorder successor (smallest in right subtree), copy its value into this node, then delete the successor (which has at most one child — case 1 or 2)

Why the inorder successor? It's the smallest value strictly larger than the deleted node — replacing with it preserves "left < node < right" by definition.

Intermediate: Inorder Gives You Sorted Order

This is the single most useful BST identity:

` inorder(8) → 1, 3, 4, 6, 7, 8, 10, 14 `

Consequences:

  • "K-th smallest" = inorder traversal stopped at the k-th visit. O(h + k).
  • "Validate BST" = inorder traversal, check each value > the previous one.
  • "BST iterator with O(1) amortised \next\" = explicit stack simulating inorder.
  • "Recover BST with two swapped nodes" = inorder, find the two out-of-order pairs.

Intermediate: Floor, Ceiling, Successor, Predecessor

These show up *constantly* in real systems (e.g. "what's the next available time slot?").

Floor(x) — largest key ≤ x:

`js function floor(node, x, best = null) { while (node) { if (node.val === x) return node.val; if (x < node.val) node = node.left; else { best = node.val; node = node.right; } } return best; } `

Ceiling(x) — symmetric.

Inorder successor of a given node:

  • If it has a right subtree → leftmost node of right subtree.
  • Otherwise → walk up to the first ancestor where the node lies in its left subtree.
Both are O(h).

Intermediate: Range Queries — What Hash Tables Can't Do

"Give me all keys in [lo, hi]":

`js function rangeQuery(node, lo, hi, out = []) { if (!node) return out; if (lo < node.val) rangeQuery(node.left, lo, hi, out); if (lo <= node.val && node.val <= hi) out.push(node.val); if (node.val < hi) rangeQuery(node.right, lo, hi, out); return out; } `

O(h + k) where k is the number of results. A hash table can't do this efficiently — that's the single biggest reason to pick a BST over a hash map.

Advanced: Why Balancing Matters (and AVL Trees)

Insert 1,2,3,4,5 into a plain BST → straight line. Height = n. All operations O(n). Catastrophic.

AVL trees (Adelson-Velsky and Landis, 1962) — the first self-balancing BST.

  • Balance factor of a node = \height(left) − height(right)\.
  • Invariant: \
    BF
    ≤ 1\
    at every node.
  • After insert/delete, walk back up; whenever
    BF
    = 2, rotate.
The four cases — and the rotation that fixes each:

PatternImbalanceRotation
LLinserted into left-of-leftsingle right rotation
RRinserted into right-of-rightsingle left rotation
LRinserted into right-of-leftleft-then-right (double)
RLinserted into left-of-rightright-then-left (double)

Single right rotation (the LL fix):

` z y / \ / \ y T4 → x z / \ / \ / \ x T3 T1 T2 T3 T4 / \ T1 T2 `

AVL gives strict O(log n) and the height is at most ~1.44·log₂ n.

Advanced: Red-Black Trees (the Library Default)

Used by Java \TreeMap\, C++ \std::map\ / \std::set\, the Linux kernel's CFS scheduler.

The five rules (must all hold):

1. Every node is red or black. 2. The root is black. 3. Every NIL leaf is black. 4. A red node's children are both black (no two consecutive reds). 5. Every root-to-NIL path has the same number of black nodes (equal black-height).

Together they cap the height at \2·log₂(n+1)\. Slightly looser balance than AVL means fewer rotations on insert/delete — that's why standard libraries pick RB over AVL.

AVLRed-Black
Height bound1.44 log₂ n2 log₂(n+1)
Rotations on insertup to 2up to 2
Rotations on deleteup to log nup to 3
Searchslightly fasterslightly slower
Best forread-heavywrite-heavy

Advanced: B-Trees and B+ Trees — How Databases Actually Do It

In-memory pointer chasing is cheap. Disk seeks are not. A binary tree of 1B keys has height ~30 → 30 disk seeks → milliseconds *each*. Database indexes can't afford that.

B-tree of order m: every internal node has up to m children and m−1 keys. Height becomes \log_m n\ instead of \log₂ n\ — a B-tree of order 100 over 1B keys has height ~5.

B+ tree (the variant databases actually use):

  • All data lives in leaf nodes.
  • Internal nodes contain only routing keys.
  • Leaves are linked in a doubly linked list → range scans become a leaf-traversal, not a tree-traversal.
Used in MySQL InnoDB, PostgreSQL, MongoDB WiredTiger, NTFS, ext4, HFS+. If you've ever run \SELECT … WHERE id BETWEEN 100 AND 200\, you've walked a B+ tree.

Advanced: Augmented BSTs

You can attach extra metadata at each node, recomputed on insert/delete, to unlock new queries:

  • Order-statistic tree: each node stores \size = 1 + size(left) + size(right)\. Now \select(k)\ (k-th smallest) and \rank(x)\ are both O(log n) — instead of O(k) for inorder.
  • Interval tree: each node stores an interval and the max endpoint in its subtree. Used to answer "find all intervals overlapping with [a, b]" in O(log n + k).
  • Segment tree (cousin, not strictly a BST): supports range-sum / range-min queries with point updates in O(log n).

Advanced: The Interview Pattern Cheat Sheet

  • Validate BST → recursion with (min, max) bounds, *not* parent comparison.
  • K-th smallest → inorder, stop at k. Or augmented BST in O(log n).
  • LCA in BST → if both keys < node go left; both > go right; else this node is the LCA. O(h), no full DFS needed.
  • Sorted array → balanced BST → middle element as root, recurse on halves. O(n).
  • BST iterator → explicit stack of "current and all left ancestors", O(1) amortised \next\, O(h) memory.
  • Recover BST → two swapped nodes break inorder monotonicity exactly twice; record the first and second offenders, swap their values.

Practice Path

1. Implement \insert\ and \search\ recursively. Then add \delete\ and handle all three cases — verify on a tree with the two-child case at the root. 2. Write \isValidBST\ using min/max bounds. Stress-test it on a tree where every immediate parent-child pair is fine but a deeper descendant violates the property. 3. Implement \kthSmallest\ two ways: full inorder vs early-stop iterative inorder. Confirm the early-stop is O(h + k). 4. Implement \rangeQuery(lo, hi)\ with the prune-the-recursion trick from the intermediate section. Compare it to scanning a flat array.