Notifications

No notifications

/Phase 3

Indexes & EXPLAIN

An index is a side data-structure that lets the engine find rows without scanning the whole table. Get this right and a query goes from 30 s to 3 ms; get it wrong and writes crawl. EXPLAIN is how you read the engine's mind to know which is happening.

On this page

Detailed Theory

Why indexes?

Without an index, SELECT * FROM users WHERE email = 'a@x.com' reads every row (a sequential scan). With an index on email, the engine walks a tree in O(log n) and jumps straight to the row(s).

The trade-off: every index costs disk space and slows down writes (each INSERT/UPDATE/DELETE must update every relevant index). So index *what you query*, not everything.

B-tree — the default

CREATE INDEX idx_users_email ON users (email);

A B-tree is great for:

  • equality (=)
  • range (<, <=, >, >=, BETWEEN)
  • sorted output (ORDER BY indexed_col)
  • prefix LIKE 'abc%'
PRIMARY KEY and UNIQUE constraints automatically create B-tree indexes. You don't need to add them by hand.

Composite (multi-column) indexes

CREATE INDEX idx_users_country_age ON users (country, age);

Visualise a phone book: sorted first by country, then within each country by age. That index helps:

  • WHERE country = 'IN'
  • WHERE country = 'IN' AND age > 25
  • WHERE country = 'IN' ORDER BY age
  • WHERE age > 25 ❌ (skips leading column → scan)
Leftmost-prefix rule: an index on (a, b, c) helps queries that filter on a, or a+b, or a+b+c — never on b alone.

Column order matters: put the column with the most equality filters first.

Other index types

TypeUse caseEngines
Hashexact equality onlyPG (limited), MySQL Memory tables
GINfull-text, JSONB, arraysPostgres
GiSTgeospatial, range typesPostgres
BRINvery large append-only tables (logs)Postgres
Spatial / R-treegeometryMySQL, SQL Server
Full-texttext searchMySQL FULLTEXT, Postgres tsvector + GIN

For a Postgres JSONB column:

CREATE INDEX idx_users_prefs ON users USING GIN (preferences);
SELECT * FROM users WHERE preferences @> '{"theme":"dark"}';

Partial indexes — index only what matters

CREATE INDEX idx_orders_open
  ON orders (created_at)
  WHERE status = 'open';

If 95% of your queries are about open orders and 90% of rows are closed, a partial index is smaller, faster to update, and faster to scan.

Covering indexes — INCLUDE

Postgres / SQL Server:

CREATE INDEX idx_users_email_covering
  ON users (email)
  INCLUDE (name, country);

Now SELECT name, country FROM users WHERE email = 'a@x.com' is answered entirely from the index — no trip to the table. Called an "index-only scan".

Functional / expression indexes

Function-on-column normally kills index use:

WHERE LOWER(email) = 'ada@x.com'   -- doesn't use idx on email

Solution — index the expression itself:

CREATE INDEX idx_users_email_lower ON users (LOWER(email));

Now the same query uses the new index. Same trick for (extract(year FROM created_at)), etc.

When indexes hurt

  • Tiny tables (< a few thousand rows): full scan is faster than index lookup.
  • Low-cardinality columns where the value matches >10–20% of rows (e.g. boolean, gender) — full scan often wins; consider a partial index.
  • Heavy-write tables: every extra index slows every write.
  • Bloated indexes (Postgres) over time — periodic REINDEX or pg_repack may be needed.

EXPLAIN — read the plan

EXPLAIN SELECT * FROM users WHERE email = 'a@x.com';

Returns the plan: which scan type, in what order joins happen, expected row counts.

EXPLAIN ANALYZE (Postgres) actually runs the query and reports real timings:

Index Scan using idx_users_email on users
  (cost=0.29..8.31 rows=1 width=64)
  (actual time=0.040..0.041 rows=1 loops=1)
  Index Cond: (email = 'a@x.com'::text)
Planning Time: 0.123 ms
Execution Time: 0.080 ms

Look for:

  • Seq Scan on a big table → likely missing an index.
  • Bitmap / Index Scan → an index is being used.
  • Big gap between estimated and actual rows → stats are stale, run ANALYZE.
  • Sort + Limit at the top with huge rows scanned → consider an index that already provides the sort order.
MySQL equivalent: EXPLAIN and EXPLAIN ANALYZE (8.0+); SQL Server: SET SHOWPLAN_TEXT ON or "Display Estimated Plan".

Index hygiene checklist

  • ✅ Indexes for every FK column (user_id, order_id, …) — they're the join legs.
  • ✅ Composite index over multi-column WHERE/ORDER BY hot paths.
  • ✅ Partial / functional indexes for filtered or transformed predicates.
  • ✅ Drop unused indexes (Postgres: pg_stat_user_indexes).
  • ❌ Don't add an index per column "just in case".
  • ❌ Don't index tiny tables.
  • ❌ Don't put the function/cast on the column side of the comparison — rewrite or build an expression index.

Quick wins to remember

1. Add an index on every FK column. 2. EXPLAIN before adding an index, and again after. 3. Composite index leftmost-prefix rule decides column order. 4. Partial / covering / functional indexes solve most "almost works" cases. 5. Each index slows writes — measure both sides.