Notifications

No notifications

/Phase 4

CTEs & Recursive Queries

A Common Table Expression (WITH …) names a temporary result set you can reference later in the query β€” like a one-shot view. Add the RECURSIVE keyword and the same construct can walk trees, generate sequences, and traverse graphs.

On this page

Detailed Theory

The basic CTE

WITH active_users AS (
  SELECT id, email FROM users WHERE is_active
)
SELECT count(*) FROM active_users;

Read it as: "define active_users for the duration of this query, then use it like a table." Replaces deeply nested subqueries with a top-down narrative.

Multiple CTEs β€” chained reasoning

WITH
  recent_orders AS (
    SELECT * FROM orders WHERE created_at >= CURRENT_DATE - INTERVAL '30 days'
  ),
  per_user AS (
    SELECT user_id, COUNT(*) AS orders_30d, SUM(amount) AS spend_30d
    FROM   recent_orders
    GROUP  BY user_id
  )
SELECT u.name, p.orders_30d, p.spend_30d
FROM   per_user p
JOIN   users    u ON u.id = p.user_id
ORDER  BY p.spend_30d DESC
LIMIT  20;

Each CTE can reference earlier CTEs. The whole thing reads like a paragraph: define A, define B from A, use B at the end.

CTE vs subquery β€” when to pick which

CTEInline subquery
Reusable in the same query?βœ… multiple times❌ would need to repeat
Reads top-down?βœ…βŒ inside-out
Self-reference (recursion)?βœ…βŒ
Optimizer can inline?usually yes (PG 12+, others)always

Rule of thumb: reach for a CTE when the subquery is reused, recursive, or makes the query easier to read. For a single one-line subquery, leave it inline.

Writable CTEs (Postgres / SQL Server)

CTEs aren't just for SELECT β€” Postgres lets you put DML in them and chain results:

WITH archived AS (
  DELETE FROM sessions
  WHERE  expires_at < now()
  RETURNING id, user_id
)
INSERT INTO sessions_audit (id, user_id, archived_at)
SELECT id, user_id, now() FROM archived;

One statement, one transaction, no temp table. Killer feature for ETL.

Recursive CTEs

The shape is always:

WITH RECURSIVE name (cols...) AS (
  -- 1. anchor: the seed rows
  SELECT ...

UNION ALL -- (or UNION)

-- 2. recursive step: references "name" itself SELECT ... FROM name JOIN ... WHERE ... -- termination predicate! ) SELECT * FROM name;

The engine repeatedly executes the recursive step, feeding the previous iteration's output back in, until it produces zero new rows. Always include a termination condition.

Example 1 β€” generate a sequence

WITH RECURSIVE nums AS (
  SELECT 1 AS n
  UNION ALL
  SELECT n + 1 FROM nums WHERE n < 10
)
SELECT * FROM nums;
-- β†’ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10

(In Postgres you'd more idiomatically use generate_series(1, 10), but the recursive form works in every engine that supports CTEs.)

Example 2 β€” walk an org chart (tree)

Schema: employees(id, name, manager_id).

WITH RECURSIVE org AS (
  -- anchor: the CEO (no manager)
  SELECT id, name, manager_id, 1 AS level, name::TEXT AS path
  FROM   employees
  WHERE  manager_id IS NULL

UNION ALL

-- step: each employee whose manager is already in 'org' SELECT e.id, e.name, e.manager_id, o.level + 1, o.path

' > '
e.name FROM employees e JOIN org o ON o.id = e.manager_id ) SELECT level, path FROM org ORDER BY path;

You get every employee plus their depth and reporting chain. Same shape works for category trees, comment threads, file systems, …

Example 3 β€” find all ancestors of a row

WITH RECURSIVE ancestors AS (
  SELECT * FROM categories WHERE id = 42
  UNION ALL
  SELECT c.*
  FROM   categories c
  JOIN   ancestors a ON c.id = a.parent_id
)
SELECT * FROM ancestors;

Walks upward by following parent_id until the root.

Example 4 β€” graph traversal with cycle protection

WITH RECURSIVE reachable AS (
  SELECT src, dst, ARRAY[src] AS visited
  FROM   edges
  WHERE  src = 1

UNION ALL

SELECT r.src, e.dst, r.visited || e.dst FROM edges e JOIN reachable r ON r.dst = e.src WHERE e.dst <> ALL(r.visited) -- skip already-visited nodes ) SELECT DISTINCT dst FROM reachable;

Without the visited array, a cycle would loop forever. Engines also offer a built-in CYCLE clause:

WITH RECURSIVE r AS (...)
CYCLE id SET is_cycle USING path     -- Postgres 14+, SQL Server, Oracle

UNION vs UNION ALL inside recursion

  • UNION ALL β€” fast, the standard choice; you handle dedup yourself if needed.
  • UNION β€” deduplicates each step, useful when you don't want repeated rows.
For graphs with cycles, prefer UNION ALL plus an explicit visited-list β€” UNION alone won't stop infinite loops because new visited arrays differ each iteration.

CTE pitfalls

1. Forgotten termination in a recursive CTE β†’ infinite loop. Most engines have a safety limit (max_recursion) but not all. 2. Old PG (≀ 11) didn't inline CTEs β€” they were optimization barriers. From PG 12 the planner inlines CTEs unless you write WITH … AS MATERIALIZED. 3. Reusing a CTE many times in a non-inlining engine recomputes it each time β€” materialize manually with AS MATERIALIZED or a temp table. 4. No indexes on a CTE result β€” it's a virtual table; if you join it many times to a big table, performance can suffer.

Quick recipe table

You want…Pattern
Name a subquery for claritynon-recursive CTE
Chain ETL steps in one statement (Postgres)writable CTE with RETURNING
Walk a tree downwardrecursive CTE, anchor = roots
Walk a tree upwardrecursive CTE, anchor = leaf, follow parent_id
Generate numbers / datesrecursive CTE or generate_series
Graph traversalrecursive CTE + visited array / CYCLE clause