Notifications

No notifications

Advanced SQL — JOINs, Subqueries, CTEs & Window Functions

Once you master basic queries, advanced SQL unlocks complex analytics: combining tables, writing reusable query blocks, and performing calculations across rows without collapsing groups.

JOIN Types — Combining Tables

JOIN TypeBehavior
INNER JOINOnly matching rows from both tables
LEFT JOINAll rows from left table + matches from right (NULL if no match)
RIGHT JOINAll rows from right table + matches from left
FULL OUTER JOINAll rows from both tables (NULLs where no match)
SELF JOINJoin a table to itself (e.g., employee → manager)

-- Find each employee's manager name (self-join)
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;

Subqueries — Queries Inside Queries

TypeUsageExample
ScalarReturns one valueWHERE salary > (SELECT AVG(salary) FROM employees)
RowReturns one rowWHERE (dept, salary) = (SELECT dept, MAX(salary) ...)
TableReturns a result setFROM (SELECT ... GROUP BY ...) AS sub
CorrelatedReferences outer queryWHERE salary > (SELECT AVG(salary) ... WHERE dept = e.dept)

CTEs — Common Table Expressions

CTEs (the WITH clause) create named temporary result sets for readability and reusability:

WITH dept_stats AS (
  SELECT department, AVG(salary) AS avg_sal
  FROM employees GROUP BY department
)
SELECT e.name, e.salary, d.avg_sal
FROM employees e
JOIN dept_stats d ON e.department = d.department
WHERE e.salary > d.avg_sal;

Window Functions — Analytics Without GROUP BY

Window functions compute values across a set of rows related to the current row without collapsing them:

SELECT name, department, salary,
  ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rank,
  SUM(salary) OVER (PARTITION BY department) AS dept_total,
  LAG(salary) OVER (ORDER BY salary) AS prev_salary
FROM employees;

FunctionPurpose
ROW_NUMBER()Unique sequential rank
RANK()Rank with gaps on ties
DENSE_RANK()Rank without gaps on ties
LAG(col, n)Value from n rows before
LEAD(col, n)Value from n rows after
SUM() OVER()Running or partitioned total

On this page

Detailed Theory

Advanced SQL is where analysts stop fetching rows and start *computing* with them: combine tables with JOINs, structure queries with CTEs, do per-row calculations across groups with window functions, and make slow queries fast with indexes and EXPLAIN.

What "Advanced" Means in Practice

Four skills separate juniors from seniors:

1. Comfort joining 3+ tables without confusion. 2. Reaching for window functions instead of self-joins for per-group calculations. 3. Reading EXPLAIN plans and adding the right index. 4. Writing layered queries with CTEs that read top-to-bottom.

JOINs in One Table

A JOIN combines rows from two tables that share a key.

JOIN typeReturns
INNERonly matching rows in both tables
LEFTall rows from the left, NULLs where no match
RIGHTmirror of LEFT (rarely used)
FULL OUTEReverything from both, NULLs where no match
CROSSevery left row × every right row (Cartesian)

SELECT u.name, COUNT(o.id) AS orders
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.name;

The most common mistake: using INNER JOIN and silently losing users with zero orders. Use LEFT JOIN when one side might be missing.

Beginner Mistakes to Skip

1. COUNT(*) after a LEFT JOIN. It counts NULL rows too — use COUNT(o.id) to count only matched rows. 2. Joining on a non-indexed column. A 1M×1M join without indexes can run for minutes. 3. Putting the filter in WHERE after a LEFT JOIN. WHERE o.status = 'paid' removes the NULL rows the JOIN preserved — effectively turns it into INNER. Move the condition into the ON clause. 4. Repeating expensive subqueries. Promote them to a CTE so they run once and read clearly. 5. SELECT DISTINCT to fix duplicate rows. It's almost always covering a wrong JOIN. 6. Functions on indexed columns in WHERE. WHERE LOWER(email) = ... skips the index. Store data lower-cased, or add a functional index.

Intermediate: Subqueries vs CTEs

A subquery is a query inside another query. A CTE (Common Table Expression) is a named subquery that reads top-to-bottom.

-- Inline subquery
SELECT * FROM posts
WHERE views > (SELECT AVG(views) FROM posts);

-- CTE — same logic, easier to extend WITH avg_views AS ( SELECT AVG(views) AS v FROM posts ) SELECT p.* FROM posts p, avg_views WHERE p.views > avg_views.v;

Reach for CTEs once you have more than one subquery. They name the intermediate steps so the query reads like a paragraph.

Intermediate: Correlated Subqueries

A correlated subquery references the outer row — it runs *once per row*.

SELECT name, salary
FROM employees e
WHERE salary > (
  SELECT AVG(salary) FROM employees WHERE department = e.department
);

Readable, but slow on big tables. The same answer with a window function is usually 10× faster (next section).

Intermediate: EXISTS vs IN

-- IN  — builds the full list, then matches
SELECT * FROM users WHERE id IN (SELECT user_id FROM orders);

-- EXISTS — stops as soon as it finds a match per row SELECT * FROM users u WHERE EXISTS ( SELECT 1 FROM orders o WHERE o.user_id = u.id );

For large subqueries EXISTS is usually faster, and unlike IN it handles NULLs correctly.

Intermediate: UNION / UNION ALL

SELECT id, 'order' AS source FROM orders
UNION ALL
SELECT id, 'refund' FROM refunds;

UNION removes duplicates (extra sort cost). UNION ALL keeps duplicates and is faster — use it whenever you know there are no overlaps.

Advanced: Window Functions (the Game Changer)

Window functions compute across a set of rows without collapsing them. You keep every row and add a calculated column.

SELECT
  name,
  department,
  salary,
  RANK()           OVER (PARTITION BY department ORDER BY salary DESC) AS rank_in_dept,
  AVG(salary)      OVER (PARTITION BY department)                       AS dept_avg,
  salary - AVG(salary) OVER (PARTITION BY department)                   AS diff_from_avg,
  LAG(salary)      OVER (PARTITION BY department ORDER BY hired_at)     AS prev_hire_salary,
  SUM(salary)      OVER (ORDER BY hired_at)                             AS running_total
FROM employees;

Key functions: ROW_NUMBER, RANK, DENSE_RANK, NTILE(4), LAG, LEAD, FIRST_VALUE, LAST_VALUE, SUM/AVG/COUNT OVER (...). Use cases: top-N per group, running totals, moving averages, deduplication, period-over-period.

Advanced: CTEs, Recursive CTEs & Pipelines

Chain CTEs to build a query in stages:

WITH paid AS (
  SELECT * FROM orders WHERE status = 'paid'
), monthly AS (
  SELECT DATE_TRUNC('month', created_at) AS month, SUM(amount) AS revenue
  FROM paid
  GROUP BY 1
)
SELECT month, revenue,
       LAG(revenue) OVER (ORDER BY month) AS prev_month,
       revenue - LAG(revenue) OVER (ORDER BY month) AS delta
FROM monthly
ORDER BY month;

Recursive CTEs walk hierarchies (org charts, comment threads, category trees). Powerful but rare — worth knowing exists.

Advanced: Indexes & Reading EXPLAIN

The single most valuable tuning skill.

CREATE INDEX idx_orders_user_created ON orders(user_id, created_at DESC);

EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 42 ORDER BY created_at DESC LIMIT 10;

What to look for in the plan:

  • Seq Scan on a big table → missing index.
  • Index Scan / Index Only Scan → good.
  • Rows Removed by Filter: 1,000,000 → wrong index, full scan happening anyway.
  • Sort with high actual time → add an index that already gives the order.
Compound index column order matters — the leftmost prefix is what gets used ((user_id, created_at) helps queries on user_id alone, but not created_at alone).

Advanced: Query Optimisation Cheat Sheet

1. Filter early — push WHERE before JOIN when possible. 2. Select only the columns you need — avoid SELECT * in production. 3. Index columns used in WHERE, JOIN, ORDER BY. 4. Avoid functions on indexed columns (LOWER(email), DATE(col)) — they break index use. 5. Prefer EXISTS over IN for big subqueries; UNION ALL over UNION. 6. For dashboards, materialise heavy aggregates into a materialised view that refreshes nightly.

Advanced: Materialised Views & Incremental Refresh

A materialised view stores the *result* of a query, not just the definition:

CREATE MATERIALIZED VIEW monthly_revenue AS
SELECT DATE_TRUNC('month', created_at) AS month, SUM(amount) AS revenue
FROM orders WHERE status = 'paid'
GROUP BY 1;

REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_revenue;

Great for dashboards that hit the same heavy query thousands of times.

Practice Path

1. Take users + orders + products and write a 3-table JOIN that returns each user's most-bought product. 2. Replace a correlated subquery with a window function and benchmark both with EXPLAIN ANALYZE. 3. Build a CTE chain that produces monthly revenue + month-over-month % change using LAG. 4. Add the right index to make a slow dashboard query flip from Seq Scan to Index Scan.