Last 30 Days
No notifications
Once you master basic queries, advanced SQL unlocks complex analytics: combining tables, writing reusable query blocks, and performing calculations across rows without collapsing groups.
| JOIN Type | Behavior |
INNER JOIN | Only matching rows from both tables |
LEFT JOIN | All rows from left table + matches from right (NULL if no match) |
RIGHT JOIN | All rows from right table + matches from left |
FULL OUTER JOIN | All rows from both tables (NULLs where no match) |
SELF JOIN | Join a table to itself (e.g., employee → manager) |
-- Find each employee's manager name (self-join)
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;| Type | Usage | Example |
| Scalar | Returns one value | WHERE salary > (SELECT AVG(salary) FROM employees) |
| Row | Returns one row | WHERE (dept, salary) = (SELECT dept, MAX(salary) ...) |
| Table | Returns a result set | FROM (SELECT ... GROUP BY ...) AS sub |
| Correlated | References outer query | WHERE salary > (SELECT AVG(salary) ... WHERE dept = e.dept) |
CTEs (the WITH clause) create named temporary result sets for readability and reusability:
WITH dept_stats AS (
SELECT department, AVG(salary) AS avg_sal
FROM employees GROUP BY department
)
SELECT e.name, e.salary, d.avg_sal
FROM employees e
JOIN dept_stats d ON e.department = d.department
WHERE e.salary > d.avg_sal;Window functions compute values across a set of rows related to the current row without collapsing them:
SELECT name, department, salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rank,
SUM(salary) OVER (PARTITION BY department) AS dept_total,
LAG(salary) OVER (ORDER BY salary) AS prev_salary
FROM employees;| Function | Purpose |
ROW_NUMBER() | Unique sequential rank |
RANK() | Rank with gaps on ties |
DENSE_RANK() | Rank without gaps on ties |
LAG(col, n) | Value from n rows before |
LEAD(col, n) | Value from n rows after |
SUM() OVER() | Running or partitioned total |
Advanced SQL is where analysts stop fetching rows and start *computing* with them: combine tables with JOINs, structure queries with CTEs, do per-row calculations across groups with window functions, and make slow queries fast with indexes and EXPLAIN.
Four skills separate juniors from seniors:
1. Comfort joining 3+ tables without confusion. 2. Reaching for window functions instead of self-joins for per-group calculations. 3. Reading EXPLAIN plans and adding the right index. 4. Writing layered queries with CTEs that read top-to-bottom.
A JOIN combines rows from two tables that share a key.
| JOIN type | Returns |
| INNER | only matching rows in both tables |
| LEFT | all rows from the left, NULLs where no match |
| RIGHT | mirror of LEFT (rarely used) |
| FULL OUTER | everything from both, NULLs where no match |
| CROSS | every left row × every right row (Cartesian) |
SELECT u.name, COUNT(o.id) AS orders
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id, u.name;The most common mistake: using INNER JOIN and silently losing users with zero orders. Use LEFT JOIN when one side might be missing.
1. COUNT(*) after a LEFT JOIN. It counts NULL rows too — use COUNT(o.id) to count only matched rows.
2. Joining on a non-indexed column. A 1M×1M join without indexes can run for minutes.
3. Putting the filter in WHERE after a LEFT JOIN. WHERE o.status = 'paid' removes the NULL rows the JOIN preserved — effectively turns it into INNER. Move the condition into the ON clause.
4. Repeating expensive subqueries. Promote them to a CTE so they run once and read clearly.
5. SELECT DISTINCT to fix duplicate rows. It's almost always covering a wrong JOIN.
6. Functions on indexed columns in WHERE. WHERE LOWER(email) = ... skips the index. Store data lower-cased, or add a functional index.
A subquery is a query inside another query. A CTE (Common Table Expression) is a named subquery that reads top-to-bottom.
-- Inline subquery
SELECT * FROM posts
WHERE views > (SELECT AVG(views) FROM posts);-- CTE — same logic, easier to extend
WITH avg_views AS (
SELECT AVG(views) AS v FROM posts
)
SELECT p.*
FROM posts p, avg_views
WHERE p.views > avg_views.v;
Reach for CTEs once you have more than one subquery. They name the intermediate steps so the query reads like a paragraph.
A correlated subquery references the outer row — it runs *once per row*.
SELECT name, salary
FROM employees e
WHERE salary > (
SELECT AVG(salary) FROM employees WHERE department = e.department
);Readable, but slow on big tables. The same answer with a window function is usually 10× faster (next section).
-- IN — builds the full list, then matches
SELECT * FROM users WHERE id IN (SELECT user_id FROM orders);-- EXISTS — stops as soon as it finds a match per row
SELECT * FROM users u WHERE EXISTS (
SELECT 1 FROM orders o WHERE o.user_id = u.id
);
For large subqueries EXISTS is usually faster, and unlike IN it handles NULLs correctly.
SELECT id, 'order' AS source FROM orders
UNION ALL
SELECT id, 'refund' FROM refunds;UNION removes duplicates (extra sort cost). UNION ALL keeps duplicates and is faster — use it whenever you know there are no overlaps.
Window functions compute across a set of rows without collapsing them. You keep every row and add a calculated column.
SELECT
name,
department,
salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rank_in_dept,
AVG(salary) OVER (PARTITION BY department) AS dept_avg,
salary - AVG(salary) OVER (PARTITION BY department) AS diff_from_avg,
LAG(salary) OVER (PARTITION BY department ORDER BY hired_at) AS prev_hire_salary,
SUM(salary) OVER (ORDER BY hired_at) AS running_total
FROM employees;Key functions: ROW_NUMBER, RANK, DENSE_RANK, NTILE(4), LAG, LEAD, FIRST_VALUE, LAST_VALUE, SUM/AVG/COUNT OVER (...). Use cases: top-N per group, running totals, moving averages, deduplication, period-over-period.
Chain CTEs to build a query in stages:
WITH paid AS (
SELECT * FROM orders WHERE status = 'paid'
), monthly AS (
SELECT DATE_TRUNC('month', created_at) AS month, SUM(amount) AS revenue
FROM paid
GROUP BY 1
)
SELECT month, revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_month,
revenue - LAG(revenue) OVER (ORDER BY month) AS delta
FROM monthly
ORDER BY month;Recursive CTEs walk hierarchies (org charts, comment threads, category trees). Powerful but rare — worth knowing exists.
The single most valuable tuning skill.
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at DESC);EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 42 ORDER BY created_at DESC LIMIT 10;
What to look for in the plan:
(user_id, created_at) helps queries on user_id alone, but not created_at alone).1. Filter early — push WHERE before JOIN when possible.
2. Select only the columns you need — avoid SELECT * in production.
3. Index columns used in WHERE, JOIN, ORDER BY.
4. Avoid functions on indexed columns (LOWER(email), DATE(col)) — they break index use.
5. Prefer EXISTS over IN for big subqueries; UNION ALL over UNION.
6. For dashboards, materialise heavy aggregates into a materialised view that refreshes nightly.
A materialised view stores the *result* of a query, not just the definition:
CREATE MATERIALIZED VIEW monthly_revenue AS
SELECT DATE_TRUNC('month', created_at) AS month, SUM(amount) AS revenue
FROM orders WHERE status = 'paid'
GROUP BY 1;REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_revenue;
Great for dashboards that hit the same heavy query thousands of times.
1. Take users + orders + products and write a 3-table JOIN that returns each user's most-bought product.
2. Replace a correlated subquery with a window function and benchmark both with EXPLAIN ANALYZE.
3. Build a CTE chain that produces monthly revenue + month-over-month % change using LAG.
4. Add the right index to make a slow dashboard query flip from Seq Scan to Index Scan.