Last 30 Days
No notifications
A subquery is a query inside another query. They let you compose questions: 'find users older than the average', 'show posts by users from India', 'list customers who never ordered'. Add set operations (UNION, INTERSECT, EXCEPT) and you can stitch result sets together like sets of rows.
| Position | Looks like | Returns |
In SELECT | scalar subquery | exactly one value |
In FROM | derived table | a virtual table |
In WHERE | predicate subquery | one or many values, used by IN/EXISTS/comparisons |
-- 1. Scalar subquery — returns ONE value
SELECT name, age, (SELECT AVG(age) FROM users) AS company_avg
FROM users;-- 2. Derived table — used like any other table
SELECT t.country, t.users
FROM (SELECT country, COUNT(*) AS users FROM users GROUP BY country) AS t
WHERE t.users >= 10;
-- 3. Predicate subquery — feeds an IN / EXISTS / comparison
SELECT * FROM users
WHERE id IN (SELECT user_id FROM posts WHERE published);
A scalar subquery must return exactly one row and one column, otherwise it errors at runtime. Wrap with LIMIT 1 or aggregates if uncertain:
SELECT name, (SELECT MAX(amount) FROM orders WHERE user_id = u.id) AS biggest_order
FROM users u;Anything you can write as a query, you can wrap in parentheses, alias, and use as a table:
SELECT country, AVG(age_decade) AS avg_decade
FROM (
SELECT country, FLOOR(age/10)*10 AS age_decade FROM users
) AS u
GROUP BY country;The alias is mandatory in most engines. CTEs (next topic) are usually a cleaner alternative when the subquery is reused or deeply nested.
| When to use | Pitfall | |
IN (subquery) | small list / set membership | NULL in list → trouble for NOT IN |
NOT IN (subquery) | row not in set | NOT IN with any NULL returns nothing |
EXISTS (subquery) | "is there at least one match?" | none — preferred for correlated checks |
NOT EXISTS (subquery) | anti-join | none — NULL-safe |
-- All users with at least one post
SELECT * FROM users u
WHERE EXISTS (SELECT 1 FROM posts p WHERE p.user_id = u.id);-- All users with NO posts (NULL-safe!)
SELECT * FROM users u
WHERE NOT EXISTS (SELECT 1 FROM posts p WHERE p.user_id = u.id);
The famous trap:
-- ⚠️ If posts.user_id has any NULL, this returns ZERO rows
SELECT * FROM users
WHERE id NOT IN (SELECT user_id FROM posts);Why? x NOT IN (1, 2, NULL) evaluates to x <> 1 AND x <> 2 AND x <> NULL — and x <> NULL is UNKNOWN, never TRUE. The whole AND chain becomes UNKNOWN, the row is dropped. Use NOT EXISTS for safety.
A correlated subquery references a column from the outer query — it re-runs once per outer row:
-- Users older than the average for their country
SELECT name, age, country
FROM users u
WHERE age > (SELECT AVG(age)
FROM users
WHERE country = u.country);Conceptually expensive (one inner run per outer row), but optimisers often rewrite to a join + aggregate. Still, prefer joins or window functions if the dataset is large.
| Scenario | Better with | ||||
| "Does X exist?" | EXISTS (subquery) | ||||
| "Get matching rows + columns from both sides" | JOIN | ||||
| "Compare against an aggregate of another set" | scalar subquery / window | ||||
| "Combine result sets" | UNION / UNION ALL | Set operationsCombine whole result sets that have the same column shape: | Operator | Returns | Removes duplicates? |
UNION | rows in A or B | yes (sorts → slow) | |||
UNION ALL | rows in A or B | no (fast) | |||
INTERSECT | rows in both | yes | |||
EXCEPT (MINUS in Oracle) | rows in A not in B | yes |
-- Active or premium users (deduped)
SELECT id FROM users WHERE is_active
UNION
SELECT id FROM users WHERE is_premium;-- Same, but keep duplicates (much faster on big tables)
SELECT id FROM users WHERE is_active
UNION ALL
SELECT id FROM users WHERE is_premium;
-- Customers who logged in AND placed an order today
SELECT user_id FROM logins WHERE day = CURRENT_DATE
INTERSECT
SELECT user_id FROM orders WHERE day = CURRENT_DATE;
-- Customers who logged in but did NOT order
SELECT user_id FROM logins WHERE day = CURRENT_DATE
EXCEPT
SELECT user_id FROM orders WHERE day = CURRENT_DATE;
Use UNION ALL whenever you don't need dedup. UNION always sorts internally; UNION ALL is a cheap concatenation.
1. Same number of columns in every branch.
2. Compatible types per column position.
3. Column names come from the first branch.
4. ORDER BY / LIMIT belong at the very end, applied to the combined result.
SELECT id, name, 'employee' AS kind FROM employees
UNION ALL
SELECT id, name, 'contractor' AS kind FROM contractors
ORDER BY name;EXISTS.NOT EXISTS or LEFT JOIN … IS NULL.UNION ALL (or UNION if you need dedup).INTERSECT.EXCEPT.