Notifications

No notifications

/Phase 2

GROUP BY & HAVING

GROUP BY is how you slice data into buckets and run an aggregate per bucket — sales per country, posts per user, errors per hour. HAVING is the filter that runs *after* aggregation. Together they turn a raw table into a report.

On this page

Detailed Theory

The mental model

Picture the engine doing three steps:

1. WHERE filters individual rows (no aggregates allowed). 2. GROUP BY partitions surviving rows into buckets by the listed columns. 3. HAVING filters whole buckets using aggregates.

SELECT   country, COUNT(*) AS users
FROM     users
WHERE    is_active = TRUE          -- step 1: filter rows
GROUP BY country                    -- step 2: bucket
HAVING   COUNT(*) >= 10             -- step 3: filter buckets
ORDER BY users DESC;

Reads as: "Among active users, group by country, keep only countries with at least 10, sort by user count."

The Golden Rule

In a query with GROUP BY, every column in the SELECT list must either:

  • appear in GROUP BY, or
  • be wrapped in an aggregate function.
-- ❌ illegal: name is neither grouped nor aggregated
SELECT country, name, COUNT(*) FROM users GROUP BY country;

-- ✅ legal SELECT country, COUNT(*) FROM users GROUP BY country;

-- ✅ also legal — aggregate non-grouped columns SELECT country, MAX(name) AS sample_name, COUNT(*) FROM users GROUP BY country;

(MySQL with ONLY_FULL_GROUP_BY=OFF would silently pick a "random" name. That mode is a footgun — leave ONLY_FULL_GROUP_BY on.)

GROUP BY multiple columns

SELECT   country, plan, COUNT(*) AS users
FROM     users
GROUP BY country, plan;

You get one row per distinct combination of (country, plan). Order of columns in GROUP BY doesn't change the result.

GROUP BY expressions

You can group by computed values, not just columns:

SELECT   DATE_TRUNC('day', created_at) AS day, COUNT(*) AS signups
FROM     users
GROUP BY DATE_TRUNC('day', created_at)
ORDER BY day;

-- Grouping by an alias works in MySQL/Postgres but not standard SQL. -- Safer: repeat the expression, or wrap in a subquery.

WHERE vs HAVING — the killer distinction

WHEREHAVING
Runs before aggregation?✅ yes❌ no
Can reference aggregates?❌ no✅ yes
Can reference plain columns?✅ yes✅ if grouped

-- "Countries with average age > 30, only counting verified users"
SELECT   country, AVG(age) AS avg_age
FROM     users
WHERE    is_verified = TRUE   -- filter rows first
GROUP BY country
HAVING   AVG(age) > 30;       -- filter buckets after

Putting AVG(age) > 30 in WHERE is a syntax error. Putting is_verified = TRUE in HAVING is legal but throws away the chance to reduce rows early — bad for performance.

Rule of thumb: filter as early as possible. Use WHERE whenever it's possible; only use HAVING when you genuinely need an aggregate.

NULL is its own group

GROUP BY country puts every row whose country IS NULL into one bucket together — NULL = NULL holds only for grouping purposes. Useful but easy to forget when reading reports.

ROLLUP, CUBE, GROUPING SETS — subtotals in one query

When a single query needs both detail and totals, use grouping extensions instead of UNION:

-- ROLLUP: subtotals along a hierarchy
SELECT   country, plan, COUNT(*) AS users
FROM     users
GROUP BY ROLLUP (country, plan);
-- → per (country, plan), per country (plan = NULL), grand total (both NULL)

-- CUBE: every combination of grouping columns SELECT country, plan, COUNT(*) AS users FROM users GROUP BY CUBE (country, plan);

-- GROUPING SETS: arbitrary list SELECT country, plan, COUNT(*) AS users FROM users GROUP BY GROUPING SETS ((country, plan), (country), ());

The synthetic "totals" rows have NULL in the rolled-up columns. Use the GROUPING(col) function (Postgres / SQL Server / Oracle) to tell a real NULL apart from a "this row is a subtotal" NULL.

DISTINCT vs GROUP BY

For a single column with no aggregates, both produce the same result:

SELECT DISTINCT country FROM users;
SELECT country FROM users GROUP BY country;

Use DISTINCT for "give me unique rows" and GROUP BY whenever an aggregate is involved.

Common pitfalls

1. Selecting an ungrouped, non-aggregated column → error in standard SQL. 2. Filtering an aggregate in WHERE → error. Move it to HAVING. 3. Forgetting that NULLs group together. 4. Grouping by an alias — works in MySQL/Postgres only. Repeat the expression for portability. 5. Counting wrong after a LEFT JOINCOUNT(*) counts join rows (including the NULL "no match" row); COUNT(other_table.id) only counts real matches.

-- ✅ Posts per user — including users with zero posts
SELECT   u.name, COUNT(p.id) AS posts
FROM     users u
LEFT JOIN posts p ON p.user_id = u.id
GROUP BY u.id, u.name;

COUNT(p.id) correctly returns 0 for users with no posts; COUNT(*) would return 1.