Notifications

No notifications

/Phase 2

Aggregate Functions

Aggregates collapse many rows into one summary value — COUNT, SUM, AVG, MIN, MAX and friends. They power every metric, KPI and report. Once you understand how aggregates handle NULLs and how DISTINCT and FILTER modify them, you've unlocked the analytical half of SQL.

On this page

Detailed Theory

The five workhorses

FunctionWhat it returns
COUNT(*)Number of rows (NULL or not)
COUNT(col)Number of rows where col is NOT NULL
COUNT(DISTINCT col)Number of distinct non-NULL values
SUM(col)Sum of non-NULL values
AVG(col)Mean of non-NULL values
MIN(col) / MAX(col)Smallest / largest non-NULL value

SELECT COUNT(*)        AS total_users,
       COUNT(phone)    AS users_with_phone,
       COUNT(DISTINCT country) AS countries,
       AVG(age)        AS avg_age,
       MIN(age)        AS youngest,
       MAX(age)        AS oldest,
       SUM(age)        AS total_age_years
FROM   users;

A query without GROUP BY returns one row that summarises the whole table.

NULLs are silently skipped

Aggregates ignore NULL values (except COUNT(*)):

ages: 10, 20, NULL, 30

ExpressionValue
COUNT(*)4
COUNT(age)3
SUM(age)60
AVG(age)20 (60 / 3, not 4!)

That AVG denominator is the most common surprise. If you want NULLs to count as 0:

AVG(COALESCE(age, 0))

If the table is empty, all aggregates return NULL — except COUNT, which returns 0.

COUNT(*) vs COUNT(1) vs COUNT(col)

COUNT(*)    -- count all rows
COUNT(1)    -- same thing; the 1 is a non-NULL constant
COUNT(col)  -- count only rows where col IS NOT NULL

In every modern engine, COUNT(*) and COUNT(1) are identical in performance. Use COUNT(*) for clarity.

DISTINCT inside aggregates

SELECT COUNT(DISTINCT country) FROM users;   -- # of unique countries
SELECT SUM(DISTINCT amount)    FROM payments; -- sum of unique amounts (rare)

COUNT(DISTINCT) is the workhorse — "how many unique X are there?" — and the basis of cohort analysis.

FILTER (WHERE …) — conditional aggregates

Standard SQL (Postgres + SQLite):

SELECT
  COUNT(*)                                   AS total,
  COUNT(*) FILTER (WHERE status = 'active')  AS active,
  COUNT(*) FILTER (WHERE status = 'banned')  AS banned,
  AVG(age) FILTER (WHERE country = 'IN')     AS avg_age_in
FROM   users;

In MySQL / older engines you fall back to SUM(CASE WHEN …):

SELECT
  COUNT(*)                                                 AS total,
  SUM(CASE WHEN status = 'active' THEN 1 ELSE 0 END)       AS active,
  AVG(CASE WHEN country = 'IN' THEN age END)               AS avg_age_in
FROM   users;

This pattern — many counters in one query — is wildly more efficient than running multiple SELECT COUNTs.

Boolean aggregates

FunctionReturns
BOOL_OR(cond) / BOOL_AND(cond) (Postgres)TRUE if any/all rows match
MAX(CASE WHEN cond THEN 1 ELSE 0 END) = 1portable equivalent

SELECT BOOL_OR(country = 'IN') AS has_indian_user FROM users;

String / array aggregates

FunctionEngineWhat it does
STRING_AGG(col, ',')PostgresConcatenate with separator
GROUP_CONCAT(col SEPARATOR ',')MySQLSame idea
ARRAY_AGG(col)PostgresCollect into an array
JSON_AGG(col)PostgresCollect into a JSON array

SELECT user_id,
       STRING_AGG(tag, ',' ORDER BY tag) AS tags
FROM   user_tags
GROUP  BY user_id;

Statistical aggregates

SELECT
  STDDEV(age)              AS std_dev,
  VAR_SAMP(age)            AS variance,
  PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY age) AS median
FROM users;

Postgres has the richest set; most engines have at least STDDEV / VARIANCE.

Aggregates without GROUP BY

Whenever a query has any aggregate, all non-aggregated columns in the SELECT list must either appear in GROUP BY or themselves be aggregated. This errors:

-- ❌ name is neither grouped nor aggregated
SELECT name, AVG(age) FROM users;

You either group:

SELECT country, AVG(age) FROM users GROUP BY country;

Or pick one with an aggregate:

SELECT MAX(name) AS sample_name, AVG(age) FROM users;

(MySQL allowed the broken form for years with ONLY_FULL_GROUP_BY off — it's a footgun, leave it on.)

Practical recipes

KPI dashboard — one row, many metrics

SELECT
  COUNT(*)                                          AS users_total,
  COUNT(*) FILTER (WHERE created_at >= CURRENT_DATE - INTERVAL '7 days') AS new_7d,
  COUNT(*) FILTER (WHERE last_login >= CURRENT_DATE - INTERVAL '30 days') AS active_30d,
  AVG(age) FILTER (WHERE age IS NOT NULL)           AS avg_age
FROM users;

Convert "is this list non-empty?" without LIMIT 1

SELECT EXISTS (SELECT 1 FROM orders WHERE user_id = 42);

(EXISTS short-circuits — faster than COUNT(*) > 0 on large tables.)

Recap

  • COUNT(*) counts rows; COUNT(col) skips NULLs.
  • Aggregates ignore NULLs; AVG denominator can surprise you.
  • Use FILTER (WHERE …) (or SUM(CASE WHEN…)) to compute many conditional totals in one pass.
  • COUNT(DISTINCT col) is the cohort/uniqueness workhorse.
  • Pair aggregates with GROUP BY (next topic) to slice metrics by category.