Last 30 Days
No notifications
SQL (Structured Query Language) is the universal language for working with relational databases. Every data analyst role requires strong SQL skills — it's how you extract, filter, and summarize business data at scale.
SELECT column1, column2 -- What columns to show
FROM table_name -- Which table to query
WHERE condition -- Filter rows
GROUP BY column -- Group for aggregation
HAVING aggregate_condition -- Filter groups
ORDER BY column DESC -- Sort results
LIMIT 10; -- Restrict row count| Operator | Example | Purpose | ||||
=, !=, <, > | WHERE salary > 50000 | Basic comparison | ||||
AND / OR / NOT | WHERE dept = 'Sales' AND salary > 60000 | Combine conditions | ||||
BETWEEN | WHERE hire_date BETWEEN '2023-01-01' AND '2023-12-31' | Range filter | ||||
IN | WHERE region IN ('North', 'South') | Match any in list | ||||
LIKE | WHERE name LIKE 'A%' | Pattern matching | ||||
IS NULL | WHERE manager_id IS NULL | Check for missing data | Aggregate Functions | Function | Purpose | Example |
COUNT(*) | Count all rows | SELECT COUNT(*) FROM orders | ||||
SUM(col) | Total a numeric column | SELECT SUM(amount) FROM orders | ||||
AVG(col) | Average value | SELECT AVG(salary) FROM employees | ||||
MIN(col) / MAX(col) | Smallest / Largest | SELECT MIN(price), MAX(price) FROM products | ||||
DISTINCT | Remove duplicates | SELECT DISTINCT department FROM employees |
NULLs represent missing data and require special handling:
-- Check for NULL
SELECT * FROM employees WHERE manager_id IS NULL;-- Replace NULL with a default using COALESCE
SELECT name, COALESCE(phone, 'N/A') AS phone FROM employees;
-- COALESCE picks the first non-NULL value
SELECT COALESCE(nickname, first_name, 'Unknown') AS display_name FROM users;
Use AS to rename output columns for readability:
SELECT department AS dept, COUNT(*) AS headcount, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;SQL is the language you'll use every day as an analyst. Excel can show you 100 rows; SQL lets you ask questions of 100 million. Once you can write a clean SELECT ... WHERE ... GROUP BY, you can answer most business questions yourself instead of waiting for an engineer.
A database stores data in tables — rows and columns with strict types. SQL is the language you use to ask the database for data. You describe *what* you want, the database figures out *how*.
SELECT name, email
FROM users
WHERE country = 'IN'
ORDER BY created_at DESC
LIMIT 10;Read it like English: "give me name and email from users in India, newest first, just 10".
| Clause | Job |
SELECT | which columns to return |
FROM | which table to read |
WHERE | filter individual rows |
GROUP BY | bucket rows into groups |
ORDER BY | sort the result |
LIMIT | cap the number of rows |
Learn these, in this order. JOINs, subqueries, and window functions all build on top.
You *write* SQL in one order, the database *runs* it in another:
FROM → WHERE → GROUP BY → HAVING → SELECT → DISTINCT → ORDER BY → LIMITThis explains the #1 beginner gotcha: a column alias from SELECT can't be used in WHERE because WHERE runs before SELECT. Use HAVING (after aggregation) or repeat the expression.
1. SELECT * in dashboards / production. Wastes bandwidth, breaks when columns are added. List the columns you need.
2. Forgetting WHERE on UPDATE / DELETE. DELETE FROM users; deletes the whole table. Always test with a SELECT first.
3. = NULL. NULL means *unknown*. Use IS NULL / IS NOT NULL.
4. Mixing data types in comparisons. WHERE id = '42' may quietly do a string compare and never use the index.
5. Sorting before filtering. ORDER BY then LIMIT 10 on millions of rows is slow if there's no index. Filter first.
6. Trusting COUNT(*) and COUNT(col) to mean the same thing. COUNT(*) counts rows; COUNT(col) skips NULLs.
WHERE age BETWEEN 18 AND 30 -- inclusive on both ends
WHERE country IN ('IN', 'US', 'UK')
WHERE email LIKE '%@gmail.com' -- % = any chars, _ = one char
WHERE deleted_at IS NULL
WHERE created_at >= '2026-01-01'
AND status = 'active'
AND (role = 'admin' OR role = 'owner')Mix conditions with AND / OR. Parentheses matter — A AND B OR C is ambiguous, wrap groups.
Aggregates squash many rows into one summary row:
SELECT
country,
COUNT(*) AS user_count,
AVG(age) AS avg_age,
MIN(created_at) AS first_signup,
MAX(created_at) AS latest_signup
FROM users
WHERE deleted_at IS NULL
GROUP BY country
HAVING COUNT(*) > 100
ORDER BY user_count DESC;Key rules:
SELECT must be either inside an aggregate or in GROUP BY.WHERE filters rows; HAVING filters groups (after aggregation).COUNT(DISTINCT col) for unique counts.ORDER BY created_at DESC, id DESC
LIMIT 20 OFFSET 40 -- page 3, 20 per pageA secondary tiebreaker (id DESC) makes ordering deterministic when timestamps tie.
DATE_TRUNC('month', created_at) -- first day of that month
EXTRACT(YEAR FROM created_at)
NOW(), CURRENT_DATE, AGE(birthday)
LOWER(email), TRIM(name), CONCAT(a, ' ', b)
ROUND(price, 2), CEIL(x), FLOOR(x)
COALESCE(nickname, name, 'unknown') -- first non-NULLCOALESCE is a lifesaver for handling NULLs in dashboards.
SELECT name,
CASE
WHEN age < 18 THEN 'minor'
WHEN age < 60 THEN 'adult'
ELSE 'senior'
END AS age_group,
CASE WHEN is_paid THEN 'paying' ELSE 'free' END AS plan
FROM users;Works anywhere expressions are allowed — inside SELECT, WHERE, ORDER BY, even SUM(CASE WHEN ... THEN 1 ELSE 0 END) for conditional counts.
Both deduplicate, but they're not interchangeable:
SELECT DISTINCT country FROM users — just unique values.SELECT country, COUNT(*) FROM users GROUP BY country — unique values + per-group math.DISTINCT is shorthand for GROUP BY over every selected column. On big tables, both can be slow without an index.SQL has three truth values: TRUE / FALSE / UNKNOWN.
NULL = NULL → UNKNOWN
NULL <> NULL → UNKNOWN
NULL IS NULL → TRUE
5 + NULL → NULL
NULL OR TRUE → TRUE
NULL AND FALSE → FALSEThis is why WHERE col = NULL always returns 0 rows. Use IS [NOT] NULL and reach for COALESCE to substitute defaults.
INSERT INTO users (name, email) VALUES ('Alice', 'a@t.com');UPDATE users SET status = 'inactive'
WHERE last_login < NOW() - INTERVAL '180 days';
DELETE FROM sessions WHERE expires_at < NOW();
Golden habit: always run SELECT with the same WHERE first. If it returns the rows you intended to change, only then rewrite as UPDATE / DELETE. Wrap in BEGIN; ... COMMIT; so you can ROLLBACK.
1. Pick a table with at least 5 columns (users, orders, anything). Write 5 SELECTs that each use a different clause: WHERE, ORDER BY, LIMIT, GROUP BY, HAVING.
2. Build a "users per country" report and add a HAVING COUNT(*) > 10 filter.
3. Use CASE to bucket users into age groups and count each bucket.
4. Write an UPDATE that flips all expired sessions to status = 'expired', dry-run with SELECT first, run inside a transaction.