Notifications

No notifications

/Phase 3

Constraints & Normalization

Constraints are rules the database enforces on every row, every time. They turn 'we hope nobody ever inserts garbage' into 'the database refuses garbage at the door'. Combined with normalization, they're the difference between a real database and a glorified spreadsheet.

On this page

Detailed Theory

Why constraints?

Application code is fragile. Bugs, race conditions, ad-hoc DBA scripts, third-party imports — anything that can write to your DB *will* eventually try to write something invalid. Constraints make those writes fail loudly instead of silently corrupting data.

The six classic constraints:

ConstraintGuarantees
PRIMARY KEYunique + not null; identifies the row
FOREIGN KEYvalue exists in another table's PK/UK
UNIQUEno duplicates (NULLs usually allowed)
NOT NULLcolumn always has a value
CHECKrow satisfies a boolean expression
DEFAULT(technically not a constraint) supplies a value if missing

PRIMARY KEY

CREATE TABLE users (
  id    BIGSERIAL PRIMARY KEY,
  email TEXT NOT NULL UNIQUE
);

A PK is NOT NULL + UNIQUE rolled into one. A table can have at most one PK, but the PK can span multiple columns (composite key):

CREATE TABLE post_tags (
  post_id BIGINT NOT NULL,
  tag_id  BIGINT NOT NULL,
  PRIMARY KEY (post_id, tag_id)
);

This makes (post_id, tag_id) unique together — perfect for a join table. Order in the PK matters for index lookups.

Surrogate vs natural keys: prefer a surrogate (BIGSERIAL / UUID) over a "natural" candidate (email, SSN). Natural keys change; surrogate keys don't.

FOREIGN KEY — referential integrity

CREATE TABLE posts (
  id      BIGSERIAL PRIMARY KEY,
  user_id BIGINT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  title   TEXT   NOT NULL
);

Now the DB refuses to insert a post whose user_id doesn't exist in users, and refuses to delete a user with posts (unless an action says otherwise).

ON DELETE / ON UPDATEWhat happens to the child
NO ACTION (default)block the parent change
RESTRICTsame idea, checked immediately
CASCADEapply the change to children too (delete or update PK)
SET NULLchild FK column becomes NULL (must be nullable)
SET DEFAULTchild FK column becomes its DEFAULT

Choose deliberately:

  • CASCADE for "child only exists because of parent" (post comments → comments).
  • SET NULL for "child can survive without parent" (employee → manager).
  • RESTRICT for important parents you must delete on purpose.

UNIQUE

ALTER TABLE users ADD CONSTRAINT users_email_unique UNIQUE (email);

Or composite:

UNIQUE (org_id, email)   -- email unique per organization

⚠️ NULLs in a UNIQUE column are typically allowed and don't clash with each other (in Postgres / Oracle / SQLite). MySQL/InnoDB also allows multiple NULLs. If you need "at most one row where col is NULL", use a partial unique index.

NOT NULL — defensive default

A surprising amount of bug-hunting time is "why is this NULL?". Mark every column NOT NULL unless absent-value really makes sense for that column. Pair with DEFAULT so existing inserts keep working.

CHECK — domain rules

CREATE TABLE users (
  age      INT  CHECK (age >= 0 AND age <= 150),
  status   TEXT CHECK (status IN ('active','suspended','banned')),
  email    TEXT CHECK (position('@' in email) > 0)
);

CHECK runs per row. It can reference other columns of the same row (CHECK (start_date < end_date)) but not other tables — for cross-table rules, use FKs or triggers.

DEFAULT

Strictly an attribute, not a constraint, but inseparable in practice:

created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
status     TEXT        NOT NULL DEFAULT 'draft',
ext_id     UUID        NOT NULL DEFAULT gen_random_uuid()

Naming constraints

CONSTRAINT posts_user_fk      FOREIGN KEY (user_id) REFERENCES users(id),
CONSTRAINT users_email_unique UNIQUE (email),
CONSTRAINT users_age_chk      CHECK (age >= 0)

Auto-generated names like users_email_key are ugly and shift between engines. Naming them yourself makes ALTER TABLE … DROP CONSTRAINT name predictable.

Deferrable constraints (Postgres)

FOREIGN KEY (user_id) REFERENCES users(id)
DEFERRABLE INITIALLY DEFERRED

Lets you violate the FK *temporarily* inside a transaction (e.g. inserting parent and child rows that reference each other) — checked at COMMIT.

Normalization — the 60-second tour

Normalization removes duplication that causes update anomalies. The first three "forms" cover almost every real schema:

1NF — atomic columns, no repeating groups

❌ users(id, name, phones)            -- "9999, 8888" in a single column
✅ users(id, name) + phones(user_id, phone)

2NF — every non-key column depends on the *whole* key (only matters with composite PKs)

❌ order_items(order_id, product_id, product_name, qty)
   product_name depends only on product_id, not the whole PK
✅ order_items(order_id, product_id, qty) + products(id, name)

3NF — no transitive dependencies (non-key → non-key)

❌ employees(id, dept_id, dept_name)
   dept_name depends on dept_id, not on employee
✅ employees(id, dept_id) + departments(id, name)

Heuristic: one fact per table. If you find yourself updating the same value in many rows, you have a normalization problem.

When to denormalize

Normalization is the default; denormalize on purpose for read-heavy hot paths:

  • pre-computed counters (users.post_count)
  • materialized views
  • duplicated lookup columns (orders.user_country) for analytics queries
The cost is "now you have to keep two copies in sync" — usually via triggers or background jobs. Don't pay that cost until profiling forces you to.