Notifications

No notifications

/Phase 4

Advanced Database Patterns

Production Database Patterns

1. Connection Pooling

Don't create a new connection per request — reuse from a pool.

// PostgreSQL
const pool = new Pool({ max: 20 });
const { rows } = await pool.query('SELECT * FROM users');

// MongoDB / Mongoose mongoose.connect(uri, { maxPoolSize: 10, minPoolSize: 2, serverSelectionTimeoutMS: 5000, });

2. Indexing Strategy

Index TypeUse Case
Single fieldFrequent lookups (email, username)
CompoundMulti-field queries (author_id + created_at)
UniqueEnforce uniqueness (email)
Text/Full-textSearch in text content
TTL (MongoDB)Auto-expire documents (sessions, logs)

3. Data Modeling Patterns

One-to-One   →  Embed or same table
One-to-Many  →  FK (SQL) or Embed/Ref (MongoDB)
Many-to-Many →  Junction table (SQL) or Array of refs (MongoDB)

4. Migration Pattern

// Versioned migrations (like knex or prisma)
// 001_create_users.js
exports.up = (knex) =>
  knex.schema.createTable('users', t => {
    t.increments('id');
    t.string('email').unique();
  });

exports.down = (knex) => knex.schema.dropTable('users');

On this page

Detailed Theory

Once your app has real users you stop thinking about *one* query at a time and start thinking about load, money, and failure. Advanced database work is mostly the answer to three questions: *Will it stay fast as data grows? Will it stay correct under concurrency? Will it survive a server dying?*

What "Advanced" Actually Means

Four skills separate hobby projects from production systems:

1. Reading query plans — you know *why* a query is slow, not just *that* it is slow. 2. Designing for concurrency — transactions, isolation, locking. 3. Adding caching layers — Redis in front of your DB. 4. Operating safely — backups, migrations, replication, monitoring.

Every section below maps to one of these.

Beginner Mistakes to Skip

1. Adding indexes "just in case". Each one slows writes and uses space. Add them when EXPLAIN proves you need them. 2. Caching everything by default. A stale cache hides bugs. Cache hot, slow, idempotent reads — not user-specific writes. 3. Hard-deleting data. Use soft deletes (deleted_at) until you are sure. Real-world support tickets need it. 4. No backups, only replication. Replication copies your bugs and bad migrations instantly. Take separate point-in-time backups. 5. Schema changes during traffic. ALTER TABLE on a 50M-row table can lock for hours. Use migration tools that do it online. 6. Believing "the DB will scale itself". It will not. You will scale it.

Intermediate: Reading EXPLAIN ANALYZE

The single most valuable skill in DB tuning. Postgres example:

EXPLAIN ANALYZE
SELECT * FROM posts WHERE author_id = 42 ORDER BY created_at DESC LIMIT 20;

What to look for:

  • Seq Scan on a big table → missing index.
  • Index Scan → good.
  • Rows Removed by Filter: 1,000,000 → wrong index, full scan happening anyway.
  • Sort with high actual time → add an index that already gives the order.
Mongo equivalent: db.posts.find(...).explain("executionStats"). Look for COLLSCAN (bad) vs IXSCAN (good).

Intermediate: N+1 Queries (the Silent Killer)

The code looks fine, the page takes 8 seconds:

const posts = await Post.findAll();           // 1 query
for (const p of posts) {
  p.author = await User.findById(p.authorId); // N more queries!
}

Fixes: SQL JOIN, Mongo $lookup / populate, Sequelize include, Prisma include, or DataLoader (batches + caches per request). One round-trip instead of 101.

Intermediate: Aggregation Pipelines

Push computation into the database. The CPU is closer to the data.

const stats = await Order.aggregate([
  { $match:  { createdAt: { $gte: lastMonth } } },
  { $group:  { _id: '$status', count: { $sum: 1 }, revenue: { $sum: '$total' } } },
  { $sort:   { revenue: -1 } },
  { $project:{ status: '$_id', count: 1, revenue: { $round: ['$revenue', 2] }, _id: 0 } },
]);

SQL counterpart = GROUP BY + window functions + CTEs. Either way: do the math in the DB, not in Node.

Intermediate: Caching with Redis

The canonical "cache-aside" pattern:

async function getUser(id) {
  const key = user:${id};
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);

const user = await User.findById(id); if (user) await redis.setex(key, 300, JSON.stringify(user)); // 5 min TTL return user; }

async function updateUser(id, data) { const user = await User.findByIdAndUpdate(id, data, { new: true }); await redis.del(user:${id}); // invalidate return user; }

The famous quote applies: "there are only two hard things in CS — cache invalidation and naming things." Always set a TTL so a missed invalidation does not poison data forever.

Intermediate: Pagination at Scale

Offset (LIMIT 20 OFFSET 200000) gets slower the deeper you go — the DB still scans everything skipped. Cursor pagination (WHERE id > $cursor LIMIT 20) stays O(log n) forever. Use offset for admin tables, cursor for infinite scroll / public APIs.

Advanced: Transaction Isolation Levels

The DB has to decide what concurrent transactions can see.

LevelPreventsAllows
Read Uncommittednothingdirty reads
Read Committed *(Postgres default)*dirty readsnon-repeatable + phantom reads
Repeatable Read *(MySQL default)*+ non-repeatable readsphantom reads (mostly)
Serializableeverythinglower throughput

For money or inventory you usually want Serializable, or you use explicit locking:

SELECT stock FROM products WHERE id = 1 FOR UPDATE; -- locks the row

Advanced: Optimistic vs Pessimistic Concurrency

  • Pessimistic: lock the row up front (FOR UPDATE). Safe, can deadlock, hurts throughput.
  • Optimistic: read with a version column, write with WHERE version = $oldVersion. If 0 rows updated → retry.
Optimistic wins for high-read, low-conflict workloads (most web apps).

Advanced: Replication, Read Replicas & Sharding

  • Primary → replica streaming lets you offload reads. Watch for replication lag — a write you just made may not be on the replica yet.
  • Sharding splits data across machines by key (user_id mod N). You scale write throughput, but JOINs across shards become hard. Avoid until you are sure you need it.
  • Managed services (Aurora, Atlas, PlanetScale) hide most of this complexity.

Advanced: Soft Deletes & Audit Trails

Most real apps never DELETE:

// Mongoose
userSchema.pre(/^find/, function() { this.where({ deletedAt: null }); });
userSchema.methods.softDelete = function() { this.deletedAt = new Date(); return this.save(); };

For regulated domains (finance, healthcare) add an audit table — every change appended with who/when/what. Combined with PITR backups, you can answer "what did the row look like at 3:42 PM yesterday?".

Advanced: Online Migrations

Never block production traffic. The safe recipe:

1. Expand — add the new column nullable, deploy. 2. Dual-write — code writes both old and new. 3. Backfill — batch script copies old values into new column. 4. Switch reads to new column. 5. Contract — drop the old column.

Tools: Prisma Migrate, Knex, Liquibase, gh-ost, pt-online-schema-change.

Advanced: Backups & Disaster Recovery

A backup you have never restored is not a backup. Minimums:

  • Daily snapshots + point-in-time recovery (WAL / oplog).
  • Off-region copy — region outages happen.
  • Quarterly restore drill to a fresh instance, with timing.
  • RPO (how much data can you lose?) and RTO (how fast must you be back?) — know both numbers before an incident.

Advanced: Observability for Databases

Monitor (Prometheus / Datadog / pg_stat_statements / Mongo Atlas):

  • Query P50 / P95 / P99 latency.
  • Slow query log (anything > 1s).
  • Connection count vs pool size.
  • Cache hit ratio.
  • Replication lag in milliseconds.
  • Disk I/O and free space.
Alert before users notice. Logs tell you what happened; metrics tell you what is happening *now*.

Practice Path

1. Add EXPLAIN ANALYZE to a slow query, add the right index, prove the plan flips from Seq Scan to Index Scan. 2. Build a cache-aside helper around getUser(id) with a 5-minute TTL and invalidate on update. 3. Convert an offset-paginated endpoint to cursor pagination and benchmark page 10000. 4. Implement an optimistic-locking update with a version column and retry on conflict.