Zero downtime schema migrations in distributed DBs
Meta description: How CockroachDB, YugabyteDB, TiDB, and Spanner handle online DDL vs PostgreSQL. Strategies for zero downtime schema migrations on sharded clusters.
TL;DR
Single-node PostgreSQL takes aggressive locks during schema changes that block reads and writes. Distributed databases like CockroachDB, YugabyteDB, TiDB, and Spanner implement online DDL, meaning schema changes propagate across nodes without downtime. But each system handles it differently, with real trade-offs in performance, compatibility, and operational complexity. This is what you need to know before running ALTER TABLE on a 50-node cluster.
The problem: ALTER TABLE on a sharded cluster
The scariest moment in any release isn’t the code deploy. It’s the schema migration. On single-node PostgreSQL, an ALTER TABLE ... ADD COLUMN with a default value forced a full table rewrite under an ACCESS EXCLUSIVE lock until PostgreSQL 11, which avoids the rewrite for non-volatile defaults by storing them in the catalog. Even today, operations like adding a NOT NULL constraint with validation or creating an index without CONCURRENTLY will block concurrent queries.
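To see the difference in PostgreSQL terms, here’s a blocking index build next to its online equivalent (the index and table names are illustrative):
-- Blocks writes (but not reads) on orders for the duration of the build
CREATE INDEX idx_orders_status ON orders (status);
-- Builds online; can't run inside a transaction block and takes longer
CREATE INDEX CONCURRENTLY idx_orders_status ON orders (status);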
Now multiply that across a distributed cluster. Dozens of nodes, each holding a subset of data, all needing to agree on what the schema looks like. A naive approach (lock the entire table globally) means seconds or minutes of downtime on large tables. Unacceptable for systems targeting five-nines availability.
How distributed databases solve online DDL
The core insight is schema versioning. Instead of a single atomic flip, distributed databases introduce the new schema gradually, allowing nodes to temporarily operate on different schema versions simultaneously.
Google Spanner’s schema change protocol ensures that at any point in time, no more than two schema versions are active across the cluster. CockroachDB adopted a similar model based on the Spanner paper, implementing a multi-phase state machine for schema changes.
A typical online ADD COLUMN looks like this:
Phase 1: DELETE-ONLY → New column exists but only delete operations consider it
Phase 2: DELETE-AND-WRITE-ONLY → New column accepts writes but reads ignore it
Phase 3: PUBLIC → After any required backfill of existing rows completes, the column is fully visible to all operations
Each phase propagates across nodes with a lease-based protocol. Nodes that haven’t received the update yet continue operating on the prior version, and the protocol guarantees correctness across both versions.
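On CockroachDB, for instance, you can see this play out: the column addition runs as a background job you can watch from another session (the column name is illustrative):
-- Kicks off an online, multi-phase column addition
ALTER TABLE orders ADD COLUMN discount DECIMAL NOT NULL DEFAULT 0;
-- From another session: the schema change shows up as a job with progress
SHOW JOBS;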
Online DDL support compared
| Feature | PostgreSQL | CockroachDB | YugabyteDB | TiDB | Spanner |
|---|---|---|---|---|---|
| Add column | Brief lock (no rewrite since v11) | Online | Online | Online | Online |
| Add index | CONCURRENTLY option | Online (backfill) | Online | Online (internal backfill worker)¹ | Online |
| Drop column | ACCESS EXCLUSIVE lock | Online, multi-phase | Online | Online | Online |
| Add NOT NULL constraint | Full table scan + lock | Online validation | Online | Online | Online |
| Change column type | Rewrites table + lock | Limited support | Limited support | Supported via ALTER | Limited² |
| Schema versioning | None (single node) | Spanner-based protocol | Raft-based propagation | Schema lease + owner model | Native multi-version |
| Max concurrent DDLs | Limited by lock contention | Multiple (with caveats) | Multiple | Single DDL owner per table | Multiple |
¹ TiDB uses an internal DDL worker (not to be confused with TiDB Data Migration/DM, which handles external replication). See TiDB DDL documentation.
² Spanner supports some in-place type promotions (e.g., widening STRING length) without table rewrites, but does not support arbitrary type changes in place.
Look at that table. Every distributed database has invested in making DDL non-blocking, yet column type changes remain painful across nearly every system. That’s where you’ll still feel the sting.
Migration strategies that actually work
1. Expand-contract pattern
The most battle-tested pattern across all distributed databases:
-- Phase 1: Expand -- add the new column
ALTER TABLE orders ADD COLUMN status_v2 INT;
-- Phase 2: Dual-write -- application writes to both columns
-- Phase 3: Backfill -- migrate existing data
UPDATE orders SET status_v2 = status WHERE status_v2 IS NULL;
-- Phase 4: Contract -- drop the old column
ALTER TABLE orders DROP COLUMN status;
This decouples the schema change from the application deploy. That matters when your cluster nodes might be running different binary versions during a rolling update.
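One caveat on Phase 3: a single UPDATE across a huge table is itself one giant transaction, which distributed databases handle badly (transaction size limits, retries, contention). A batched variant is safer; this sketch assumes orders has an id primary key, and your migration tooling should loop it until no rows match:
-- Backfill a bounded batch per statement instead of one huge transaction
UPDATE orders SET status_v2 = status
WHERE id IN (
  SELECT id FROM orders WHERE status_v2 IS NULL LIMIT 1000
);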
2. Declarative schema management
Tools like atlas or CockroachDB’s built-in schema change jobs let you declare the desired state and let the system figure out the migration path. In my experience, this scales better than hand-written migration files once you’re past 100+ migrations. The upfront investment pays off.
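As a sketch of what that looks like in practice, you maintain one desired-state file and let the tool compute the diff (the table definition here is illustrative, and the exact workflow depends on the tool):
-- schema.sql: the desired end state, not a step-by-step migration
CREATE TABLE orders (
  id         BIGINT PRIMARY KEY,
  status_v2  INT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- The tool diffs this against the live cluster and emits the ALTER
-- statements (ideally as online schema change jobs) needed to converge.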
3. Guard against long-running backfills
On a 500GB table distributed across 30 nodes, an index backfill can take hours. Both CockroachDB and TiDB support throttling backfills to limit impact on foreground traffic. Always configure this. I’ve seen unthrottled backfills consume enough I/O to cause latency spikes that trigger cascading timeouts.
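For illustration, these are the kinds of knobs involved; names and defaults shift between releases, so verify against your version’s documentation:
-- TiDB: reduce DDL reorg (backfill) parallelism and batch size
SET GLOBAL tidb_ddl_reorg_worker_cnt = 2;
SET GLOBAL tidb_ddl_reorg_batch_size = 128;
-- CockroachDB: cap the bulk-write rate that backfills are allowed to use
SET CLUSTER SETTING kv.bulk_io_write.max_rate = '32MiB';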
What most teams get wrong
They treat distributed DDL like single-node DDL. Write a migration, run it in CI, assume it’s safe. But distributed DDL introduces failure modes that don’t exist on a single node.
Node restarts mid-migration can leave schema changes in an intermediate state. In CockroachDB, schema change jobs are resumable. If a node crashes, check and resume the job:
-- CockroachDB: inspect in-flight, paused, or reverting schema change jobs
SELECT job_id, status, description FROM [SHOW JOBS] WHERE job_type = 'SCHEMA CHANGE' AND status IN ('running', 'paused', 'reverting');
-- Resume a paused schema change
RESUME JOB <job_id>;
Stuck or conflicting DDLs are common in TiDB when concurrent DDLs target the same table. Use the admin command to identify and cancel:
-- TiDB: view and cancel a stuck DDL
ADMIN SHOW DDL JOBS;
ADMIN CANCEL DDL JOBS job_id;
Clock skew (especially on non-Spanner systems without TrueTime) can cause version propagation delays. Monitor schema lease durations and set alerts when lease renewal latency exceeds your configured threshold.
One hard rule: always test migrations against a multi-node staging cluster with failure injection (kill a node mid-backfill, introduce network partitions). A single-node dev instance tells you almost nothing about how a migration will behave in production.
What to do with all this
Use the expand-contract pattern for any non-trivial schema change. It works universally across PostgreSQL and every distributed database, and it cleanly separates schema changes from application deploys. There’s a reason everyone reaches for it.
Build a DDL compatibility matrix for your specific database version. Online DDL support evolves fast. CockroachDB v23.1 introduced the declarative schema changer as the default, and TiDB v6.2 added concurrent DDL execution. What required downtime two versions ago may be fully online today. Check before you assume.
Throttle backfills, instrument migrations, and rehearse failure recovery. Set explicit rate limits on index builds, emit metrics from your migration tooling, and practice the RESUME JOB / ADMIN CANCEL DDL workflows before you need them at 2 AM. You don’t want your first time running these commands to be during an incident.
Tags: distributed-databases, schema-migration, cockroachdb, postgresql, devops