.github/workflows/migrate.yml
Schema changes without downtime
Every production PostgreSQL database eventually needs a column renamed, a type changed, or an index added to a 200-million-row table. A naive ALTER TABLE ... ADD COLUMN ... DEFAULT ... on PostgreSQL 10 or earlier acquires an ACCESS EXCLUSIVE lock. Even on PostgreSQL 11+, where non-volatile defaults avoid full table rewrites, migrations involving backfills, constraint changes, or index creation still risk locking out reads and writes.
The migration strategy matters more than the migration tool. Most teams get this backwards: they pick a tool before understanding the pattern.
Pattern 1: expand/contract
The expand/contract pattern splits a breaking schema change into three deployable phases:
Phase 1: expand (add new)
-- Non-blocking: adds column with no default on PG 11+
ALTER TABLE orders ADD COLUMN customer_email TEXT;
Phase 2: migrate (backfill)
-- Batched backfill to avoid long transactions
UPDATE orders
SET customer_email = customers.email
FROM customers
WHERE orders.customer_id = customers.id
AND orders.id BETWEEN :start AND :end;
Phase 3: contract (drop old)
-- Only after all application code stops reading the old column
ALTER TABLE orders DROP COLUMN customer_name;
Each phase ships as a separate deployment. Application code uses dual-write during the migration window, reading from the new column with a fallback to the old.
Safe concurrent indexing
-- NEVER inside a transaction block
CREATE INDEX CONCURRENTLY idx_orders_customer_email
ON orders (customer_email);
CREATE INDEX CONCURRENTLY builds the index without holding a write lock, but it cannot run inside a transaction. Your migration runner must support non-transactional statements. Flyway (executeInTransaction=false) and Liquibase (runInTransaction="false") both handle this.
Pattern 2: blue-green at the database level
Blue-green for Postgres schema changes uses a shadow schema and view switching rather than in-place changes.
Step 1: create shadow schema
CREATE SCHEMA green;
CREATE TABLE green.orders (
id BIGINT PRIMARY KEY,
customer_email TEXT NOT NULL,
amount NUMERIC(12,2),
created_at TIMESTAMPTZ DEFAULT now()
);
Step 2: backfill via logical replication or trigger-based copy
-- Trigger-based sync (the PG equivalent of gh-ost)
CREATE OR REPLACE FUNCTION sync_orders() RETURNS TRIGGER AS $$
BEGIN
INSERT INTO green.orders (id, customer_email, amount, created_at)
VALUES (NEW.id, NEW.customer_email, NEW.amount, NEW.created_at)
ON CONFLICT (id) DO UPDATE SET
customer_email = EXCLUDED.customer_email,
amount = EXCLUDED.amount;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
Step 3: atomic view switch
-- Swap the live schema atomically
CREATE OR REPLACE VIEW public.orders AS SELECT * FROM green.orders;
This is the PostgreSQL version of the ghost table pattern that pt-online-schema-change and gh-ost popularized in MySQL. Tools like pgroll and pg-osc automate it for PostgreSQL.
Advisory locks for concurrent migration safety
Regardless of pattern, concurrent migrations from multiple CI runners can corrupt state. PostgreSQL advisory locks fix this:
-- Acquire an advisory lock before migrating
SELECT pg_advisory_lock(12345);
-- Run migrations...
-- Release
SELECT pg_advisory_unlock(12345);
In Kotlin with JDBC:
fun <T> withMigrationLock(dataSource: DataSource, block: () -> T): T {
dataSource.connection.use { conn ->
conn.prepareStatement("SELECT pg_advisory_lock(12345)").execute()
try {
return block()
} finally {
conn.prepareStatement("SELECT pg_advisory_unlock(12345)").execute()
}
}
}
// Usage in CI/CD
withMigrationLock(dataSource) {
flyway.migrate()
}
This guarantees only one migration runner executes at a time, even across multiple pods or pipeline jobs.
Comparison
| Criteria | Expand/contract | Blue-green (shadow schema) |
|---|---|---|
| Complexity | Low to medium | High |
| Rollback | Drop new column | Switch view back |
| Downtime risk | Near zero with batched backfills | Zero with atomic view swap |
| Storage overhead | Minimal (new column only) | 2x table size during sync |
| Tool support | Flyway, Liquibase, Alembic | pgroll, pg-osc, custom triggers |
| Lock duration | Milliseconds per DDL | Milliseconds at view swap |
| Best for | Additive changes, column renames | Type changes, large restructures |
| CI/CD integration | Simple phase gates | Requires sync validation step |
| Team skill required | Moderate SQL knowledge | Deep PostgreSQL internals |
Expand/contract handles 80% of migration scenarios with far less operational overhead. Blue-green earns its complexity when you need atomic cutover for structural changes that can’t be broken into additive steps.
CI/CD pipeline integration
A migration safety gate in your pipeline prevents deploying application code before the schema is ready:
# .github/workflows/migrate.yml
jobs:
migrate:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Acquire advisory lock and migrate
run: |
psql "$DATABASE_URL" -c "SELECT pg_advisory_lock(12345);"
flyway -url="$JDBC_URL" migrate
psql "$DATABASE_URL" -c "SELECT pg_advisory_unlock(12345);"
- name: Validate schema
run: |
psql "$DATABASE_URL" -c "
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'orders'
AND column_name = 'customer_email';" \
| grep -q 'customer_email' || exit 1
deploy:
needs: migrate
runs-on: ubuntu-latest
steps:
- name: Deploy application
run: kubectl rollout restart deployment/api
The deploy job depends on migrate, so schema changes land before code that references new columns. The validation step is a gate: if the expected column doesn’t exist, the pipeline fails before deployment.
What to actually do
Default to expand/contract for additive changes like column additions, index creation, and nullable-to-non-nullable conversions. It’s simpler, uses less storage, and works with standard migration tools.
Save blue-green (shadow schema) for structural rewrites: column type changes across large tables, primary key modifications, table splits. The 2x storage cost and operational complexity are only worth it when incremental steps aren’t feasible.
Use pg_advisory_lock and CI/CD gates no matter which pattern you pick. Concurrent migrations are a silent data corruption vector. Lock acquisition should be the first line of every migration job, and schema validation should gate every deployment. I’ve seen teams skip this and lose data. Don’t be that team.
TAGS: backend, architecture, devops, cicd, cloud