MVP Factory


Krystian Wiewiór · 5 min read

Schema changes without downtime

Every production PostgreSQL database eventually needs a column renamed, a type changed, or an index added to a 200-million-row table. A naive ALTER TABLE ... ADD COLUMN ... DEFAULT ... on PostgreSQL 10 or earlier holds an ACCESS EXCLUSIVE lock while it rewrites the entire table. Even on PostgreSQL 11+, where non-volatile defaults avoid the rewrite, migrations involving backfills, constraint changes, or index creation still risk locking out reads and writes.
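Concretely, the shape that bites looks like this (the `status` column is invented for illustration):

```sql
-- On PG 10 and earlier this rewrites every row while holding
-- ACCESS EXCLUSIVE, blocking all reads and writes for the duration:
ALTER TABLE orders ADD COLUMN status TEXT NOT NULL DEFAULT 'new';
```

On PG 11+ the constant default makes this metadata-only, but the same statement with a volatile default (say, `DEFAULT now()` before 11, or `DEFAULT random()` on any version) still triggers the rewrite.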

The migration strategy matters more than the migration tool. Most teams get this backwards: they pick a tool before understanding the pattern.

Pattern 1: expand/contract

The expand/contract pattern splits a breaking schema change into three deployable phases:

Phase 1: expand (add new)

-- Non-blocking on any version: a nullable column with no default
-- needs only a brief lock and no table rewrite
ALTER TABLE orders ADD COLUMN customer_email TEXT;

Phase 2: migrate (backfill)

-- Batched backfill to avoid long transactions
UPDATE orders
SET customer_email = customers.email
FROM customers
WHERE orders.customer_id = customers.id
  AND orders.id BETWEEN :start AND :end;
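A minimal driver for that batched backfill might look like this in Kotlin (a sketch; `batchRanges` and the batch size are my own names, and the JDBC execution is left to the runner):

```kotlin
// Split the id space into inclusive [start, end] ranges so each batched
// UPDATE touches at most `batchSize` rows and its transaction stays short.
fun batchRanges(maxId: Long, batchSize: Long): List<LongRange> =
    (1..maxId step batchSize).map { start ->
        start..minOf(start + batchSize - 1, maxId)
    }
```

A runner iterates these ranges, binding `range.first` and `range.last` as `:start` and `:end` in the UPDATE above, committing after each batch and pausing between batches so autovacuum and replicas can keep up.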

Phase 3: contract (drop old)

-- Only after all application code stops reading the old column
ALTER TABLE orders DROP COLUMN customer_name;

Each phase ships as a separate deployment. Application code uses dual-write during the migration window, reading from the new column with a fallback to the old.
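The read side of that dual-write window can be a one-liner. A sketch in Kotlin (the `Order` shape and field names are invented for illustration):

```kotlin
// During the migration window some rows are backfilled (customer_email set)
// while older rows still only carry the legacy customer_name.
data class Order(val customerEmail: String?, val customerName: String?)

// Prefer the new column and fall back to the old one until the contract
// phase drops customer_name for good.
fun contactFor(order: Order): String =
    order.customerEmail ?: order.customerName ?: "unknown"
```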

Safe concurrent indexing

-- NEVER inside a transaction block
CREATE INDEX CONCURRENTLY idx_orders_customer_email
ON orders (customer_email);

CREATE INDEX CONCURRENTLY builds the index without holding a write lock, but it cannot run inside a transaction. Your migration runner must support non-transactional statements. Flyway (executeInTransaction=false) and Liquibase (runInTransaction="false") both handle this.
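With Flyway, the usual way to flag a single migration as non-transactional is a script config file next to it (the migration filename here is invented):

```properties
# V7__add_orders_email_index.sql.conf, alongside V7__add_orders_email_index.sql
executeInTransaction=false
```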

Pattern 2: blue-green at the database level

Blue-green for Postgres schema changes uses a shadow schema and view switching rather than in-place changes.

Step 1: create shadow schema

CREATE SCHEMA green;

CREATE TABLE green.orders (
    id BIGINT PRIMARY KEY,
    customer_email TEXT NOT NULL,
    amount NUMERIC(12,2),
    created_at TIMESTAMPTZ DEFAULT now()
);

Step 2: backfill via logical replication or trigger-based copy

-- Trigger-based sync (the PG equivalent of gh-ost)
CREATE OR REPLACE FUNCTION sync_orders() RETURNS TRIGGER AS $$
BEGIN
    INSERT INTO green.orders (id, customer_email, amount, created_at)
    VALUES (NEW.id, NEW.customer_email, NEW.amount, NEW.created_at)
    ON CONFLICT (id) DO UPDATE SET
        customer_email = EXCLUDED.customer_email,
        amount = EXCLUDED.amount;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Attach it, so every write to the live table is mirrored into the shadow
CREATE TRIGGER orders_sync
AFTER INSERT OR UPDATE ON public.orders
FOR EACH ROW EXECUTE FUNCTION sync_orders();

Step 3: atomic view switch

-- Swap atomically: a view cannot REPLACE an existing table, so rename
-- the old table out of the way and create the view in one transaction
BEGIN;
ALTER TABLE public.orders RENAME TO orders_blue;
CREATE VIEW public.orders AS SELECT * FROM green.orders;
COMMIT;

This is the PostgreSQL version of the ghost table pattern that pt-online-schema-change and gh-ost popularized in MySQL. Tools like pgroll and pg-osc automate it for PostgreSQL.

Advisory locks for concurrent migration safety

Regardless of pattern, concurrent migrations from multiple CI runners can corrupt state. PostgreSQL advisory locks fix this:

-- Acquire an advisory lock before migrating
SELECT pg_advisory_lock(12345);

-- Run migrations...

-- Release
SELECT pg_advisory_unlock(12345);

In Kotlin with JDBC:

fun <T> withMigrationLock(dataSource: DataSource, block: () -> T): T {
    dataSource.connection.use { conn ->
        // Advisory locks are session-scoped: acquire and release must happen
        // on the same connection, which `use` keeps open for the whole block.
        conn.createStatement().use { it.execute("SELECT pg_advisory_lock(12345)") }
        try {
            return block()
        } finally {
            conn.createStatement().use { it.execute("SELECT pg_advisory_unlock(12345)") }
        }
    }
}

// Usage in CI/CD
withMigrationLock(dataSource) {
    flyway.migrate()
}

This guarantees only one migration runner executes at a time, even across multiple pods or pipeline jobs.
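A magic number like 12345 works, but collides easily once other subsystems start using advisory locks too. One common trick (a sketch; the key name is arbitrary) derives a stable 64-bit key from a well-known string:

```kotlin
import java.util.zip.CRC32

// Derive a deterministic bigint lock key from a name, so every runner
// computes the same key without sharing a magic constant.
fun advisoryLockKey(name: String): Long {
    val crc = CRC32()
    crc.update(name.toByteArray(Charsets.UTF_8))
    return crc.value  // 0..2^32-1, well within PostgreSQL's signed bigint
}
```

Then `SELECT pg_advisory_lock(${advisoryLockKey("schema-migrations")})` gives each subsystem its own collision-resistant key.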

Comparison

| Criteria | Expand/contract | Blue-green (shadow schema) |
|---|---|---|
| Complexity | Low to medium | High |
| Rollback | Drop new column | Switch view back |
| Downtime risk | Near zero with batched backfills | Zero with atomic view swap |
| Storage overhead | Minimal (new column only) | 2x table size during sync |
| Tool support | Flyway, Liquibase, Alembic | pgroll, pg-osc, custom triggers |
| Lock duration | Milliseconds per DDL | Milliseconds at view swap |
| Best for | Additive changes, column renames | Type changes, large restructures |
| CI/CD integration | Simple phase gates | Requires sync validation step |
| Team skill required | Moderate SQL knowledge | Deep PostgreSQL internals |

Expand/contract handles 80% of migration scenarios with far less operational overhead. Blue-green earns its complexity when you need atomic cutover for structural changes that can’t be broken into additive steps.

CI/CD pipeline integration

A migration safety gate in your pipeline prevents deploying application code before the schema is ready:

# .github/workflows/migrate.yml
jobs:
  migrate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Acquire advisory lock and migrate
        run: |
          # Advisory locks are session-scoped: acquiring and releasing in two
          # separate psql invocations is a no-op, because the lock dies with
          # the first session. Do everything in ONE session; \! shells out to
          # flyway while that session (and the lock) stays open.
          psql "$DATABASE_URL" <<'SQL'
          SELECT pg_advisory_lock(12345);
          \! flyway -url="$JDBC_URL" migrate
          SELECT pg_advisory_unlock(12345);
          SQL

      - name: Validate schema
        run: |
          psql "$DATABASE_URL" -c "
            SELECT column_name, data_type
            FROM information_schema.columns
            WHERE table_name = 'orders'
            AND column_name = 'customer_email';" \
          | grep -q 'customer_email' || exit 1

  deploy:
    needs: migrate
    runs-on: ubuntu-latest
    steps:
      - name: Deploy application
        run: kubectl rollout restart deployment/api

The deploy job depends on migrate, so schema changes land before code that references new columns. The validation step is a gate: if the expected column doesn’t exist, the pipeline fails before deployment.

What to actually do

Default to expand/contract for additive changes like column additions, index creation, and nullable-to-non-nullable conversions. It’s simpler, uses less storage, and works with standard migration tools.
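The nullable-to-non-nullable case deserves a concrete shape, because a bare SET NOT NULL scans the whole table under an exclusive lock. On PostgreSQL 12+, a validated CHECK constraint lets it skip the scan (constraint name is illustrative):

```sql
-- Step 1: add the constraint without checking existing rows (brief lock)
ALTER TABLE orders
  ADD CONSTRAINT orders_email_not_null
  CHECK (customer_email IS NOT NULL) NOT VALID;

-- Step 2: validate with only a SHARE UPDATE EXCLUSIVE lock;
-- reads and writes continue during the scan
ALTER TABLE orders VALIDATE CONSTRAINT orders_email_not_null;

-- Step 3: on PG 12+, SET NOT NULL sees the validated constraint,
-- skips the table scan, and the helper constraint can be dropped
ALTER TABLE orders ALTER COLUMN customer_email SET NOT NULL;
ALTER TABLE orders DROP CONSTRAINT orders_email_not_null;
```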

Save blue-green (shadow schema) for structural rewrites: column type changes across large tables, primary key modifications, table splits. The 2x storage cost and operational complexity are only worth it when incremental steps aren’t feasible.

Use pg_advisory_lock and CI/CD gates no matter which pattern you pick. Concurrent migrations are a silent data corruption vector. Lock acquisition should be the first line of every migration job, and schema validation should gate every deployment. I’ve seen teams skip this and lose data. Don’t be that team.

TAGS: backend, architecture, devops, cicd, cloud
