PostgreSQL Advisory Locks for Distributed Job Scheduling: Skip Locked, Lock Timeout Tuning, and the Coordination Pattern That Replaces Your Message Queue

The problem: coordination without new infrastructure

Every startup hits the same inflection point: you need background jobs across multiple API nodes, and someone suggests adding Redis + BullMQ. Suddenly your infrastructure doubles. You need monitoring for a new stateful service. Your deployment complexity jumps.

Most teams miss this: PostgreSQL already has the coordination primitives you need.

Advisory locks: the primitive nobody uses

PostgreSQL offers two flavors of advisory locks:

Feature	`pg_advisory_lock` (Session)	`pg_advisory_xact_lock` (Transaction)
Release	Explicit or session end	Transaction commit/rollback
Deadlock detection	Yes, via `deadlock_timeout`	Yes, via `deadlock_timeout`
Best for	Long-running jobs	Short transactional work
Risk	Leak if not released	Holds connections longer

For job scheduling, transaction-level locks are almost always the right choice. They self-clean on commit or rollback, which eliminates the most common advisory lock bug: leaked session-level locks from crashed workers.

The schema and polling pattern

This is the schema I’ve run in production across a 3-node cluster:

CREATE TABLE job_queue (
    id          BIGSERIAL PRIMARY KEY,
    job_type    TEXT NOT NULL,
    payload     JSONB NOT NULL,
    status      TEXT NOT NULL DEFAULT 'pending',
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    locked_at   TIMESTAMPTZ,
    completed_at TIMESTAMPTZ,
    attempts    INT NOT NULL DEFAULT 0,
    max_attempts INT NOT NULL DEFAULT 3
);

CREATE INDEX idx_job_queue_fetchable
    ON job_queue (created_at)
    WHERE status = 'pending';

The fetch query is where SKIP LOCKED does the real work:

BEGIN;

UPDATE job_queue
SET status = 'processing', locked_at = now(), attempts = attempts + 1
WHERE id = (
    SELECT id FROM job_queue
    WHERE status = 'pending'
      AND attempts < max_attempts
    ORDER BY created_at
    FOR UPDATE SKIP LOCKED
    LIMIT 1
)
RETURNING *;

-- Execute job logic here

UPDATE job_queue SET status = 'completed', completed_at = now() WHERE id = $1;
COMMIT;

FOR UPDATE SKIP LOCKED is the key. Each worker grabs the next available row without blocking other workers. No advisory lock needed for the basic case.

When to layer in advisory locks

Advisory locks become worth it when you need job-type-level coordination, like ensuring only one worker processes a given job type at a time, or rate-limiting concurrent processing:

-- Only one worker processes 'invoice_generation' at a time
SELECT pg_try_advisory_xact_lock(hashtext('invoice_generation'));

This pattern lets you build per-tenant or per-resource throttling without additional infrastructure.

Lock timeout tuning

The default lock_timeout of 0 (infinite wait) is dangerous for job schedulers. Set it explicitly:

SET lock_timeout = '5s';

I’ve found 2-5 seconds to be the sweet spot for polling workers. Go shorter and you’ll burn cycles on retries. Go longer and you risk worker starvation during contention spikes.

Monitoring lock contention

You can observe advisory lock contention directly via pg_stat_activity:

SELECT pid, wait_event_type, wait_event, query, state
FROM pg_stat_activity
WHERE wait_event_type = 'Lock'
  AND wait_event LIKE '%advisory%';

Set up alerts when advisory lock waits consistently exceed your lock_timeout threshold. That’s your signal to either add workers or revisit lock granularity.

Benchmarks: Postgres vs. Redis+BullMQ

I benchmarked both setups on a 3-node API cluster (4 vCPU, 8GB RAM each) with a single PostgreSQL 15 instance (8 vCPU, 32GB RAM) against a comparable Redis 7 instance. Workload: mixed job types, payload sizes 1-10KB.

Metric	Postgres SKIP LOCKED	Redis + BullMQ
Throughput (jobs/min)	~850	~920
p50 latency (ms)	12	8
p99 latency (ms)	85	34
Infra components	1 (Postgres)	2 (Postgres + Redis)
Failure recovery	Automatic (tx rollback)	Requires stalled job detection
Operational complexity	Low	Medium

At sub-1000 jobs/minute, Postgres is within 8% on throughput. The p99 gap is real: 85ms vs 34ms. But for most background work (email sends, report generation, webhook delivery), nobody notices that difference.

Deadlock detection pitfalls

One production lesson worth calling out: never combine advisory locks with row-level locks in inconsistent orders. If Worker A holds an advisory lock and waits for a row lock while Worker B holds that row lock and waits for the same advisory lock, PostgreSQL’s deadlock detector will resolve it, but at the cost of a rolled-back transaction and a retry.

The fix is simple. Always acquire advisory locks before row locks, and always in a consistent, deterministic order.

The break you need to take

Speaking of long-running work: if you’re the engineer polling job queues and staring at pg_stat_activity dashboards for hours, your body is doing its own form of lock contention. I use HealthyDesk to force break reminders during deep debugging sessions, which honestly helps me spot deadlock patterns faster when I come back.

What to actually do with this

Start with SKIP LOCKED before reaching for advisory locks. The simple FOR UPDATE SKIP LOCKED pattern handles 80% of job scheduling needs. Layer advisory locks on top only when you need job-type-level coordination or per-resource throttling.

Use transaction-level advisory locks (pg_advisory_xact_lock) and set an explicit lock_timeout. I’d start with 5 seconds and tune from there. This eliminates leaked locks and prevents worker starvation.

Defer Redis until you’re past ~1,000 jobs/minute or need sub-50ms p99. Below that threshold, one fewer infrastructure component to babysit is worth the latency tradeoff. And when you do outgrow Postgres, the migration is straightforward if your job interface is clean.

PostgreSQL Advisory Locks for Distributed Job Scheduling: Skip Locked, Lock Timeout Tuning, and the Coordination Pattern That Replaces Your Message Queue

The problem: coordination without new infrastructure

Advisory locks: the primitive nobody uses

The schema and polling pattern

When to layer in advisory locks

Lock timeout tuning

Monitoring lock contention

Benchmarks: Postgres vs. Redis+BullMQ

Deadlock detection pitfalls

The break you need to take

What to actually do with this

TAGS: postgresql, backend, architecture, api, startup

Related Posts

PgBouncer transaction mode for 50k mobile users

Android LLM speed: KV cache persistence cuts latency 60%

gRPC-Web on mobile without a proxy: Connect Protocol