
Replacing Your Message Queue with PostgreSQL: SKIP LOCKED Queues, LISTEN/NOTIFY Pub/Sub, and the Transactional Outbox Pattern That Eliminates Dual-Write Bugs Without Adding Infrastructure

Krystian Wiewiór · 5 min read

TL;DR

Most startups don’t need a dedicated message queue. PostgreSQL’s FOR UPDATE SKIP LOCKED gives you a solid job queue, LISTEN/NOTIFY handles real-time fan-out, and the transactional outbox pattern eliminates dual-write bugs entirely. All without adding Redis or RabbitMQ to your stack. This holds comfortably up to ~10,000 jobs/minute before you need something bigger. Every service you remove is a service you don’t debug at 2 AM.


The infrastructure creep problem

There’s a running debate right now about whether fewer lines of code and fewer moving parts mean you didn’t build anything real. One team recently shared that a 14,230-line codebase runs their entire GTM operation. The lesson isn’t the number. It’s what you don’t need.

The single biggest source of operational pain for early-stage teams isn’t the code. It’s the infrastructure graph. Every box on your architecture diagram is a thing that fails, needs monitoring, and requires someone who understands its failure modes. If PostgreSQL is already running your application state, making it also run your job queue and event bus is just good engineering.


Pattern 1: SKIP LOCKED as a job queue

The FOR UPDATE SKIP LOCKED clause, available since PostgreSQL 9.5, turns any table into a concurrency-safe work queue.

-- Dequeue the next available job (multiple workers, zero contention)
WITH next_job AS (
  SELECT id, payload
  FROM job_queue
  WHERE status = 'pending'
  ORDER BY created_at
  LIMIT 1
  FOR UPDATE SKIP LOCKED
)
UPDATE job_queue SET status = 'processing'
FROM next_job
WHERE job_queue.id = next_job.id
RETURNING job_queue.*;

Workers grab the next unlocked row atomically. No polling races, no double-processing.
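
For concreteness, here is a minimal schema the dequeue above could run against. The exact columns are illustrative assumptions, not a fixed contract; the attempts counter comes into play later for retries:

CREATE TABLE job_queue (
  id         bigserial   PRIMARY KEY,
  status     text        NOT NULL DEFAULT 'pending',  -- pending | processing | done | failed
  payload    jsonb       NOT NULL,
  attempts   int         NOT NULL DEFAULT 0,
  created_at timestamptz NOT NULL DEFAULT now()
);

-- Partial index: dequeues scan only pending rows, so the index stays small
-- no matter how many completed jobs accumulate in the table
CREATE INDEX job_queue_pending_idx
  ON job_queue (created_at)
  WHERE status = 'pending';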

Benchmarks: PostgreSQL queue vs. Redis/RabbitMQ

Metric                  | PG SKIP LOCKED        | Redis (RPOPLPUSH)           | RabbitMQ
Throughput (jobs/min)   | ~10,000-12,000        | ~80,000+                    | ~40,000+
Latency (p99)           | 5-15 ms               | <1 ms                       | 1-3 ms
Exactly-once delivery   | Native (transactions) | Requires Lua scripts        | Requires publisher confirms + dedup
Additional infra        | None                  | Redis instance + monitoring | Broker cluster + monitoring
Failure mode complexity | One system            | Two systems                 | Two systems

PostgreSQL won’t win a throughput race. It doesn’t need to. 10K jobs/minute covers the vast majority of startups. You hit that ceiling only when you’re processing ~150 jobs/second sustained, and at that point you have the revenue to justify dedicated infrastructure.


Pattern 2: LISTEN/NOTIFY for real-time fan-out

PostgreSQL’s LISTEN/NOTIFY gives you lightweight pub/sub without polling.

-- Publisher (inside your existing transaction)
NOTIFY order_events, '{"order_id": 42, "status": "paid"}';

-- Subscriber (any connected client)
LISTEN order_events;

Works well for cache invalidation, WebSocket push, and internal microservice signaling.
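
One common pattern, beyond the bare snippets above: let a trigger publish automatically whenever a row changes, so application code never has to remember to NOTIFY. A sketch, assuming an orders table with id and status columns:

-- pg_notify() is the function form of NOTIFY, usable inside PL/pgSQL
CREATE OR REPLACE FUNCTION notify_order_event() RETURNS trigger AS $$
BEGIN
  -- Payloads are capped at ~8 KB: send identifiers, let subscribers fetch the rest
  PERFORM pg_notify(
    'order_events',
    json_build_object('order_id', NEW.id, 'status', NEW.status)::text
  );
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER order_events_notify
  AFTER INSERT OR UPDATE ON orders
  FOR EACH ROW EXECUTE FUNCTION notify_order_event();

Like NOTIFY itself, the trigger’s notification is only delivered when the surrounding transaction commits, so subscribers never see events from rolled-back writes.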

The PgBouncer gotcha

Most teams get this wrong: LISTEN/NOTIFY does not work through PgBouncer in transaction pooling mode. PgBouncer reassigns connections between transactions, so your LISTEN subscription gets silently dropped.

You have three options:

  1. Run a dedicated direct connection for NOTIFY listeners, bypassing PgBouncer entirely.
  2. Set up session pooling mode on a separate PgBouncer instance for subscriber connections.
  3. Fall back to polling a notifications table with SKIP LOCKED (which kind of defeats the purpose).

Go with option 1. One dedicated connection per subscriber service costs almost nothing compared to adding an entire Redis instance.


Pattern 3: The transactional outbox

The dual-write problem is the silent data loss bug hiding in most startup codebases. You save an order to your database, then publish an event to your queue. If the publish fails after the commit, your event is lost. If the publish succeeds but the transaction rolls back, you have a phantom event. Both happen more often than people think.

The transactional outbox kills this:

BEGIN;
  INSERT INTO orders (id, total) VALUES (42, 99.00);
  INSERT INTO outbox (aggregate_id, event_type, payload)
    VALUES (42, 'order.created', '{"id":42,"total":99.00}');
COMMIT;

A separate poller (using SKIP LOCKED) reads the outbox and forwards events to downstream consumers. The event write and the business write live in the same transaction. They either both happen or neither does. No distributed transactions, no Saga compensations, no eventual-inconsistency surprises.
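
A sketch of that poller’s core query, reusing the SKIP LOCKED dequeue from Pattern 1. The id, created_at, and published_at columns are assumptions not shown in the INSERT above. Run this inside a transaction and commit only after the downstream send succeeds: a crashed poller then leaves the rows unclaimed, giving you at-least-once delivery.

-- Claim a batch of unpublished events; concurrent pollers skip each other's rows
WITH batch AS (
  SELECT id, event_type, payload
  FROM outbox
  WHERE published_at IS NULL
  ORDER BY created_at
  LIMIT 100
  FOR UPDATE SKIP LOCKED
)
UPDATE outbox SET published_at = now()
FROM batch
WHERE outbox.id = batch.id
RETURNING batch.event_type, batch.payload;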

This is the same foundation behind Debezium’s CDC approach, and Microsoft recommends it in their .NET microservices architecture guide.


When you actually need dedicated infrastructure

Be honest about the ceiling. Reach for RabbitMQ or Kafka when:

  • Throughput exceeds ~10K jobs/min sustained and vertical scaling is maxed
  • You need multi-datacenter replication of your event stream
  • Consumer fan-out grows past ~10 independent subscribers on a single topic
  • Message retention and replay are core product requirements (event sourcing at scale)

Until then, you’re adding operational complexity for theoretical scale. I’ve watched teams spend weeks setting up RabbitMQ clusters they didn’t need for another two years.


What to do with this

Start with SKIP LOCKED for all background job processing. Add a job_queue table today and drop your Redis dependency. Migration takes an afternoon; the operational simplification lasts forever.
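
The dequeue query in Pattern 1 only claims a job; your worker also needs completion and retry steps. A sketch using the attempts column from the schema sketched earlier, with $1 standing in for the job id and a retry cap that is purely illustrative:

-- On success
UPDATE job_queue SET status = 'done' WHERE id = $1;

-- On failure: requeue, or dead-letter after five tries
UPDATE job_queue
SET status   = CASE WHEN attempts >= 5 THEN 'failed' ELSE 'pending' END,
    attempts = attempts + 1
WHERE id = $1;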

Use the transactional outbox from day one. Dual-write bugs are silent and cumulative. By the time you notice lost events, you’ve already shipped inconsistent data to customers. The outbox costs one extra INSERT per transaction, which is nothing compared to debugging ghost events at midnight.

Draw your migration trigger line now, before you need it. Pick a number: “When we sustain X jobs/second for Y hours, we move to dedicated infrastructure.” Without that threshold written down somewhere, teams either migrate too early out of anxiety or too late after a production fire. 150 jobs/second sustained is a reasonable starting line for most startups.
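
To make that threshold checkable rather than a gut feeling, chart something like this against the job_queue schema sketched earlier and alert when it approaches your line:

-- Average enqueue rate over the last hour, in jobs per second
SELECT count(*) / 3600.0 AS jobs_per_second
FROM job_queue
WHERE created_at > now() - interval '1 hour';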

