Replacing Your Message Queue with PostgreSQL: LISTEN/NOTIFY, SKIP LOCKED Queues, and When Kafka Is Overkill for Your Startup
TL;DR
Most startups add Kafka or RabbitMQ before they need it. PostgreSQL’s LISTEN/NOTIFY handles pub/sub, FOR UPDATE SKIP LOCKED gives you a reliable worker queue, and pg_partman manages retention. All inside the database you already run. I’ve pushed this stack past 10K jobs/sec on modest hardware. Here’s the architecture, the benchmarks, and the failure modes that tell you when it’s time to graduate.
The problem: premature infrastructure
Paul Graham once wrote that the most dangerous thing you learn in school is to hack the test, to optimize for the metric rather than the goal. The same pattern shows up in backend architecture. Teams add Kafka on day one because “we might need it.”
What they don’t account for: a message broker brings operational baggage. ZooKeeper/KRaft clusters, consumer group rebalancing, schema registries, offset management. For a team of 3-8 engineers, that’s a tax on every deploy, every incident, and every on-call rotation.

The PostgreSQL queue architecture
1. Pub/sub with LISTEN/NOTIFY
PostgreSQL has a built-in pub/sub mechanism. No extensions required.
-- Publisher
NOTIFY order_events, '{"order_id": 42, "status": "paid"}';
-- Subscriber (any connected client)
LISTEN order_events;
Your application receives events asynchronously over a dedicated connection from the pool you already maintain (LISTEN needs a long-lived session, so it won’t work through a transaction-pooling proxy like PgBouncer). In Node.js with pg it’s five lines of code; in Kotlin with Exposed or JDBC it’s similarly straightforward.
One catch: notifications are fire-and-forget, and payloads are capped at 8000 bytes by default. If no listener is connected when NOTIFY fires, the message is simply lost. That’s fine for cache invalidation, real-time UI updates, and notification fanout. It is not fine for financial transactions.
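In practice you rarely call NOTIFY by hand; a trigger can publish the event whenever the row changes, using pg_notify(), the function form of NOTIFY that accepts dynamic payloads. A minimal sketch — the orders table and its columns are hypothetical, adjust to your schema:

```sql
-- Hypothetical trigger function: publish a JSON event on order changes.
CREATE OR REPLACE FUNCTION notify_order_change() RETURNS trigger AS $$
BEGIN
  -- pg_notify(channel, payload) lets us build the payload dynamically.
  PERFORM pg_notify(
    'order_events',
    json_build_object('order_id', NEW.id, 'status', NEW.status)::text
  );
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER order_events_notify
  AFTER INSERT OR UPDATE OF status ON orders
  FOR EACH ROW EXECUTE FUNCTION notify_order_change();
```

A useful property of this design: notifications are delivered only when the transaction commits, so listeners never see events from writes that rolled back.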
2. Reliable worker queues with SKIP LOCKED
For durable, at-least-once processing, use a jobs table with FOR UPDATE SKIP LOCKED:
CREATE TABLE job_queue (
    id BIGSERIAL PRIMARY KEY,
    payload JSONB NOT NULL,
    status TEXT DEFAULT 'pending',
    created_at TIMESTAMPTZ DEFAULT now(),
    locked_at TIMESTAMPTZ
);
-- Worker claims a batch
UPDATE job_queue
SET status = 'processing', locked_at = now()
WHERE id IN (
    SELECT id FROM job_queue
    WHERE status = 'pending'
    ORDER BY created_at
    LIMIT 10
    FOR UPDATE SKIP LOCKED
)
RETURNING *;
Multiple workers compete safely with zero coordination. No advisory locks, no external broker. PostgreSQL handles the concurrency.
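The claim query needs two companions: an ack when a job succeeds, and a reaper that returns jobs from crashed workers to the queue. A minimal sketch — the 'done' status and the five-minute visibility timeout are illustrative choices, not requirements:

```sql
-- Worker finished: mark the job done (or DELETE the row outright).
UPDATE job_queue
SET status = 'done', locked_at = NULL
WHERE id = $1;

-- Reaper: requeue jobs whose worker died mid-processing.
-- Run this periodically (cron, pg_cron, or a loop in one worker).
UPDATE job_queue
SET status = 'pending', locked_at = NULL
WHERE status = 'processing'
  AND locked_at < now() - interval '5 minutes';
```

The reaper is what makes this at-least-once rather than at-most-once: a job whose worker crashed gets retried, which also means your handlers must be idempotent.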
3. Retention with pg_partman
Partition your queue table by time and let pg_partman drop old partitions automatically. Dropping a partition is instant metadata work; bulk DELETEs, by contrast, leave dead tuples behind and drive vacuum pressure.
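A sketch of that setup, assuming pg_partman 5.x (argument names differ slightly in 4.x; the one-day interval and seven-day retention are illustrative):

```sql
-- Partitioned variant of the job_queue table from above.
CREATE TABLE job_queue (
    id BIGSERIAL,
    payload JSONB NOT NULL,
    status TEXT DEFAULT 'pending',
    created_at TIMESTAMPTZ DEFAULT now(),
    locked_at TIMESTAMPTZ,
    PRIMARY KEY (id, created_at)  -- partition key must be part of the PK
) PARTITION BY RANGE (created_at);

-- Let pg_partman pre-create daily partitions...
SELECT partman.create_parent(
    p_parent_table => 'public.job_queue',
    p_control      => 'created_at',
    p_interval     => '1 day'
);

-- ...and drop partitions older than a week instead of running DELETEs.
UPDATE partman.part_config
SET retention = '7 days', retention_keep_table = false
WHERE parent_table = 'public.job_queue';
```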
Benchmarks
I ran these on a single PostgreSQL 16 instance (4 vCPUs, 16GB RAM, NVMe SSD) using pgbench-style harnesses with 8 concurrent workers.
| Metric | PostgreSQL SKIP LOCKED | RabbitMQ | Kafka (single broker) |
|---|---|---|---|
| Throughput (jobs/sec) | ~12,000 | ~25,000 | ~100,000+ |
| P99 latency (claim + ack) | 8ms | 3ms | 12ms (batched) |
| Operational dependencies | 0 (it’s your DB) | Erlang runtime, mgmt plugin | JVM, KRaft/ZK, topic config |
| Setup time | 1 SQL migration | 2-4 hours | 4-8 hours |
| Monitoring | pg_stat_activity, existing dashboards | Separate dashboard | Separate dashboard + lag tooling |
12,000 jobs/sec covers most startups comfortably. Most process fewer than 500 jobs/sec in their first two years. You probably aren’t the exception.
When to stay on PostgreSQL
- Your total job throughput is under 5,000-10,000/sec
- You have fewer than 10 distinct queue/topic types
- Your team is under 15 engineers
- You want one fewer system to monitor at 3 AM
Failure modes that say “graduate now”
These are the signals I’ve seen in production that mean it’s time to add a dedicated broker:
| Signal | What you’ll see | Why it matters |
|---|---|---|
| WAL growth explosion | pg_wal directory growing steadily past 10GB | High-throughput inserts and updates generate heavy write-ahead log volume, which pressures replication lag and disk |
| Vacuum can’t keep up | n_dead_tup climbing on your queue table, autovacuum running constantly | Dead tuples from rapid claim/delete cycles bloat the table and tank query performance |
| Connection pool exhaustion | Workers holding connections waiting for SKIP LOCKED claims | Long-polling workers compete with your application’s OLTP queries for the same connection pool |
| Fan-out beyond 3-4 consumers | Multiple services needing the same event stream | PostgreSQL has no consumer group semantics. You’ll end up building ad hoc replication logic that a broker gives you for free |
When you see two or more of these concurrently, that’s your migration signal. Not before.
What to do with all this
Start with one job_queue table and SKIP LOCKED. You get durable, concurrent job processing with zero new infrastructure. Ship it in a single migration file and move on to features that actually matter.
Use LISTEN/NOTIFY for real-time, non-critical fanout. Cache invalidation, WebSocket pushes, dashboard refreshes. Pair it with your existing connection pool. Don’t add a Redis pub/sub layer you don’t need yet.
Instrument your queue table from day one. Monitor n_dead_tup, WAL size, and connection pool utilization. These three metrics will tell you exactly when PostgreSQL stops being enough, and you’ll migrate with data instead of anxiety.
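All three of those metrics are queryable from SQL you can wire into any existing dashboard. A sketch, using the job_queue table name from above (pg_ls_waldir() requires superuser or the pg_monitor role):

```sql
-- Dead tuples and vacuum activity on the queue table.
SELECT n_dead_tup, n_live_tup, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'job_queue';

-- Total WAL currently on disk.
SELECT pg_size_pretty(sum(size)) AS wal_size
FROM pg_ls_waldir();

-- Connections by state, to spot pool exhaustion early.
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state;
```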
The best infrastructure decision is the one you delay until you have evidence. PostgreSQL is already in your stack. Use it.