PostgreSQL Partial Replication with Logical Decoding: Streaming Only What Your Microservices Need Without Change Data Capture Tooling
TL;DR
PostgreSQL 15 introduced row filters and column lists for publications, so you can replicate only the rows and columns each downstream microservice actually needs. For teams running fewer than ~10 microservices that consume domain events, this removes the operational weight of Debezium, Kafka Connect, and schema registries. You use infrastructure you already run. The catch: you must monitor replication slots aggressively, or WAL bloat will take down your primary.
The problem: CDC pipelines you don’t need yet
Most teams get this wrong about change data capture: they reach for Debezium + Kafka the moment two services need shared data. That stack does a lot, but it costs a lot to operate. ZooKeeper or KRaft clusters, Connect workers, schema registries, offset management, connector configs that silently break on DDL changes.
If your architecture is 3-8 services that each need a materialized read model from a shared PostgreSQL primary, built-in logical replication in PG15+ may be everything you need.
How publication filters work
PostgreSQL’s CREATE PUBLICATION now supports WHERE clauses and column lists directly:
-- Service: order-fulfillment
-- Only replicate orders that are paid and awaiting shipment
CREATE PUBLICATION pub_fulfillment
FOR TABLE orders (id, customer_id, status, shipping_address, created_at)
WHERE (status IN ('paid', 'processing'));
-- Service: analytics-ingest
-- Replicate all orders but only the columns needed for reporting
CREATE PUBLICATION pub_analytics
FOR TABLE orders (id, total_cents, currency, region, created_at),
TABLE line_items (id, order_id, sku, quantity, unit_price_cents);
Each downstream subscriber connects to its own publication and receives only the filtered subset. No application-level filtering, no wasted bandwidth shipping data the replica doesn’t want.
Logical decoding under the hood
Logical replication uses replication slots backed by output plugins (pgoutput is the default since PG10). The flow:
- WAL records are decoded by the output plugin into logical change events
- The publication filter is applied at decode time, so filtered rows never leave the primary
- Changes stream over a replication connection to the subscriber
The important detail: row filtering happens on the publisher side. The primary does the work, but it avoids serializing and transmitting data the subscriber would throw away.
PostgreSQL logical replication vs. Debezium/Kafka CDC
| Dimension | PG logical replication | Debezium + Kafka |
|---|---|---|
| Additional infrastructure | None | Kafka cluster, Connect workers, schema registry |
| Row-level filtering | Native (PG15+ WHERE) | SMT or consumer-side filtering |
| Column filtering | Native (PG15+ column lists) | SMT ReplaceField or downstream |
| Throughput ceiling | ~5K-15K TPS per slot (workload-dependent) | Horizontally scalable via partitions |
| Schema evolution | DDL requires manual ALTER SUBSCRIPTION REFRESH | Schema registry handles most cases |
| Fan-out to non-PG consumers | Not supported | Any Kafka consumer |
| Operational complexity | Low (pg_stat_replication, slot monitoring) | High (3+ distributed systems to operate) |
If your write throughput is under 10K TPS and all consumers are PostgreSQL databases, native replication wins on operational simplicity. It’s not close.
The WAL bloat problem, and how to prevent it
Every replication slot tells PostgreSQL: “Do not recycle WAL segments past this point.” If a subscriber goes down or falls behind, WAL accumulates on the primary. I’ve seen this take down production databases. It is the single most common way logical replication causes outages.
Set this on day one:
-- PG13+: Set a maximum WAL size retained per slot
ALTER SYSTEM SET max_slot_wal_keep_size = '10GB';
SELECT pg_reload_conf();
Monitor these metrics continuously:
-- Check replication lag and retained WAL per slot
SELECT
slot_name,
active,
pg_size_pretty(
pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)
) AS retained_wal,
pg_size_pretty(
pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
) AS total_retained
FROM pg_replication_slots
WHERE slot_type = 'logical';
Set alerts on retained_wal exceeding a threshold. I recommend 20% of your available disk. When max_slot_wal_keep_size is breached, PostgreSQL invalidates the slot. The subscriber must be re-initialized, but your primary survives.
Slot management checklist
- Set
max_slot_wal_keep_sizeon the primary. No exceptions. - Name slots descriptively:
sub_fulfillment_v1, notsub1. - Drop inactive slots immediately when decommissioning a service:
SELECT pg_drop_replication_slot('slot_name'); - Monitor
pg_stat_replicationandpg_replication_slotsin your observability stack. - Test failover. If you use physical replication for HA, logical slots do not replicate to standbys until PG17’s
sync_replication_slots.
When to move to Debezium/Kafka
This approach has clear limits. Move to a dedicated CDC pipeline when:
- You need non-PostgreSQL consumers (Elasticsearch, Redis, data lakes)
- Write throughput exceeds what a single logical decoding slot handles comfortably
- You need exactly-once delivery semantics beyond what
pg_replication_originprovides - Your subscriber count grows past the point where per-slot CPU overhead on the primary becomes measurable
What to do with this
-
Start with
CREATE PUBLICATION ... WHEREon PG15+. Define per-service publications with row filters and column lists before evaluating any external CDC tooling. You probably already have the infrastructure. -
Treat
max_slot_wal_keep_sizeas non-negotiable. Set it on day one and alert on retained WAL size. An unmonitored replication slot is a disk-full outage waiting to happen. -
Design your publications around domain boundaries so that when you eventually need Debezium, the table-to-service mapping already exists. Your publication definitions become your CDC connector config blueprint.