Redis Streams: the event bus that delays Kafka by 2 years
Meta description: Redis Streams with consumer groups, backpressure, and Ktor coroutines give startups a production event bus, delaying Kafka migration until you actually need it.
TL;DR: Redis Streams with consumer groups get you 90% of Kafka’s event bus semantics at 10% of the operational cost. With XREADGROUP, idempotency keys, and XPENDING-based backpressure in Ktor coroutine consumers, you can push 100K+ messages/second on a single node. That’s enough for most startups through Series B. Below: the architecture, working Kotlin + Ktor + Exposed code for order processing and webhook fan-out, and concrete numbers on where Redis Streams falls apart.
The problem: you don’t need Kafka yet
The number one infrastructure mistake I see startups make between seed and Series A is adopting Kafka. You’re running 500 events/second, you have two engineers, and now someone is debugging partition rebalancing at 2 AM. Redis Streams, added in Redis 5.0, offer append-only log semantics with consumer groups. Those are the exact primitives you need for an event bus, and you’re almost certainly already running Redis.
Consumer group semantics: the core mental model
Redis Streams maps closely to Kafka’s consumer group model:
| Concept | Kafka | Redis Streams |
|---|---|---|
| Log append | producer.send() | XADD |
| Consumer group read | poll() | XREADGROUP |
| Offset commit | commitSync() | XACK |
| Uncommitted offsets | Consumer lag | Pending Entry List (PEL) |
| Rebalancing | Automatic partition reassignment | Manual via XCLAIM / XAUTOCLAIM |
| Retention | Time/size-based | MAXLEN / MINID |
The key difference: Redis Streams gives you per-message acknowledgment out of the box. No batching offset commits, no rebalancing storms. Each consumer calls XACK per message, and the Pending Entry List tracks everything that hasn’t been acknowledged.
Working implementation: Ktor + Exposed order processing
Here’s the consumer core, a Ktor coroutine worker that reads from a consumer group with exactly-once semantics via idempotency keys:
class StreamConsumer(
private val redis: RedisCoroutinesCommands<String, String>,
private val db: Database,
private val stream: String,
private val group: String,
private val consumerId: String
) {
suspend fun consume() = coroutineScope {
// Ensure consumer group exists
try { redis.xgroupCreate(
XReadArgs.StreamOffset.from(stream, "0-0"), group,
XGroupCreateArgs.Builder.mkstream()
) } catch (_: Exception) { /* group exists */ }
while (isActive) {
val messages = redis.xreadgroup(
Consumer.from(group, consumerId),
XReadArgs.Builder.count(50).block(Duration.ofSeconds(2)),
XReadArgs.StreamOffset.lastConsumed(stream)
)
messages?.forEach { msg ->
processWithIdempotency(msg)
}
}
}
private suspend fun processWithIdempotency(msg: StreamMessage<String, String>) {
val idempotencyKey = msg.id // Stream entry ID is globally unique
transaction(db) {
// Check-and-insert in one transaction
val exists = ProcessedEvents
.select { ProcessedEvents.eventId eq idempotencyKey }
.count() > 0
if (!exists) {
ProcessedEvents.insert { it[eventId] = idempotencyKey }
handleEvent(msg.body)
}
}
redis.xack(stream, group, msg.id) // ACK only after DB commit
}
}
The ordering here matters: DB commit happens before XACK. If the process crashes between commit and ACK, the message stays in the PEL and gets redelivered, but the idempotency check prevents double processing. This is exactly-once effective processing.
Backpressure: XPENDING and claim-based redelivery
The Pending Entry List is your backpressure signal. Here’s a monitor coroutine that detects stuck consumers and reclaims their messages:
launch {
while (isActive) {
delay(30_000)
// Claim messages idle for >60s from any consumer
val stale = redis.xautoclaim(
stream, XAutoClaimArgs.Builder
.xautoclaim(group, consumerId, 60_000, "0-0")
)
stale.messages.forEach { msg ->
val deliveryCount = redis.xpending(stream, group, msg.id, msg.id, 1)
.firstOrNull()?.nacks ?: 0
if (deliveryCount > 3) {
// Dead-letter after 3 retries
redis.xadd("${stream}:dlq", msg.body)
redis.xack(stream, group, msg.id)
}
}
}
}
XAUTOCLAIM (Redis 6.2+) atomically transfers ownership of idle messages. Combined with the nacks counter from XPENDING, you get retry-count-aware redelivery without external state.
Backpressure strategy summary
| Signal | Action | Implementation |
|---|---|---|
| PEL size > threshold | Pause new reads | Check XPENDING summary before XREADGROUP |
| Message idle time > 60s | Reclaim to healthy consumer | XAUTOCLAIM |
| Delivery count > 3 | Route to dead-letter queue | XADD to DLQ stream, then XACK |
| Consumer lag growing | Scale horizontally | Add consumer instances to the group |
Webhook fan-out and notification pipelines
For webhook fan-out, use multiple consumer groups on the same stream. Each group gets every message independently:
// One stream, three independent consumer groups
val groups = listOf("webhook-delivery", "analytics-pipeline", "notification-sender")
groups.forEach { group ->
launch { StreamConsumer(redis, db, "orders", group, "$group-${hostId}").consume() }
}
This is Redis Streams’ equivalent of Kafka topic fan-out. Each group maintains its own PEL and offset. Zero coordination between them.
Where Redis Streams breaks down
Here’s what I’ve measured across production deployments:
| Metric | Redis Streams (single node) | Kafka (3-broker cluster) |
|---|---|---|
| Throughput (sustained) | ~120K msg/s | ~500K+ msg/s |
| Throughput (burst) | ~200K msg/s | ~1M+ msg/s |
| Latency (p99) | <2ms | ~5-15ms |
| Storage efficiency | Low (in-memory) | High (disk-based) |
| Retention | Hours to days (RAM-bound) | Weeks to months |
| Consumer groups | Dozens | Thousands |
| Operational complexity | Low (it’s Redis) | High (ZK/KRaft, brokers, schema registry) |
Redis Streams wins on latency and simplicity. Kafka wins on throughput ceiling, retention, and consumer group scale. In my experience, the crossover point is when you need more than ~80K sustained msg/s, retention beyond 48 hours, or more than ~20 independent consumer groups. That’s when you start your Kafka migration.
If you’re processing orders, sending webhooks, and routing notifications, you probably won’t hit those limits until well past product-market fit.
What to actually do
Use Redis Streams with XREADGROUP + XACK as your event bus from day one. You already run Redis. Add consumer groups, wire up idempotency keys in your Ktor consumers, and you have a production event bus in an afternoon instead of a quarter-long Kafka rollout.
Build backpressure through XPENDING monitoring, not consumer-side rate limiting. The PEL is your source of truth for consumer health. Use XAUTOCLAIM for automatic failover and delivery-count tracking for dead-letter routing.
Don’t plan a Kafka migration timeline. Plan a Kafka migration trigger. Set concrete thresholds: sustained throughput above 80K msg/s, retention needs beyond 48 hours, consumer group count past 20. Migrate when you hit them. Not before.
TAGS: kotlin, backend, architecture, startup, api