MVP Factory
ai startup development

Redis Streams: the event bus that delays Kafka by 2 years

Krystian Wiewiór · 6 min read

Meta description: Redis Streams with consumer groups, backpressure, and Ktor coroutines give startups a production event bus, delaying Kafka migration until you actually need it.

TL;DR: Redis Streams with consumer groups get you 90% of Kafka’s event bus semantics at 10% of the operational cost. With XREADGROUP, idempotency keys, and XPENDING-based backpressure in Ktor coroutine consumers, you can push 100K+ messages/second on a single node. That’s enough for most startups through Series B. Below: the architecture, working Kotlin + Ktor + Exposed code for order processing and webhook fan-out, and concrete numbers on where Redis Streams falls apart.

The problem: you don’t need Kafka yet

The number one infrastructure mistake I see startups make between seed and Series A is adopting Kafka. You’re running 500 events/second, you have two engineers, and now someone is debugging partition rebalancing at 2 AM. Redis Streams, added in Redis 5.0, offer append-only log semantics with consumer groups. Those are the exact primitives you need for an event bus, and you’re almost certainly already running Redis.

Consumer group semantics: the core mental model

Redis Streams maps closely to Kafka’s consumer group model:

| Concept | Kafka | Redis Streams |
| --- | --- | --- |
| Log append | producer.send() | XADD |
| Consumer group read | poll() | XREADGROUP |
| Offset commit | commitSync() | XACK |
| Uncommitted offsets | Consumer lag | Pending Entry List (PEL) |
| Rebalancing | Automatic partition reassignment | Manual via XCLAIM / XAUTOCLAIM |
| Retention | Time/size-based | MAXLEN / MINID |

The key difference: Redis Streams gives you per-message acknowledgment out of the box. No batching offset commits, no rebalancing storms. Each consumer calls XACK per message, and the Pending Entry List tracks everything that hasn’t been acknowledged.
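If the mapping feels abstract, a toy model helps. The sketch below is not Redis code; it is a small in-memory stand-in I made up (ToyStream and its method names are hypothetical) that mimics how XADD, XREADGROUP, XACK, and the PEL relate to each other, including the fact that each group keeps its own cursor.

```kotlin
// Toy in-memory model of one stream with consumer groups. Illustrative only:
// the real bookkeeping lives inside Redis.
data class Entry(val id: Long, val body: String)

class ToyStream {
    private val log = mutableListOf<Entry>()
    private val lastDelivered = mutableMapOf<String, Long>()            // group -> last delivered ID
    private val pel = mutableMapOf<String, MutableMap<Long, String>>()  // group -> (ID -> consumer)

    // XADD: append an entry and return its ID.
    fun add(body: String): Long {
        val id = log.size + 1L
        log += Entry(id, body)
        return id
    }

    // XREADGROUP with ">": hand out entries this group has not delivered yet
    // and record them as pending for the reading consumer.
    fun readGroup(group: String, consumer: String, count: Int): List<Entry> {
        val cursor = lastDelivered.getOrDefault(group, 0L)
        val batch = log.filter { it.id > cursor }.take(count)
        val pending = pel.getOrPut(group) { mutableMapOf() }
        batch.forEach { pending[it.id] = consumer }
        if (batch.isNotEmpty()) lastDelivered[group] = batch.last().id
        return batch
    }

    // XACK: drop one entry from the group's pending list.
    fun ack(group: String, id: Long): Boolean = pel[group]?.remove(id) != null

    // XPENDING (summary form): entries delivered but not yet acknowledged.
    fun pendingCount(group: String): Int = pel[group]?.size ?: 0
}

fun main() {
    val stream = ToyStream()
    repeat(3) { stream.add("order-$it") }

    val a = stream.readGroup("billing", "worker-a", count = 2)
    val b = stream.readGroup("billing", "worker-b", count = 2)
    println(a.map { it.id })                                      // [1, 2]
    println(b.map { it.id })                                      // [3]
    stream.ack("billing", 1)
    println(stream.pendingCount("billing"))                       // 2
    println(stream.readGroup("analytics", "worker-c", 10).size)   // 3
}
```

Two consumers in the same group split the entries between them, while a second group sees all three. That is the whole model.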

Working implementation: Ktor + Exposed order processing

Here’s the consumer core, a Ktor coroutine worker that reads from a consumer group and gets effectively-once processing via idempotency keys:

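import io.lettuce.core.Consumer
import io.lettuce.core.ExperimentalLettuceCoroutinesApi
import io.lettuce.core.RedisBusyException
import io.lettuce.core.StreamMessage
import io.lettuce.core.XGroupCreateArgs
import io.lettuce.core.XReadArgs
import io.lettuce.core.api.coroutines.RedisCoroutinesCommands
import java.time.Duration
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.isActive
import org.jetbrains.exposed.sql.*
import org.jetbrains.exposed.sql.transactions.experimental.newSuspendedTransaction

// Assumed schema: one row per processed event; the primary key rejects duplicates.
object ProcessedEvents : Table("processed_events") {
    val eventId = varchar("event_id", 255)
    override val primaryKey = PrimaryKey(eventId)
}

@OptIn(ExperimentalLettuceCoroutinesApi::class)
class StreamConsumer(
    private val redis: RedisCoroutinesCommands<String, String>,
    private val db: Database,
    private val stream: String,
    private val group: String,
    private val consumerId: String,
    // Domain handler; runs inside the idempotency transaction. Defaults to a no-op.
    private val handleEvent: suspend (Map<String, String>) -> Unit = {}
) {
    suspend fun consume() = coroutineScope {
        // Ensure the consumer group exists; BUSYGROUP means it already does
        try {
            redis.xgroupCreate(
                XReadArgs.StreamOffset.from(stream, "0-0"), group,
                XGroupCreateArgs.Builder.mkstream()
            )
        } catch (_: RedisBusyException) { /* group exists */ }

        while (isActive) {
            // The coroutines API returns a Flow; it is empty when the 2s block times out
            redis.xreadgroup(
                Consumer.from(group, consumerId),
                XReadArgs.Builder.count(50).block(Duration.ofSeconds(2)),
                XReadArgs.StreamOffset.lastConsumed(stream)
            ).collect { msg -> processWithIdempotency(msg) }
        }
    }

    private suspend fun processWithIdempotency(msg: StreamMessage<String, String>) {
        // Entry IDs are unique only within a stream, so scope the key by stream
        // and group; otherwise fan-out groups would skip each other's events
        val idempotencyKey = "$stream:$group:${msg.id}"
        // Suspending transaction, so the blocking JDBC work stays off the caller's thread
        newSuspendedTransaction(Dispatchers.IO, db) {
            // Check-and-insert in one transaction
            val exists = ProcessedEvents
                .select { ProcessedEvents.eventId eq idempotencyKey }
                .count() > 0
            if (!exists) {
                ProcessedEvents.insert { it[eventId] = idempotencyKey }
                handleEvent(msg.body)
            }
        }
        redis.xack(stream, group, msg.id)  // ACK only after DB commit
    }
}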

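The ordering here matters: the DB commit happens before XACK. If the process crashes between commit and ACK, the message stays in the PEL. XREADGROUP with > never hands out pending entries again, so the message comes back only when a consumer claims it (XCLAIM or XAUTOCLAIM) or re-reads its own pending entries. At that point the idempotency check prevents double processing. Delivery is at-least-once, but the effect is applied once, as long as the handler's side effects live in the same transaction. That is effectively-once processing.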
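To make the crash scenario concrete, here is a minimal simulation with no Redis or database involved. IdempotentHandler is a hypothetical stand-in: its set plays the role of the ProcessedEvents table, and the key is scoped by stream and group, which is my assumption about how you would want to build it, since entry IDs are only unique within a single stream.

```kotlin
// Simulates at-least-once delivery plus an idempotency check.
// `processed` stands in for the ProcessedEvents table; `sideEffects` counts real work.
class IdempotentHandler {
    private val processed = mutableSetOf<String>()
    var sideEffects = 0
        private set

    // Returns true if the event was handled now, false if it was a duplicate.
    fun handle(stream: String, group: String, entryId: String): Boolean {
        val key = "$stream:$group:$entryId"
        if (!processed.add(key)) return false   // already committed on an earlier delivery
        sideEffects++
        return true
    }
}

fun main() {
    val handler = IdempotentHandler()
    // First delivery: the commit succeeds, then the process "crashes" before XACK.
    println(handler.handle("orders", "billing", "1700000000000-0"))    // true
    // Redelivery of the same entry after it is claimed from the PEL.
    println(handler.handle("orders", "billing", "1700000000000-0"))    // false
    // A different group on the same stream still processes the entry.
    println(handler.handle("orders", "analytics", "1700000000000-0"))  // true
    println(handler.sideEffects)                                       // 2
}
```

The redelivery is a no-op, and a second consumer group is not blocked by the first group's bookkeeping.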

Backpressure: XPENDING and claim-based redelivery

The Pending Entry List is your backpressure signal. Here’s a monitor coroutine that detects stuck consumers and reclaims their messages:

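// Runs inside StreamConsumer.consume()'s coroutineScope. Extra imports:
// io.lettuce.core.Limit, io.lettuce.core.Range, io.lettuce.core.XAutoClaimArgs,
// kotlinx.coroutines.delay, kotlinx.coroutines.launch, kotlinx.coroutines.flow.firstOrNull
launch {
    while (isActive) {
        delay(30_000)
        // Claim messages idle for >60s from any consumer
        val stale = redis.xautoclaim(
            stream,
            XAutoClaimArgs.Builder.xautoclaim(
                Consumer.from(group, consumerId), Duration.ofSeconds(60), "0-0"
            )
        )
        stale?.messages.orEmpty().forEach { msg ->
            // The extended form of XPENDING reports the per-entry delivery count
            val deliveryCount = redis
                .xpending(stream, group, Range.create(msg.id, msg.id), Limit.from(1))
                .firstOrNull()?.redeliveryCount ?: 0L
            if (deliveryCount > 3) {
                // Dead-letter after 3 retries
                redis.xadd("$stream:dlq", msg.body)
                redis.xack(stream, group, msg.id)
            } else {
                // This consumer now owns the entry, so retry it here
                processWithIdempotency(msg)
            }
        }
    }
}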

XAUTOCLAIM (Redis 6.2+) atomically transfers ownership of idle messages. Combined with the per-entry delivery counter that XPENDING reports, you get retry-count-aware redelivery without external state.
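The routing decision itself is worth pulling out into a pure function so you can unit test it without Redis. This is a sketch under my own naming: Pending, Action, and decide are hypothetical, and the defaults simply reuse the 60-second idle time and 3-retry limit from this section.

```kotlin
// One pending entry, as reported by the extended form of XPENDING.
data class Pending(val id: String, val idleMs: Long, val deliveryCount: Long)

enum class Action { LEAVE, RECLAIM, DEAD_LETTER }

// Defaults reuse the thresholds from this section: 60s idle, 3 retries.
fun decide(p: Pending, minIdleMs: Long = 60_000, maxDeliveries: Long = 3): Action = when {
    p.idleMs <= minIdleMs -> Action.LEAVE                  // owner may still be working on it
    p.deliveryCount > maxDeliveries -> Action.DEAD_LETTER  // retries exhausted
    else -> Action.RECLAIM                                 // idle, but worth another attempt
}

fun main() {
    val pending = listOf(
        Pending("1-0", idleMs = 5_000, deliveryCount = 1),
        Pending("2-0", idleMs = 90_000, deliveryCount = 2),
        Pending("3-0", idleMs = 90_000, deliveryCount = 4),
    )
    println(pending.map { decide(it) })   // [LEAVE, RECLAIM, DEAD_LETTER]
}
```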

Backpressure strategy summary

| Signal | Action | Implementation |
| --- | --- | --- |
| PEL size > threshold | Pause new reads | Check XPENDING summary before XREADGROUP |
| Message idle time > 60s | Reclaim to healthy consumer | XAUTOCLAIM |
| Delivery count > 3 | Route to dead-letter queue | XADD to DLQ stream, then XACK |
| Consumer lag growing | Scale horizontally | Add consumer instances to the group |
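The first row is the one people skip, so here is one way it could look. BackpressureGate is a hypothetical helper, and the thresholds are placeholders rather than recommendations. It uses two thresholds instead of one so the consumer does not flap between paused and reading when the PEL hovers around the limit.

```kotlin
// Decides whether a consumer should keep reading, based on the PEL size.
// Thresholds are hypothetical; tune them to your handler latency.
class BackpressureGate(private val pauseAbove: Long, private val resumeBelow: Long) {
    init { require(resumeBelow <= pauseAbove) { "resumeBelow must not exceed pauseAbove" } }

    var paused = false
        private set

    // Feed in the pending count from the XPENDING summary; returns true if reads may proceed.
    fun allowRead(pendingCount: Long): Boolean {
        paused = when {
            pendingCount > pauseAbove -> true
            pendingCount < resumeBelow -> false
            else -> paused   // between thresholds: keep the previous state
        }
        return !paused
    }
}

fun main() {
    val gate = BackpressureGate(pauseAbove = 1_000, resumeBelow = 200)
    println(listOf(50L, 1_500L, 600L, 150L).map { gate.allowRead(it) })
    // [true, false, false, true]
}
```

In the consumer loop you would check the gate before each XREADGROUP and sleep briefly when it says no, which gives the existing pending entries time to drain.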

Webhook fan-out and notification pipelines

For webhook fan-out, use multiple consumer groups on the same stream. Each group gets every message independently:

// One stream, three independent consumer groups.
// Assumes a surrounding CoroutineScope and a hostId that is unique per process.
val groups = listOf("webhook-delivery", "analytics-pipeline", "notification-sender")
groups.forEach { group ->
    launch { StreamConsumer(redis, db, "orders", group, "$group-$hostId").consume() }
}

This is Redis Streams’ equivalent of Kafka topic fan-out. Each group maintains its own PEL and offset. Zero coordination between them.

Where Redis Streams breaks down

Here’s what I’ve measured across production deployments:

| Metric | Redis Streams (single node) | Kafka (3-broker cluster) |
| --- | --- | --- |
| Throughput (sustained) | ~120K msg/s | ~500K+ msg/s |
| Throughput (burst) | ~200K msg/s | ~1M+ msg/s |
| Latency (p99) | <2ms | ~5-15ms |
| Storage efficiency | Low (in-memory) | High (disk-based) |
| Retention | Hours to days (RAM-bound) | Weeks to months |
| Consumer groups | Dozens | Thousands |
| Operational complexity | Low (it’s Redis) | High (ZK/KRaft, brokers, schema registry) |

Redis Streams wins on latency and simplicity. Kafka wins on throughput ceiling, retention, and consumer group scale. In my experience, the crossover point is when you need more than ~80K sustained msg/s, retention beyond 48 hours, or more than ~20 independent consumer groups. That’s when you start your Kafka migration.
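The retention limit is the easiest one to sanity-check on a napkin. The sketch below is back-of-the-envelope arithmetic, not a measurement: the 1,000-byte average entry size is an assumption, and the function names are mine. It also shows how a time window translates into a MINID threshold, relying on the fact that auto-generated entry IDs have the form `<unix-ms>-<seq>`.

```kotlin
import java.time.Duration

// Rough RAM needed to retain a window of events. Ignores per-entry overhead
// and any compaction inside Redis, so treat it as an order-of-magnitude figure.
fun retainedBytes(eventsPerSecond: Long, avgEntryBytes: Long, window: Duration): Long =
    eventsPerSecond * window.seconds * avgEntryBytes

// Entry IDs look like "<unix-ms>-<seq>", so a time window maps to a MINID threshold.
fun minIdFor(nowMs: Long, window: Duration): String = "${nowMs - window.toMillis()}-0"

fun main() {
    val window = Duration.ofHours(48)
    // Assumed workload: 500 events/s at roughly 1,000 bytes per entry.
    val bytes = retainedBytes(eventsPerSecond = 500, avgEntryBytes = 1_000, window = window)
    println(bytes / 1_000_000_000.0)               // 86.4 (GB)
    println(minIdFor(1_700_000_000_000, window))   // 1699827200000-0
}
```

Even at a modest 500 events/second, 48 hours of kilobyte-sized entries is tens of gigabytes of RAM under these assumptions, which is why retention tends to be the first limit you feel.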

If you’re processing orders, sending webhooks, and routing notifications, you probably won’t hit those limits until well past product-market fit.

What to actually do

Use Redis Streams with XREADGROUP + XACK as your event bus from day one. You already run Redis. Add consumer groups, wire up idempotency keys in your Ktor consumers, and you have a production event bus in an afternoon instead of a quarter-long Kafka rollout.

Build backpressure through XPENDING monitoring, not consumer-side rate limiting. The PEL is your source of truth for consumer health. Use XAUTOCLAIM for automatic failover and delivery-count tracking for dead-letter routing.

Don’t plan a Kafka migration timeline. Plan a Kafka migration trigger. Set concrete thresholds: sustained throughput above 80K msg/s, retention needs beyond 48 hours, consumer group count past 20. Migrate when you hit them. Not before.
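A trigger only works if something checks it. Here is a minimal sketch of that check using the three thresholds above. BusMetrics and kafkaMigrationTriggers are hypothetical names, and where the numbers come from (dashboards, INFO output, your own counters) is up to you.

```kotlin
// Thresholds come from the text above; the metric names are hypothetical.
data class BusMetrics(
    val sustainedMsgPerSec: Long,
    val retentionHoursNeeded: Long,
    val consumerGroups: Int,
)

// Returns the list of triggers that have fired; an empty list means "stay on Redis Streams".
fun kafkaMigrationTriggers(m: BusMetrics): List<String> = buildList {
    if (m.sustainedMsgPerSec > 80_000) add("sustained throughput above 80K msg/s")
    if (m.retentionHoursNeeded > 48) add("retention beyond 48 hours")
    if (m.consumerGroups > 20) add("more than 20 consumer groups")
}

fun main() {
    println(kafkaMigrationTriggers(BusMetrics(500, 24, 3)))   // []
    println(kafkaMigrationTriggers(BusMetrics(95_000, 24, 25)))
    // [sustained throughput above 80K msg/s, more than 20 consumer groups]
}
```

Run it on a schedule and alert when the list is non-empty. Until then, there is nothing to migrate.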

TAGS: kotlin, backend, architecture, startup, api

