MVP Factory
ai startup development

Backpressure-Aware SSE Reconnection in Mobile Clients: EventSource Gaps, Exponential Backoff with Jitter, and the Kotlin Flow Architecture That Prevents Message Loss During Network Transitions

Krystian Wiewiór · 5 min read

TL;DR

Standard EventSource implementations silently drop messages during mobile network transitions. Last-Event-ID alone doesn’t guarantee delivery because servers can evict event history before your client reconnects. What you actually need: a Kotlin Flow-based SSE consumer backed by a local Write-Ahead Log (WAL), exponential backoff with jitter, and explicit backpressure signaling when the UI layer falls behind the event stream.


The problem most teams ignore

I’ve built several production systems that rely on Server-Sent Events. Teams spend 90% of their effort on the server push architecture and roughly 0% thinking about what happens on the client when a user walks from wifi into an elevator.

What actually happens during a cellular-to-wifi handoff:

  1. The TCP connection drops silently (no FIN, no RST)
  2. The OS detects the new network after 2-15 seconds
  3. The EventSource implementation reconnects
  4. Events emitted during the gap are gone

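The EventSource specification (originally published by the W3C, now maintained as part of the WHATWG HTML Standard) defines Last-Event-ID as the recovery mechanism. On reconnection, the client sends the ID of the last event it received in a Last-Event-ID request header, and the server can replay everything after that point. In theory, this works. In practice, most server implementations use bounded in-memory buffers for event history.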
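To make the client side of this mechanism concrete, here is a minimal sketch of a line parser that follows the field rules of the EventSource processing model (`id:`, `data:`, `event:`, `retry:`, comment lines, dispatch on a blank line) and keeps the last event ID the client would resend as the `Last-Event-ID` header. `SseParser`, `ParsedEvent` and `parseStream` are illustrative names, not part of any library, and the sketch skips details such as stripping a UTF-8 byte-order mark.

```kotlin
// Minimal SSE line parser following the WHATWG field rules. It tracks the
// last event ID the client must send back in the Last-Event-ID header.
// SseParser, ParsedEvent and parseStream are illustrative names, not a library API.
data class ParsedEvent(val id: String, val type: String, val data: String)

class SseParser {
    var lastEventId: String = "" // resend as Last-Event-ID on reconnect
        private set
    var retryMs: Long? = null // reconnection delay requested via "retry:", if any
        private set

    private var idBuffer = ""
    private var typeBuffer = ""
    private val dataBuffer = StringBuilder()

    /** Feeds one line (terminator already stripped); a blank line dispatches an event. */
    fun feedLine(line: String): ParsedEvent? {
        if (line.isEmpty()) return dispatch()
        if (line.startsWith(":")) return null // comment, often used as a keep-alive
        val colon = line.indexOf(':')
        val field = if (colon >= 0) line.substring(0, colon) else line
        var value = if (colon >= 0) line.substring(colon + 1) else ""
        if (value.startsWith(" ")) value = value.substring(1) // drop one leading space
        when (field) {
            "event" -> typeBuffer = value
            "data" -> dataBuffer.append(value).append('\n')
            "id" -> if ('\u0000' !in value) idBuffer = value
            "retry" -> if (value.isNotEmpty() && value.all { it in '0'..'9' }) retryMs = value.toLongOrNull()
        }
        return null // unknown fields are ignored
    }

    private fun dispatch(): ParsedEvent? {
        lastEventId = idBuffer // the ID persists even across events that omit "id:"
        if (dataBuffer.isEmpty()) {
            typeBuffer = ""
            return null
        }
        val event = ParsedEvent(
            id = lastEventId,
            type = typeBuffer.ifEmpty { "message" },
            data = dataBuffer.toString().removeSuffix("\n"),
        )
        dataBuffer.clear()
        typeBuffer = ""
        return event
    }
}

/** Splits a raw chunk on CRLF, LF or CR and returns the events it completes. */
fun parseStream(raw: String, parser: SseParser = SseParser()): List<ParsedEvent> =
    raw.lines().mapNotNull { parser.feedLine(it) }
```

A comment line neither dispatches an event nor changes the stored ID, so keep-alives are invisible to the recovery logic. Whatever ends up in `lastEventId` is all the server gets to work with on reconnect; how much it can actually replay is entirely up to the server.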

| Server implementation | Default buffer | Eviction policy |
| --- | --- | --- |
| Node.js sse-channel | 500 events | FIFO ring buffer |
| Go r3labs/sse | 1000 events | Time-based (5 min) |
| Spring SseEmitter | None | No replay support |
| Nginx nchan | Configurable | Memory + time |

If your client stays disconnected longer than the server retains history, reconnecting with Last-Event-ID replays nothing, or only the events that survived eviction. No error. No indication of loss. Just silence.
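The silent loss is easy to reproduce with a toy model. The sketch below assumes a server that keeps history in a FIFO ring buffer (500 events, matching the sse-channel row in the table) and numeric, increasing event IDs; `RingBufferHistory` and `silentlyLost` are hypothetical names for illustration only.

```kotlin
// Toy model of a server that keeps event history in a bounded FIFO ring buffer.
// RingBufferHistory and silentlyLost are illustrative names; event IDs are
// assumed to be increasing numbers.
class RingBufferHistory(private val capacity: Int) {
    private val buffer = ArrayDeque<Pair<Long, String>>() // (event id, payload)

    fun publish(id: Long, payload: String) {
        if (buffer.size == capacity) buffer.removeFirst() // evict the oldest event
        buffer.addLast(id to payload)
    }

    /** What a Last-Event-ID replay can deliver: only events still retained. */
    fun replayAfter(lastEventId: Long): List<Long> =
        buffer.filter { (id, _) -> id > lastEventId }.map { (id, _) -> id }
}

/** Events the client never saw and the server could no longer replay. */
fun silentlyLost(published: LongRange, lastSeen: Long, replayed: List<Long>): List<Long> {
    val replayedSet = replayed.toSet()
    return published.filter { it > lastSeen && it !in replayedSet }
}
```

If the client last saw event 1000 and the server publishes up to 1700 while it is offline, the replay returns 1201 through 1700 and looks perfectly normal; events 1001 through 1200 are simply gone, and nothing in the replay response says so.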

Exponential backoff with jitter: getting reconnection right

The SSE spec lets the server control reconnection timing with a retry field. When the server doesn’t send one, most implementations fall back to a fixed delay of about 3 seconds. On mobile, a fixed delay is wrong for two reasons: it creates thundering-herd problems when thousands of clients reconnect after a regional outage, and it wastes battery during extended dead zones.

The mistake I see constantly: teams implement backoff without jitter, which just shifts the thundering herd to a later time slot.

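import kotlin.math.pow
import kotlin.random.Random

// Exponential backoff with "equal jitter": the ceiling doubles per attempt (exponent
// capped at 5, delay capped at maxMs), then the actual delay is drawn uniformly
// from [capped / 2, capped) so clients that failed together don't retry together.
fun reconnectDelay(attempt: Int, baseMs: Long = 1000L, maxMs: Long = 30_000L): Long {
    val exponential = baseMs * 2.0.pow(attempt.coerceIn(0, 5)).toLong()
    val capped = exponential.coerceAtMost(maxMs)
    return (capped * Random.nextDouble(0.5, 1.0)).toLong()
}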

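The numbers are stark. If 10,000 clients lose their connections at the same moment, a fixed 3-second retry lands all of them in roughly the same 100 ms window. Once the backoff reaches its 30-second cap, the jitter above spreads them uniformly across a 15-second window (15 to 30 seconds), which averages about 67 connections per 100 ms: roughly a 150x reduction in peak load.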
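The arithmetic can be checked with a small simulation. This sketch assumes every client disconnected at t = 0 and uses the same delay distribution as `reconnectDelay` at its 30-second cap (uniform between 15 and 30 seconds); the function names and the fixed random seed are illustrative choices, not part of the original design.

```kotlin
import kotlin.random.Random

// Simulates N clients that all lost their connection at t = 0 and counts how many
// reconnect attempts land in the busiest 100 ms window.

/** Largest number of reconnects falling into any single `bucketMs` window. */
fun peakPerBucket(delaysMs: List<Long>, bucketMs: Long = 100L): Int =
    delaysMs.groupingBy { it / bucketMs }.eachCount().values.maxOrNull() ?: 0

/** Fixed retry (or backoff without jitter): every client picks the same delay. */
fun fixedDelays(clients: Int, retryMs: Long = 3_000L): List<Long> = List(clients) { retryMs }

/** Equal jitter at the cap: each client waits uniformly between capMs / 2 and capMs. */
fun jitteredDelays(clients: Int, capMs: Long = 30_000L, random: Random = Random(42)): List<Long> =
    List(clients) { (capMs * random.nextDouble(0.5, 1.0)).toLong() }
```

Backoff without jitter behaves exactly like `fixedDelays`, only later, which is the herd-shifting mistake described above. With jitter, the busiest window typically sits somewhat above the ideal average of about 67 because random placement is never perfectly even, so the measured peak reduction is a bit below 150x but still around two orders of magnitude.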

The Kotlin Flow architecture

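The core architecture uses three layers connected via Kotlin Flows with explicit backpressure: connectivity observation, write-ahead persistence of every received event, and a bounded buffer in front of the UI.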

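import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.channels.BufferOverflow
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.buffer
import kotlinx.coroutines.flow.channelFlow
import kotlinx.coroutines.flow.collectLatest
import kotlinx.coroutines.flow.flowOn
import kotlinx.coroutines.flow.onEach

// WalDatabase, ConnectivityMonitor, SseEvent, sseConnect() and toWalEntry()
// are app-specific and not shown here.
class ResilientSseConsumer(
    private val db: WalDatabase,
    private val connectivity: ConnectivityMonitor
) {
    fun events(): Flow<SseEvent> = channelFlow {
        // collectLatest cancels the previous connection whenever the network state changes.
        connectivity.networkState.collectLatest { state ->
            if (state.isConnected) {
                // Resume from the last event persisted in the WAL, not from memory.
                val lastId = db.walDao().lastEventId()
                sseConnect(lastId)
                    .onEach { event ->
                        db.walDao().insert(event.toWalEntry()) // persist before delivery
                    }
                    // Bounded buffer: when full, upstream suspends instead of dropping.
                    .buffer(capacity = 64, onBufferOverflow = BufferOverflow.SUSPEND)
                    .collect { send(it) }
            }
        }
    }.flowOn(Dispatchers.IO)
}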
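As written, the consumer opens a connection only when networkState emits a connected state. If a read fails while the device stays online, the exception ends the whole flow, and if the server closes the stream cleanly, nothing reconnects until the next network change. One way to fold in the backoff from reconnectDelay is Flow’s `retryWhen` operator around a `flow {}` that re-reads the Last-Event-ID on every attempt. This is a sketch under assumptions: `resilientStream`, `backoffMs`, `connect` and the demo are hypothetical names, and `IOException` is assumed to be what the transport throws on a dropped connection.

```kotlin
import java.io.IOException
import kotlin.math.pow
import kotlin.random.Random
import kotlinx.coroutines.delay
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.emitAll
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.flow.onEach
import kotlinx.coroutines.flow.retryWhen
import kotlinx.coroutines.flow.toList
import kotlinx.coroutines.runBlocking

// Same shape as reconnectDelay: exponent capped at 5, delay capped at maxMs, equal jitter.
fun backoffMs(attempt: Long, baseMs: Long, maxMs: Long): Long {
    val capped = (baseMs * 2.0.pow(attempt.coerceIn(0L, 5L).toInt()).toLong()).coerceAtMost(maxMs)
    return (capped * Random.nextDouble(0.5, 1.0)).toLong()
}

/**
 * Reconnects a hypothetical `connect(lastEventId)` stream after IOExceptions, waiting a
 * jittered backoff between attempts. The `flow {}` body re-runs on every retry, so the
 * Last-Event-ID is re-read from storage each time instead of being captured once.
 */
fun <T> resilientStream(
    lastEventId: suspend () -> String?,
    connect: (String?) -> Flow<T>,
    baseMs: Long = 1_000L,
    maxMs: Long = 30_000L,
): Flow<T> = flow { emitAll(connect(lastEventId())) }
    .retryWhen { cause, attempt ->
        if (cause is IOException) {
            delay(backoffMs(attempt, baseMs, maxMs))
            true // retry
        } else {
            false // not a network error: let it propagate
        }
    }

/** Demo: a fake server that drops the connection twice, each time after one event. */
fun demoResilientStream(): Pair<List<String>, List<String?>> = runBlocking {
    var persistedId: String? = "40" // stands in for the WAL's last event ID
    var dropsLeft = 2
    val idsSent = mutableListOf<String?>()
    val events = resilientStream(
        lastEventId = { persistedId },
        connect = { lastId ->
            flow<String> {
                idsSent += lastId // what the client sent as Last-Event-ID
                emit((lastId!!.toInt() + 1).toString())
                if (dropsLeft-- > 0) throw IOException("simulated network handoff")
            }
        },
        baseMs = 1L,
        maxMs = 10L,
    ).onEach { persistedId = it } // "persist" each event before it reaches the UI
        .toList()
    events to idsSent
}
```

In the demo, each reconnect sends the ID of the event persisted just before the drop (40, then 41, then 42), so nothing is requested twice and nothing is skipped. Inside the real consumer, a wrapper like this would replace the bare `sseConnect(lastId)` call in the collectLatest block.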

WAL-backed message buffer

Every received event is written to a WAL table in a Room database before it is delivered to the UI, so the log survives process death. On reconnection, the client reads the last persisted event ID from SQLite, not from memory.
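The contract the consumer relies on is small: insert an event, read the last event ID, and replace everything after a full resync. Below is an in-memory stand-in with those three operations, useful for unit tests. The `WalEntry` fields and the rule that a replayed event with a known sequence number is ignored are assumptions; in Room, the same rule would typically be an `@Insert(onConflict = OnConflictStrategy.IGNORE)` DAO method on a table keyed by sequence number.

```kotlin
import java.util.TreeMap

// In-memory stand-in for the Room-backed WAL DAO, for tests or illustration.
// Method names mirror the calls used in the article (insert, lastEventId, replaceAll);
// WalEntry's fields and the ignore-duplicates rule are assumptions.
data class WalEntry(val sequence: Long, val eventId: String, val payload: String)

class InMemoryWalDao {
    private val log = TreeMap<Long, WalEntry>() // ordered by server sequence number

    /** Like INSERT OR IGNORE on a sequence primary key: replays don't duplicate rows. */
    fun insert(entry: WalEntry): Boolean {
        if (log.containsKey(entry.sequence)) return false
        log[entry.sequence] = entry
        return true
    }

    /** ID of the newest persisted event: the value to send as Last-Event-ID. */
    fun lastEventId(): String? = log.lastEntry()?.value?.eventId

    fun lastSequence(): Long? = log.lastEntry()?.key

    /** Full-state resync after a detected gap: drop the log, keep only the snapshot. */
    fun replaceAll(snapshot: List<WalEntry>) {
        log.clear()
        snapshot.forEach { log[it.sequence] = it }
    }
}
```

Keying by sequence number rather than arrival order keeps `lastEventId()` correct even if a replay delivers an event the log already holds.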

Network transition handling

Instead of relying on EventSource’s built-in reconnection, the architecture turns ConnectivityManager network callbacks into a Flow (networkState) and collects it with collectLatest. When the network changes, the current connection is cancelled and a fresh one is established with the correct Last-Event-ID. collectLatest is the key operator here: it cancels the block for the previous state and waits for it to finish before handling the new one, so only one SSE connection is active at any time.
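The one-connection-at-a-time guarantee can be observed without a network. In this sketch, a scripted flow of network states (cellular, then wifi, then a dead zone) stands in for the connectivity callbacks, and `awaitCancellation()` stands in for reading the SSE stream. `NetState` and `simulateHandoffs` are hypothetical names.

```kotlin
import kotlinx.coroutines.awaitCancellation
import kotlinx.coroutines.delay
import kotlinx.coroutines.flow.collectLatest
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.runBlocking

// Hypothetical network state, standing in for what a ConnectivityManager callback wrapper emits.
data class NetState(val name: String, val isConnected: Boolean)

/** Returns the connection log and the maximum number of simultaneously open connections. */
fun simulateHandoffs(): Pair<List<String>, Int> = runBlocking {
    val log = mutableListOf<String>()
    var open = 0
    var maxOpen = 0
    val networkState = flow<NetState> {
        emit(NetState("cellular", true)); delay(50)
        emit(NetState("wifi", true)); delay(50)
        emit(NetState("none", false)); delay(50) // dead zone
    }
    networkState.collectLatest { state ->
        if (!state.isConnected) return@collectLatest
        open++
        maxOpen = maxOf(maxOpen, open)
        log += "open ${state.name}"
        try {
            awaitCancellation() // stands in for reading the SSE stream until cancelled
        } finally {
            open-- // runs before the block for the next state starts
            log += "close ${state.name}"
        }
    }
    log to maxOpen
}
```

The log comes out as open cellular, close cellular, open wifi, close wifi, and at most one connection is ever open. Entering the dead zone closes the wifi connection without opening a new one.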

Backpressure via buffer(SUSPEND)

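When the UI can’t consume events fast enough (common during rapid state updates), .buffer(capacity = 64, onBufferOverflow = BufferOverflow.SUSPEND) applies backpressure upstream. Once 64 events are queued, the upstream emit suspends and the SSE read loop stops pulling bytes from the socket. When the socket’s receive window fills, TCP flow control stops the server from sending more on that connection. No dropped messages on the client. No unbounded memory growth.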
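The effect of SUSPEND can be checked in isolation. In this sketch a fast producer stands in for the SSE read loop and a consumer that takes 5 ms per event stands in for the UI. With a buffer of capacity c, the producer can never get more than c + 1 events ahead: c events sit in the buffer and one is being processed. `measureBackpressure` is a hypothetical name, and the numbers are chosen only to keep the demo fast.

```kotlin
import kotlinx.coroutines.channels.BufferOverflow
import kotlinx.coroutines.delay
import kotlinx.coroutines.flow.buffer
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.runBlocking

/**
 * A fast producer (standing in for the SSE read loop) feeds a slow consumer
 * (standing in for the UI) through buffer(capacity, SUSPEND). Returns the largest
 * number of events the producer ever had outstanding, plus everything consumed.
 */
fun measureBackpressure(capacity: Int, events: Int): Pair<Int, List<Int>> = runBlocking {
    var produced = 0
    var consumed = 0
    var maxLead = 0
    val received = mutableListOf<Int>()
    flow<Int> {
        for (i in 1..events) {
            emit(i) // suspends while the buffer is full
            produced = i
            maxLead = maxOf(maxLead, produced - consumed)
        }
    }
        .buffer(capacity = capacity, onBufferOverflow = BufferOverflow.SUSPEND)
        .collect { value ->
            delay(5) // slow UI work
            received += value
            consumed = value
        }
    maxLead to received
}
```

Every event arrives, in order, and the producer’s lead stays bounded. Switching to `BufferOverflow.DROP_OLDEST` would let the producer run freely and the consumer would silently skip events, which is exactly the loss this design is meant to prevent.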

| Strategy | Memory behavior | Message loss | Process-death recovery |
| --- | --- | --- | --- |
| Raw EventSource | Unbounded | Yes, on reconnect | None |
| EventSource + Last-Event-ID | Unbounded | Depends on server buffer | None |
| Flow + WAL + backpressure | Bounded (64 events) | No | Full recovery |

Handling the gap between server and client

Even with WAL persistence, there’s a window where the server may have evicted events that the client hasn’t yet received. The defense is a sequence number embedded in each event. On reconnection, the client compares the first received sequence number against its last persisted one. If there’s a gap, it triggers a full state sync via a REST fallback endpoint.

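// On the first event after a reconnection: a jump of more than one means the
// server evicted events we never received, so resync via the REST fallback.
if (firstEvent.sequence - lastPersistedSequence > 1) {
    val fullState = api.getFullState()
    db.walDao().replaceAll(fullState)
}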
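The comparison is worth pinning down as a pure function, because the cases other than the gap matter too: a fresh install has nothing to compare against, and a replay may resend an event the WAL already holds. This is a sketch; `Continuity` and `classify` are hypothetical names, and only the gap rule (a jump of more than one) comes from the text.

```kotlin
// Classifies the first sequence number received after a reconnection against the
// last one persisted in the WAL. Continuity and classify are hypothetical names.
sealed interface Continuity {
    object NoHistory : Continuity // nothing persisted yet
    object Contiguous : Continuity // firstReceived == lastPersisted + 1
    object Replayed : Continuity // server resent events the WAL already holds
    data class Gap(val missing: LongRange) : Continuity // evicted before we saw them
}

fun classify(lastPersisted: Long?, firstReceived: Long): Continuity = when {
    lastPersisted == null -> Continuity.NoHistory
    firstReceived <= lastPersisted -> Continuity.Replayed
    firstReceived == lastPersisted + 1 -> Continuity.Contiguous
    else -> Continuity.Gap(lastPersisted + 1 until firstReceived)
}
```

Only `Gap` should trigger the REST full-state sync. A `Replayed` first event is harmless as long as WAL inserts ignore sequence numbers they already hold, and the `missing` range is useful for logging how much the server evicted.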

This hybrid approach, SSE for real-time and REST for gap recovery, is the only pattern I’ve seen work reliably in production across flaky mobile networks.

What to do with all this

  1. Don’t trust Last-Event-ID alone. Persist event IDs in a local WAL and implement sequence gap detection with a REST fallback for full state recovery.

  2. Use collectLatest with ConnectivityManager for network transitions. Don’t rely on EventSource reconnection: it’s unaware of the Android network lifecycle and can keep zombie connections alive during handoffs.

  3. Apply explicit backpressure with buffer(SUSPEND). Unbounded event buffering on mobile leads to OOM crashes under burst traffic. Let Kotlin Flow’s suspending emit propagate backpressure through TCP flow control to the server.

