Backpressure-Aware SSE Reconnection in Mobile Clients: EventSource Gaps, Exponential Backoff with Jitter, and the Kotlin Flow Architecture That Prevents Message Loss During Network Transitions
TL;DR
Standard EventSource implementations silently drop messages during mobile network transitions. Last-Event-ID alone doesn’t guarantee delivery because servers can evict event history before your client reconnects. What you actually need: a Kotlin Flow-based SSE consumer backed by a local Write-Ahead Log (WAL), exponential backoff with jitter, and explicit backpressure signaling when the UI layer falls behind the event stream.
The problem most teams ignore
I’ve built several production systems that rely on Server-Sent Events. Teams spend 90% of their effort on the server push architecture and roughly 0% thinking about what happens on the client when a user walks from wifi into an elevator.
What actually happens during a cellular-to-wifi handoff:
- The TCP connection drops silently (no FIN, no RST)
- The OS detects the new network after 2-15 seconds
- The EventSource implementation reconnects
- Events emitted during the gap are gone
The EventSource spec (originally a W3C document, now part of the WHATWG HTML Standard) defines Last-Event-ID as the recovery mechanism. On reconnection the client sends the last received ID in the Last-Event-ID request header, and the server is expected to replay from that point. In theory, this works. In practice, most server implementations use bounded in-memory buffers for event history.
| Server Implementation | Default Buffer | Eviction Policy |
|---|---|---|
| Node.js sse-channel | 500 events | FIFO ring buffer |
| Go r3labs/sse | 1000 events | Time-based (5 min) |
| Spring SseEmitter | None | No replay support |
| Nginx nchan | Configurable | Memory + time |
If your client stays disconnected longer than the server retains history, the replay for that Last-Event-ID comes back empty or incomplete. No error. No indication of loss. Just silence.
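To make the silent loss concrete, here is a minimal sketch of a bounded server-side history buffer. `BoundedHistory`, `publish`, and `replayAfter` are hypothetical names for illustration and are not taken from any of the libraries in the table; real servers differ in how they respond to an evicted ID.

```kotlin
// Hypothetical server-side history: keeps only the newest `capacity` events.
class BoundedHistory(private val capacity: Int) {
    private val events = ArrayDeque<Pair<Long, String>>()

    fun publish(id: Long, data: String) {
        if (events.size == capacity) events.removeFirst() // FIFO eviction
        events.addLast(id to data)
    }

    /** Replays every retained event newer than lastEventId. */
    fun replayAfter(lastEventId: Long): List<Pair<Long, String>> =
        events.filter { it.first > lastEventId }
}

fun main() {
    val history = BoundedHistory(capacity = 3)
    (1L..3L).forEach { history.publish(it, "event-$it") }
    // The client saw everything up to id 3, then lost its connection.
    (4L..10L).forEach { history.publish(it, "event-$it") }

    val replay = history.replayAfter(lastEventId = 3)
    println(replay.map { it.first }) // [8, 9, 10]
}
```

The replay succeeds and looks normal, but events 4 through 7 are gone and nothing in the response says so.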
Exponential backoff with jitter: getting reconnection right
The SSE spec suggests a server-sent retry field to control reconnection timing. Most implementations default to a fixed 3-second retry. On mobile, this is wrong for two reasons: it creates thundering herd problems when thousands of clients reconnect after a regional outage, and it wastes battery during extended dead zones.
The mistake I see constantly: teams implement backoff without jitter, which just shifts the thundering herd to a later time slot.
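import kotlin.math.pow
import kotlin.random.Random

// Exponential backoff with jitter: returns a delay in [capped / 2, capped).
fun reconnectDelay(attempt: Int, baseMs: Long = 1000L, maxMs: Long = 30_000L): Long {
    // Clamp the exponent so negative or very large attempts stay well-defined.
    val exponential = baseMs * 2.0.pow(attempt.coerceIn(0, 5)).toLong()
    val capped = exponential.coerceAtMost(maxMs)
    // Randomize within the upper half so clients do not retry in lockstep.
    return (capped * Random.nextDouble(0.5, 1.0)).toLong()
}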
The numbers are stark. With 10,000 clients reconnecting, a fixed 3-second retry concentrates all connections in a single 100ms window. With the jitter above, retries at the 30-second cap are spread across a 15-second range (15 to 30 seconds), roughly a 150x reduction in peak load.
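The spread can be checked with a small simulation. This is a sketch under simplifying assumptions: every client disconnects at the same instant, fixed-retry clients all fire at exactly 3,000 ms, and jittered clients are already at the 30-second cap. `peakPerBucket` is a hypothetical helper, and the optional `random` parameter is added here only to make runs repeatable.

```kotlin
import kotlin.math.pow
import kotlin.random.Random

fun reconnectDelay(
    attempt: Int,
    baseMs: Long = 1000L,
    maxMs: Long = 30_000L,
    random: Random = Random.Default
): Long {
    val exponential = baseMs * 2.0.pow(attempt.coerceIn(0, 5)).toLong()
    val capped = exponential.coerceAtMost(maxMs)
    return (capped * random.nextDouble(0.5, 1.0)).toLong()
}

/** Largest number of reconnects that land in the same bucket of width bucketMs. */
fun peakPerBucket(delaysMs: List<Long>, bucketMs: Long = 100L): Int =
    delaysMs.groupingBy { it / bucketMs }.eachCount().values.maxOrNull() ?: 0

fun main() {
    val clients = 10_000
    val rng = Random(42)

    val fixed = List(clients) { 3_000L }
    val jittered = List(clients) { reconnectDelay(attempt = 5, random = rng) }

    // 10000: every fixed-retry client lands in the same 100 ms bucket.
    println("fixed peak per 100 ms: ${peakPerBucket(fixed)}")
    // Varies by seed; expect on the order of 10_000 / 150 per bucket.
    println("jittered peak per 100 ms: ${peakPerBucket(jittered)}")
}
```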
The Kotlin Flow architecture
The core architecture uses three layers connected via Kotlin Flows with explicit backpressure:
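import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.channels.BufferOverflow
import kotlinx.coroutines.flow.*

class ResilientSseConsumer(
    private val db: WalDatabase,
    private val connectivity: ConnectivityMonitor
) {
    fun events(): Flow<SseEvent> = channelFlow {
        // collectLatest cancels the previous block on every network change,
        // so at most one SSE connection is active at a time.
        connectivity.networkState.collectLatest { state ->
            if (state.isConnected) {
                // Resume from the last ID persisted on disk, not from memory.
                val lastId = db.walDao().lastEventId()
                // Note: a failure inside sseConnect is not retried here;
                // wrap it in a retry that waits for reconnectDelay(attempt).
                sseConnect(lastId)
                    // Persist before delivery so the event survives process death.
                    .onEach { event ->
                        db.walDao().insert(event.toWalEntry())
                    }
                    // Suspend the upstream read loop when the consumer falls behind.
                    .buffer(capacity = 64, onBufferOverflow = BufferOverflow.SUSPEND)
                    .collect { send(it) }
            }
        }
    }.flowOn(Dispatchers.IO)
}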
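The consumer above does not show where the backoff plugs in. One way to wire it, sketched here with kotlinx.coroutines' `retryWhen`, is to re-read the persisted ID on every attempt and delay by `reconnectDelay` between attempts. The integer events, the `persistedLastId` variable, and the failing `sseConnect` are hypothetical stand-ins so the sketch runs without a server or database.

```kotlin
import kotlinx.coroutines.delay
import kotlinx.coroutines.flow.*
import kotlinx.coroutines.runBlocking
import java.io.IOException
import kotlin.math.pow
import kotlin.random.Random

fun reconnectDelay(attempt: Int, baseMs: Long = 1000L, maxMs: Long = 30_000L): Long {
    val exponential = baseMs * 2.0.pow(attempt.coerceIn(0, 5)).toLong()
    val capped = exponential.coerceAtMost(maxMs)
    return (capped * Random.nextDouble(0.5, 1.0)).toLong()
}

// Hypothetical stand-ins: an in-memory "WAL" and a connection that drops twice.
var persistedLastId = 0
var connectCalls = 0

fun sseConnect(lastId: Int): Flow<Int> = flow {
    connectCalls++
    emit(lastId + 1)
    if (connectCalls < 3) throw IOException("connection dropped")
    emit(lastId + 2)
}

fun resilientEvents(baseMs: Long = 1000L): Flow<Int> =
    // The flow { } body re-runs on every retry, so each attempt resumes
    // from the most recently persisted ID rather than a stale one.
    flow { emitAll(sseConnect(persistedLastId)) }
        .onEach { persistedLastId = it } // persist before delivering downstream
        .retryWhen { cause, attempt ->
            if (cause !is IOException) return@retryWhen false // do not retry bugs
            delay(reconnectDelay(attempt.toInt(), baseMs = baseMs))
            true
        }

fun main() = runBlocking {
    // baseMs = 1 keeps the demo fast; use the default in a real client.
    println(resilientEvents(baseMs = 1).toList()) // [1, 2, 3, 4]
}
```

One caveat with this shape: the `attempt` counter that `retryWhen` supplies never resets, so a connection that ran fine for an hour still continues the backoff sequence where it left off. A production client would likely track its own counter and reset it after a successful event.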
WAL-backed message buffer
Every received event gets written to a Room database WAL before delivery to the UI. This survives process death. On reconnection, the client reads the last persisted event ID from SQLite, not from memory.
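The persist-before-deliver contract can be illustrated without Room. The sketch below swaps in a plain append-only file as a stand-in for the Room-backed table; `FileWal` and its methods are hypothetical names, and the tab-separated line format is an assumption made only for this example.

```kotlin
import java.io.File

/** Hypothetical append-only log; a stand-in for the Room-backed WAL table. */
class FileWal(private val file: File) {

    /** Appends one event; call this before handing the event to the UI. */
    fun append(id: String, data: String) {
        require('\t' !in id && '\n' !in id) { "id must not contain tab or newline" }
        // Newlines in data are escaped so each event stays on one line.
        // appendText writes and closes the file but does not force an fsync.
        file.appendText("$id\t${data.replace("\n", "\\n")}\n")
    }

    /** Reads the last persisted event ID from disk, or null if the log is empty. */
    fun lastEventId(): String? =
        if (!file.exists()) null
        else file.useLines { lines -> lines.lastOrNull()?.substringBefore('\t') }
}

fun main() {
    val file = File.createTempFile("sse-wal", ".log")
    FileWal(file).apply {
        append("41", "first")
        append("42", "second")
    }
    // A fresh instance simulates the app restarting after process death.
    println(FileWal(file).lastEventId()) // 42
    file.delete()
}
```

The point is only the ordering: the ID used for reconnection comes from storage that outlives the process, so a restart resumes from 42 rather than from nothing.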
Network transition handling
Instead of relying on EventSource’s built-in reconnection, the architecture observes ConnectivityManager callbacks via collectLatest. When the network changes, the current connection is cancelled and a fresh one is established with the correct Last-Event-ID. collectLatest is the key operator here because it ensures only one active SSE connection exists at any time.
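The single-connection guarantee can be demonstrated without Android. In this sketch a `MutableStateFlow` stands in for the connectivity callbacks and `awaitCancellation()` stands in for a long-lived SSE read loop; `NetworkState` and `peakConnections` are hypothetical names.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.collectLatest

// Hypothetical shape for the state a connectivity monitor might expose.
data class NetworkState(val transport: String, val isConnected: Boolean)

/**
 * Replays a list of network changes and returns the peak number of
 * concurrently open "connections". Call from runBlocking so the plain
 * counters are only touched from one thread.
 */
suspend fun peakConnections(changes: List<NetworkState>): Int = coroutineScope {
    val networkState = MutableStateFlow(NetworkState("none", false))
    var active = 0
    var peak = 0

    val collector = launch {
        networkState.collectLatest { state ->
            if (!state.isConnected) return@collectLatest
            active++
            peak = maxOf(peak, active)
            try {
                awaitCancellation() // stands in for the SSE read loop
            } finally {
                active-- // runs when collectLatest cancels this block
            }
        }
    }

    for (change in changes) {
        networkState.value = change
        delay(20) // give the collector time to react
    }
    collector.cancelAndJoin()
    peak
}

fun main() = runBlocking {
    val peak = peakConnections(
        listOf(
            NetworkState("wifi", true),
            NetworkState("cellular", true), // handoff: old block is cancelled first
            NetworkState("none", false),
            NetworkState("wifi", true)
        )
    )
    println("peak concurrent connections: $peak") // 1
}
```

Because `collectLatest` cancels the previous block and waits for it to finish before starting the next one, the old connection's cleanup runs before the new connection opens.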
Backpressure via buffer(SUSPEND)
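When the UI can’t consume events fast enough (common during rapid state updates), .buffer(capacity = 64, onBufferOverflow = SUSPEND) applies backpressure upstream. Once the buffer is full, the SSE read loop suspends, the socket’s receive buffer fills, TCP flow control kicks in, and the server naturally slows delivery. This only holds if the read loop itself suspends on emit; a callback-based client that keeps reading from the socket will not be slowed. No dropped messages. No unbounded memory growth.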
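A small experiment shows the effect of the SUSPEND policy. The sketch pits a fast producer against a deliberately slow collector and tracks how far the producer ever gets ahead; `runBackpressureDemo` and its counters are hypothetical and exist only for this illustration.

```kotlin
import kotlinx.coroutines.channels.BufferOverflow
import kotlinx.coroutines.delay
import kotlinx.coroutines.flow.buffer
import kotlinx.coroutines.flow.flow
import kotlinx.coroutines.runBlocking

/**
 * Returns the received events and the largest observed lead of the producer
 * over the consumer. Call from runBlocking so producer and consumer share
 * one thread and the plain counters are safe.
 */
suspend fun runBackpressureDemo(total: Int, capacity: Int): Pair<List<Int>, Int> {
    var produced = 0
    var consumed = 0
    var maxLag = 0
    val received = mutableListOf<Int>()

    flow {
        for (i in 1..total) {
            produced++
            maxLag = maxOf(maxLag, produced - consumed)
            emit(i) // suspends here once the buffer is full
        }
    }
        .buffer(capacity = capacity, onBufferOverflow = BufferOverflow.SUSPEND)
        .collect { event ->
            delay(5) // simulate a slow UI consumer
            received += event
            consumed++
        }

    return received to maxLag
}

fun main() = runBlocking {
    val (received, maxLag) = runBackpressureDemo(total = 50, capacity = 4)
    println("received=${received.size}, maxLag=$maxLag")
}
```

All 50 events arrive in order, and the producer's lead stays within a small constant of the buffer capacity instead of growing with the backlog.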
| Strategy | Memory behavior | Message loss | Process death recovery |
|---|---|---|---|
| Raw EventSource | Unbounded | Yes, on reconnect | None |
| EventSource + Last-Event-ID | Unbounded | Server buffer dependent | None |
| Flow + WAL + Backpressure | Bounded (64 events) | No | Full recovery |
Handling the gap between server and client
Even with WAL persistence, there’s a window where the server may have evicted events that the client hasn’t yet received. The defense is a sequence number embedded in each event. On reconnection, the client compares the first received sequence number against its last persisted one. If there’s a gap, it triggers a full state sync via a REST fallback endpoint.
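// A jump of more than 1 means the server evicted events we never received.
if (firstEvent.sequence - lastPersistedSequence > 1) {
    // Replace local state with a full snapshot from the REST fallback.
    val fullState = api.getFullState()
    db.walDao().replaceAll(fullState)
}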
This hybrid approach, SSE for real-time and REST for gap recovery, is the only pattern I’ve seen work reliably in production across flaky mobile networks.
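The gap check can be extended to cover two other cases a replay tends to produce: duplicates (the server resends something already persisted) and a first run with nothing persisted yet. The sketch below is one way to classify them; `SequenceCheck` and `checkSequence` are hypothetical names, and treating a first run as in order is an assumption that may not suit every app.

```kotlin
sealed class SequenceCheck {
    object InOrder : SequenceCheck()
    object Duplicate : SequenceCheck()
    data class Gap(val missing: LongRange) : SequenceCheck()
}

/** Classifies an incoming sequence number against the last persisted one. */
fun checkSequence(lastPersisted: Long?, incoming: Long): SequenceCheck {
    // Assumption: with nothing persisted yet, accept whatever arrives first.
    val last = lastPersisted ?: return SequenceCheck.InOrder
    return when {
        incoming <= last -> SequenceCheck.Duplicate      // already in the WAL, skip it
        incoming == last + 1 -> SequenceCheck.InOrder    // normal case
        else -> SequenceCheck.Gap((last + 1) until incoming) // trigger a full sync
    }
}

fun main() {
    println(checkSequence(lastPersisted = 4L, incoming = 5L) is SequenceCheck.InOrder) // true
    println(checkSequence(lastPersisted = 4L, incoming = 3L) is SequenceCheck.Duplicate) // true
    println(checkSequence(lastPersisted = 4L, incoming = 8L)) // Gap(missing=5..7)
}
```

Only the `Gap` result needs the REST fallback; duplicates can be dropped before they reach the WAL or the UI.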
What to do with all this
- Don’t trust Last-Event-ID alone. Persist event IDs in a local WAL and implement sequence gap detection with a REST fallback for full state recovery.
- Use collectLatest with ConnectivityManager for network transitions. Don’t rely on EventSource reconnection. It’s unaware of the Android network lifecycle and can keep zombie connections open during handoffs.
- Apply explicit backpressure with buffer(SUSPEND). Unbounded event buffering on mobile leads to OOM crashes under burst traffic. Let Kotlin Flow’s suspending operators propagate backpressure through TCP flow control to the server.