Designing Idempotent APIs for Mobile Clients: Retry Logic, Idempotency Keys, and the Patterns That Prevent Double Charges
TL;DR
Mobile networks fail. Requests retry. Without idempotency, retries mean double charges, duplicate orders, and angry users. This post covers Stripe-style idempotency keys with database-backed deduplication in Ktor, client-side retry with exponential backoff and jitter in Kotlin, and handling partial failures in multi-step mutations. The pattern itself is simple. Actually applying it everywhere is the hard part.
The problem the numbers reveal
After building production systems for mobile clients, I can tell you: network reliability is the variable most teams underestimate. Google’s Android Vitals data shows 5-10% of HTTP requests on mobile networks fail or time out, rising above 20% in emerging markets. When your client retries a POST /payments and your server lacks idempotency guarantees, you risk a double charge. The first request may have succeeded on the server and only the response was lost on the way back, so the retry charges the customer a second time.
Stripe handles millions of duplicate request detections daily through their idempotency layer. If they need this, you probably do too.

The idempotency key pattern
The client generates a unique key per logical operation. The server stores it alongside the response. On retry, the server returns the stored response instead of re-executing. That’s it.
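To make those three sentences concrete, here is a deliberately tiny in-memory sketch. `InMemoryIdempotencyStore` is a hypothetical class for illustration only. It keeps results in a `ConcurrentHashMap`, so it forgets everything on restart and cannot coordinate multiple server instances. That limitation is exactly why the server-side version lives in the database.

```kotlin
import java.util.concurrent.ConcurrentHashMap

// Hypothetical illustration of the idempotency key pattern (not production code).
// The first call for a key runs the operation and stores its result; any later
// call with the same key receives the stored result without re-executing.
class InMemoryIdempotencyStore<R : Any> {
    private val results = ConcurrentHashMap<String, R>()

    // computeIfAbsent applies the function at most once per key. Concurrent
    // callers with the same key block until that first run finishes. If the
    // operation throws, nothing is stored, so a retry can run it again.
    fun execute(key: String, operation: () -> R): R =
        results.computeIfAbsent(key) { operation() }
}
```

Because a failed operation leaves no entry behind, a retry with the same key re-runs it. That mirrors the delete-on-failure behavior of the server handler.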
Server-side: database-backed idempotency in Ktor
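```kotlin
// Imports assume Ktor 2.x and kotlinx.serialization
import io.ktor.http.ContentType
import io.ktor.http.HttpStatusCode
import io.ktor.server.application.call
import io.ktor.server.request.header
import io.ktor.server.request.receive
import io.ktor.server.response.respond
import io.ktor.server.response.respondText
import io.ktor.server.routing.Route
import io.ktor.server.routing.post
import java.time.Instant
import java.time.temporal.ChronoUnit
import kotlinx.serialization.encodeToString
import kotlinx.serialization.json.Json

// Idempotency record stored in PostgreSQL.
// statusCode and responseBody stay null while the first request is still running.
data class IdempotencyRecord(
    val key: String,
    val statusCode: Int?,
    val responseBody: String?,
    val createdAt: Instant,
    val expiresAt: Instant = createdAt.plus(24, ChronoUnit.HOURS)
)

fun Route.createPayment(db: Database, paymentService: PaymentService) {
    post("/payments") {
        val idempotencyKey = call.request.header("Idempotency-Key")
            ?: return@post call.respond(HttpStatusCode.BadRequest, "Missing Idempotency-Key")

        // Claim the key first via INSERT ... ON CONFLICT DO NOTHING (prevents race conditions)
        if (!db.tryInsertIdempotencyLock(idempotencyKey)) {
            val existing = db.findIdempotencyRecord(idempotencyKey)
            val status = existing?.statusCode
            val body = existing?.responseBody
            if (status != null && body != null) {
                // Already completed: replay the stored response instead of re-executing
                call.respondText(body, ContentType.Application.Json, HttpStatusCode.fromValue(status))
            } else {
                // Another attempt holding this key has not finished yet
                call.respond(HttpStatusCode.Conflict, "Request in progress")
            }
            return@post
        }

        try {
            val result = paymentService.processPayment(call.receive())
            val response = Json.encodeToString(result)
            db.completeIdempotencyRecord(idempotencyKey, 200, response)
            call.respondText(response, ContentType.Application.Json, HttpStatusCode.OK)
        } catch (e: Exception) {
            db.deleteIdempotencyRecord(idempotencyKey) // Release the key so a retry can run again
            throw e
        }
    }
}
```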
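The part that matters most: use INSERT ... ON CONFLICT DO NOTHING for the lock. Because the key column carries a unique constraint, exactly one concurrent insert succeeds and every other attempt sees zero rows affected. You get atomic deduplication without distributed locks.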
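One possible shape for `tryInsertIdempotencyLock` over plain JDBC is sketched below. The table and column names (`idempotency_keys`, `idempotency_key`, and so on) are assumptions, not part of any library. The sketch also assumes auto-commit is on, so the claimed row becomes visible to concurrent requests as soon as the insert returns.

```kotlin
import java.sql.Connection

// Hypothetical table backing IdempotencyRecord; names and types are assumptions.
// The PRIMARY KEY on idempotency_key is what makes the insert below atomic.
const val CREATE_IDEMPOTENCY_TABLE = """
CREATE TABLE IF NOT EXISTS idempotency_keys (
    idempotency_key TEXT PRIMARY KEY,
    status_code     INT,
    response_body   TEXT,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
)"""

const val INSERT_LOCK_SQL =
    "INSERT INTO idempotency_keys (idempotency_key) VALUES (?) ON CONFLICT (idempotency_key) DO NOTHING"

// Returns true only for the caller whose INSERT created the row.
// A duplicate key makes the statement affect 0 rows instead of throwing.
fun tryInsertIdempotencyLock(conn: Connection, key: String): Boolean =
    conn.prepareStatement(INSERT_LOCK_SQL).use { stmt ->
        stmt.setString(1, key)
        stmt.executeUpdate() == 1
    }
```

Suppose a concurrent insert of the same key is still uncommitted. PostgreSQL makes the second insert wait for it. The second insert then reports zero rows if the first transaction committed, or succeeds if it rolled back.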
Client-side: retry with exponential backoff and jitter
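```kotlin
import java.io.IOException
import java.util.UUID
import kotlin.random.Random
import kotlinx.coroutines.delay

// Retries `block` on network errors; maxAttempts counts the first try as well.
suspend fun <T> retryWithBackoff(
    maxAttempts: Int = 3,
    initialDelayMs: Long = 500,
    maxDelayMs: Long = 10_000,
    block: suspend () -> T
): T {
    require(maxAttempts >= 1) { "maxAttempts must be at least 1" }
    var currentDelay = initialDelayMs
    repeat(maxAttempts) { attempt ->
        try {
            return block()
        } catch (e: IOException) {
            if (attempt == maxAttempts - 1) throw e
            // Random extra wait in [0, currentDelay / 2]; the +1 keeps the range non-empty
            val jitter = Random.nextLong(0, currentDelay / 2 + 1)
            delay(currentDelay + jitter)
            currentDelay = (currentDelay * 2).coerceAtMost(maxDelayMs)
        }
    }
    error("Unreachable: the last attempt either returns or rethrows")
}

// Usage (inside a coroutine): create the key once, outside the retry loop,
// so every attempt sends the same idempotency key
val idempotencyKey = UUID.randomUUID().toString()
val payment = retryWithBackoff {
    api.createPayment(
        body = paymentRequest,
        headers = mapOf("Idempotency-Key" to idempotencyKey)
    )
}
```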
The jitter is not optional. Without it, clients that failed at the same time will retry at the same time. Thundering herd. I’ve seen it take down a staging environment on a Monday morning.
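The utility above retries only on `IOException`. A payment client usually also has to decide what to do with HTTP error responses. The sketch below shows one hedged way to classify them, using hypothetical `Outcome` and `isRetryable` names. It treats network errors, 409 (the in-progress response from the Ktor handler), 429, and 5xx as safe to resend with the same key, and everything else as final.

```kotlin
import java.io.IOException

// Hypothetical result type for one attempt at a keyed POST.
sealed interface Outcome {
    data class Response(val statusCode: Int) : Outcome
    data class NetworkError(val cause: IOException) : Outcome
}

// Assumption: every retry reuses the same Idempotency-Key, which is what
// makes resending after an ambiguous failure safe.
fun isRetryable(outcome: Outcome): Boolean = when (outcome) {
    is Outcome.NetworkError -> true // Result unknown: resend with the same key
    is Outcome.Response -> when (outcome.statusCode) {
        409 -> true          // Same key still being processed on the server
        429 -> true          // Rate limited; back off and try again
        in 500..599 -> true  // Server error; the key prevents a second charge
        else -> false        // 2xx success, or a 4xx the client must fix
    }
}
```

Resending on 5xx is only safe because the key is reused. Without it, a 500 returned after the charge went through would lead straight back to a double charge.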
Retry strategy comparison
| Strategy | Collision risk | Implementation complexity | Best for |
|---|---|---|---|
| Fixed delay | High | Low | Internal services on stable networks |
| Exponential backoff | Medium | Medium | Server-to-server communication |
| Exponential + jitter | Low | Medium | Mobile clients (recommended) |
| Circuit breaker + backoff | Lowest | High | Critical payment flows |
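The last row of the table pairs backoff with a circuit breaker. A minimal, single-threaded sketch of the idea follows. `CircuitBreaker` and its parameters are hypothetical, not any library's API. After `failureThreshold` consecutive failures, it rejects calls immediately for `openDurationMs`. Then it lets one trial call through and closes again if that call succeeds.

```kotlin
// Hypothetical, single-threaded circuit breaker sketch; not thread-safe.
class CircuitBreaker(
    private val failureThreshold: Int = 5,
    private val openDurationMs: Long = 30_000,
    private val clock: () -> Long = System::currentTimeMillis
) {
    enum class State { CLOSED, OPEN, HALF_OPEN }

    class OpenCircuitException : RuntimeException("Circuit open: failing fast")

    var state = State.CLOSED
        private set
    private var consecutiveFailures = 0
    private var openedAt = 0L

    fun <T> call(block: () -> T): T {
        if (state == State.OPEN) {
            // Fail fast until the cool-down has elapsed
            if (clock() - openedAt < openDurationMs) throw OpenCircuitException()
            state = State.HALF_OPEN // Let a single trial call through
        }
        return try {
            block().also {
                // Success closes the circuit and resets the failure count
                consecutiveFailures = 0
                state = State.CLOSED
            }
        } catch (e: Exception) {
            consecutiveFailures++
            // A failed trial, or too many failures in a row, (re)opens the circuit
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN
                openedAt = clock()
            }
            throw e
        }
    }
}
```

The clock is injected so the state machine can be tested without sleeping. `OpenCircuitException` is not an `IOException`, so a `retryWithBackoff` wrapped around the breaker fails fast while the circuit is open instead of burning through its attempts. For coroutine code, `call` would need to become a `suspend` function, and its state would need synchronization.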
Handling partial failures in multi-step mutations
Most teams get this wrong. A payment flow is rarely a single operation. It’s usually charge, then create order, then send confirmation. If step two fails, you need the idempotency key scoped to the entire operation, not individual steps.
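```kotlin
import kotlin.coroutines.cancellation.CancellationException

suspend fun processOrder(key: String, request: OrderRequest): OrderResult {
    // Assumes db.withIdempotency stores the result only when the block returns normally
    return db.withIdempotency(key) {
        val charge = paymentGateway.charge(request.amount) // Step 1
        val order = try {
            orderRepo.create(charge.id, request) // Step 2
        } catch (e: Exception) {
            paymentGateway.refund(charge.id) // Compensate step 1
            throw e
        }
        // Step 3 runs after the compensable steps: a failed email must not refund
        // a charge whose order already exists
        try {
            emailService.sendConfirmation(order)
        } catch (e: CancellationException) {
            throw e
        } catch (e: Exception) {
            // Swallowed here; in production, log it and schedule a resend
        }
        OrderResult(order.id, charge.id)
    }
}
```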
This is a saga pattern in miniature. The idempotency key wraps the saga, not the individual steps. On retry, if the saga already completed, the server returns the stored result. If it failed, the compensation already ran (assuming the refund itself succeeded), so the retry can safely re-execute the full saga.
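A self-contained, single-threaded sketch of that behavior with in-memory fakes follows. `FakePaymentGateway` and `OrderSaga` are hypothetical stand-ins for the `paymentGateway` and `db.withIdempotency` pieces. The point is that a failed saga refunds its charge and stores nothing under the key, so a retry runs the whole saga again. A completed saga is replayed instead.

```kotlin
// Hypothetical fakes; none of these names come from a real library.
class FakePaymentGateway {
    val charges = mutableListOf<String>()
    val refunds = mutableListOf<String>()

    fun charge(amountCents: Long): String = "ch_${charges.size + 1}".also { charges += it }
    fun refund(chargeId: String) { refunds += chargeId }
}

// Stores a result under the idempotency key only when the whole saga succeeds.
class OrderSaga(private val gateway: FakePaymentGateway) {
    private val completed = mutableMapOf<String, String>() // key -> order id

    fun processOrder(key: String, amountCents: Long, createOrder: (chargeId: String) -> String): String {
        completed[key]?.let { return it } // Saga already completed: replay the stored result

        val chargeId = gateway.charge(amountCents) // Step 1
        val orderId = try {
            createOrder(chargeId) // Step 2
        } catch (e: Exception) {
            gateway.refund(chargeId) // Compensate step 1; nothing is stored for this key
            throw e
        }
        completed[key] = orderId
        return orderId
    }
}
```

In this sketch the retry issues a fresh charge (`ch_2`) after the first one was refunded. The customer may therefore see a charge and a refund before the final charge.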
Key expiration and storage
Stripe expires idempotency keys after 24 hours. Good enough. It bounds storage growth while covering any reasonable retry window. A created_at index with a periodic cleanup job handles this in PostgreSQL.
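A periodic cleanup could look like the sketch below. It assumes a hypothetical `idempotency_keys` table with a `created_at` column. The cutoff is computed from the application clock. A scheduler (not shown) would call `deleteExpiredIdempotencyKeys` on whatever interval suits your traffic.

```kotlin
import java.sql.Connection
import java.sql.Timestamp
import java.time.Duration
import java.time.Instant

// Hypothetical schema assumption: idempotency_keys(created_at TIMESTAMPTZ, ...).
// The index lets the range delete avoid a full table scan.
const val CREATE_CREATED_AT_INDEX =
    "CREATE INDEX IF NOT EXISTS idempotency_keys_created_at_idx ON idempotency_keys (created_at)"

const val DELETE_EXPIRED_SQL = "DELETE FROM idempotency_keys WHERE created_at < ?"

// Deletes keys older than the TTL and returns how many rows were removed.
fun deleteExpiredIdempotencyKeys(conn: Connection, ttl: Duration = Duration.ofHours(24)): Int =
    conn.prepareStatement(DELETE_EXPIRED_SQL).use { stmt ->
        stmt.setTimestamp(1, Timestamp.from(Instant.now().minus(ttl)))
        stmt.executeUpdate()
    }
```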
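Expect roughly 50-100 bytes per record for the key, status code, and timestamps; stored response bodies come on top of that. At 1 million requests per day with a 24-hour TTL, the metadata alone comes to around 100 MB at the upper estimate. Trivial for any production database.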
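As a rough check of that figure, take 100 bytes per record and ignore row and index overhead:

$$
10^{6}\ \tfrac{\text{records}}{\text{day}} \times 1\ \text{day} \times 100\ \tfrac{\text{B}}{\text{record}} = 10^{8}\ \text{B} \approx 100\ \text{MB}
$$

Storing response bodies changes the per-record term. If each body were around 1,000 bytes (an illustrative figure, not a measurement), the same traffic would hold roughly $10^{6} \times 1{,}100\ \text{B} \approx 1.1\ \text{GB}$.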
What to actually do with this
Generate idempotency keys client-side, and scope them to the user’s intent. One key per button tap, not per HTTP request. Store the key before the first attempt so retries reuse it, even across app restarts.
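A sketch of that client-side bookkeeping follows. It assumes a small key-value abstraction, and `KeyValueStore`, `InMemoryKeyValueStore`, and `IdempotencyKeyStore` are all hypothetical names. On Android, the store would be backed by something that survives process death, such as DataStore or SharedPreferences. The in-memory implementation exists only to make the example runnable. The intent ID identifies one logical operation, such as a single checkout started by a tap.

```kotlin
import java.util.UUID

// Hypothetical durable key-value abstraction.
interface KeyValueStore {
    fun get(name: String): String?
    fun put(name: String, value: String)
    fun remove(name: String)
}

// In-memory stand-in so the example runs; unlike a real store, it does not survive restarts.
class InMemoryKeyValueStore : KeyValueStore {
    private val map = mutableMapOf<String, String>()
    override fun get(name: String) = map[name]
    override fun put(name: String, value: String) { map[name] = value }
    override fun remove(name: String) { map.remove(name) }
}

// One key per user intent, persisted before the first attempt and
// cleared only after the server confirms the operation succeeded.
class IdempotencyKeyStore(private val store: KeyValueStore) {
    fun keyFor(intentId: String): String {
        val name = "idempotency:$intentId"
        return store.get(name) ?: UUID.randomUUID().toString().also { store.put(name, it) }
    }

    fun complete(intentId: String) = store.remove("idempotency:$intentId")
}
```

Any retry for the same intent, including one issued after an app restart, calls `keyFor` and gets the persisted key back. Only `complete` releases it, so the next purchase starts with a fresh key.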
Use database-level atomicity for deduplication. INSERT ON CONFLICT is your lock. Don’t build distributed locking infrastructure when a unique constraint in your database already guarantees that only one insert per key can succeed.
Always use exponential backoff with jitter on mobile clients. Fixed delays cause thundering herds. The formula min(cap, base * 2^attempt) + random_jitter is well-proven. Implement it once in a shared retry utility and enforce it across your networking layer.
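The formula translates directly into a small pure function. This is a sketch. `backoffDelayMs` is a hypothetical helper, and the jitter is drawn from `[0, capped / 2]` to roughly mirror `retryWithBackoff`. The shift is clamped so `2^attempt` cannot overflow for large attempt counts. Injecting `Random` lets tests use a fixed seed.

```kotlin
import kotlin.random.Random

// min(cap, base * 2^attempt) + random_jitter, with jitter in [0, capped / 2].
fun backoffDelayMs(
    attempt: Int,
    baseMs: Long,
    capMs: Long,
    random: Random = Random.Default
): Long {
    require(attempt >= 0 && baseMs > 0 && capMs > 0) { "invalid backoff parameters" }
    // Clamp the shift so base * 2^attempt stays within Long range for sane bases
    val exponential = baseMs shl attempt.coerceAtMost(30)
    val capped = minOf(capMs, exponential)
    val jitter = random.nextLong(0, capped / 2 + 1)
    return capped + jitter
}
```

Keeping the delay calculation separate from the retry loop makes it easy to unit-test and to share between retry utilities.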
The numbers here are clear: idempotency is not optional. On mobile networks, it’s the difference between a payment system that works and one that generates support tickets. Build it in from the start.