Krystian Wiewiór · 5 min read

TL;DR

Most mobile backends die at the database connection layer, not the application layer. We survived a 10x traffic spike (5K to 50K concurrent mobile clients) by fixing three things: HikariCP maxLifetime racing against PostgreSQL’s idle_in_transaction_session_timeout, switching PgBouncer to transaction mode with prepared statement workarounds, and right-sizing our pool using a formula most teams apply incorrectly. Real numbers below.

The 10x morning

Our KMP backend was humming along serving ~5K concurrent users. Marketing pushed a campaign. Traffic spiked 10x overnight. The app didn’t crash from CPU or memory. It crashed because every thread was blocked waiting for a database connection that would never come.

The logs told the story:

HikariPool-1 - Connection is not available, request timed out after 30000ms.

I’ve seen this same failure mode kill more mobile backends than anything else. It’s always the connection pool.

Problem 1: The maxLifetime race condition

Most teams get maxLifetime wrong. They either leave it at the default (30 minutes) or set it to match PostgreSQL’s idle_in_transaction_session_timeout. Both are wrong.

The antipattern:

# HikariCP config (WRONG)
maximumPoolSize: 10
maxLifetime: 600000    # 10 minutes
connectionTimeout: 30000

# PostgreSQL config
idle_in_transaction_session_timeout = '10min'

When both values are identical, you create a race condition. HikariCP may hand out a connection milliseconds before PostgreSQL kills it server-side. The fix is simple: set maxLifetime at least 30 seconds shorter than any server-side timeout.

# HikariCP config (CORRECT)
maximumPoolSize: 10
maxLifetime: 570000    # 9.5 minutes
connectionTimeout: 5000 # fail fast
idleTimeout: 300000
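To keep the two configs from drifting back into the race, the margin rule can be captured in a small helper. A sketch only — `safeMaxLifetimeMs` is our own name, not a HikariCP API:

```kotlin
// Sketch: derive HikariCP's maxLifetime from the shortest server-side
// timeout, enforcing the 30-second safety margin described above.
// safeMaxLifetimeMs is a hypothetical helper, not part of HikariCP.
fun safeMaxLifetimeMs(shortestServerTimeoutMs: Long, marginMs: Long = 30_000L): Long {
    require(shortestServerTimeoutMs > marginMs) {
        "Server-side timeout must exceed the $marginMs ms safety margin"
    }
    return shortestServerTimeoutMs - marginMs
}
```

With idle_in_transaction_session_timeout at 10 minutes, `safeMaxLifetimeMs(600_000)` yields 570,000 ms — the 9.5 minutes above.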

That connectionTimeout change matters too. At 30 seconds, your mobile clients are staring at a spinner. At 5 seconds, you fail fast and can retry or degrade gracefully.
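What "degrade gracefully" can look like in code — a minimal sketch, where the fallback strategy and the `withFailFast` wrapper are our own illustration (HikariCP surfaces a pool timeout as `SQLTransientConnectionException`):

```kotlin
import java.sql.SQLTransientConnectionException

// Sketch: fail fast when the pool is exhausted and serve a degraded
// fallback (cached data, an empty feed, a 503) instead of hanging the
// client. withFailFast is a hypothetical wrapper, not a library API.
fun <T> withFailFast(fallback: () -> T, block: () -> T): T =
    try {
        block()
    } catch (e: SQLTransientConnectionException) {
        // HikariCP throws this when connectionTimeout (5s above) expires
        fallback()
    }
```

A handler might wrap a repository call as `withFailFast({ cachedFeed() }) { feedRepository.load(userId) }` — both names hypothetical.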

Problem 2: PgBouncer transaction mode kills prepared statements

We added PgBouncer to multiplex connections between the app layer and PostgreSQL. The topology:

┌─────────────┐
│ App Node 1  │──(10 conn)──┐
├─────────────┤             │    ┌───────────┐    ┌────────────┐
│ App Node 2  │──(10 conn)──┼───▶│ PgBouncer │───▶│ PostgreSQL │
├─────────────┤             │    └───────────┘    └────────────┘
│ App Node 3  │──(10 conn)──┘     30 server
└─────────────┘                   connections
  30 total client connections

Transaction mode is the obvious choice for mobile backends. It releases connections back to the pool after each transaction instead of per session. But there’s a catch that bit us hard.

PostgreSQL prepared statements are session-scoped. In transaction mode, your next query might land on a different server connection where your prepared statement doesn’t exist. The result: prepared statement "S_1" does not exist.

| PgBouncer Mode | Connection Reuse | Prepared Statements | Throughput (our bench) |
|---|---|---|---|
| Session | Per client session | Work natively | ~2,800 req/s |
| Transaction | Per transaction | Broken by default | ~11,200 req/s |
| Statement | Per statement | Broken | ~12,100 req/s (unstable) |

Benchmarked with k6, 3x 4-core / 16 GB app nodes, PostgreSQL 15 on 8-core / 32 GB, 60s sustained load at 50K virtual users.

Transaction mode gives us 4x throughput, but you must disable prepared statement caching in the PostgreSQL JDBC driver (the cache lives in the driver, not in HikariCP itself) by setting prepareThreshold=0 and preparedStatementCacheQueries=0 on the JDBC URL:

import com.zaxxer.hikari.HikariConfig

val config = HikariConfig().apply {
    jdbcUrl = "jdbc:postgresql://pgbouncer:6432/mydb" +
        "?prepareThreshold=0" +              // never promote to server-side prepared statements
        "&preparedStatementCacheQueries=0"   // disable the driver's statement cache
    maximumPoolSize = poolSize
    maxLifetime = 570_000                    // stay under the server-side timeouts
}

Problem 3: Pool sizing, the formula everyone misquotes

The well-known formula from the PostgreSQL wiki is:

connections = (core_count * 2) + effective_spindle_count

Most teams read “core count” as their 64-core database server. Wrong. On SSDs (spindle count = 0), and with a 4-core app server, the math is:

connections = (4 * 2) + 0 = 8, so 8-10 per app instance
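Read this way, the formula is trivial to encode — a sketch, where `recommendedPoolSize` is our own helper and the core count is the app instance's, not the database server's:

```kotlin
// Sketch of the PostgreSQL wiki formula, applied per app instance.
// On SSDs the effective spindle count is taken as 0.
fun recommendedPoolSize(appCoreCount: Int, effectiveSpindleCount: Int = 0): Int =
    (appCoreCount * 2) + effectiveSpindleCount
```

For our 3x 4-core app nodes, `recommendedPoolSize(4)` gives 8 per instance — roughly 24-30 connections total, not the 120 we started with.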

Our before/after with 3 app instances behind a load balancer:

| Config | Pool Size (per instance) | Total DB Connections | p99 Latency | Errors Under 50K Users |
|---|---|---|---|---|
| Before | 40 | 120 | 2,400ms | 12% timeout |
| After | 10 | 30 | 85ms | 0.01% |

Fewer connections, way better performance. PostgreSQL thrashes with context switching beyond ~50 total connections. We went from 120 to 30 and our p99 dropped by 96%.

Connection leak detection

We added a Ktor plugin that flags any connection held longer than 5 seconds:

// set in the data layer when a connection is acquired
val connAcquiredAt = AttributeKey<Long>("ConnAcquiredAt")

install(createApplicationPlugin("LeakDetector") {
    onCallRespond { call, _ ->
        val acquiredAt = call.attributes.getOrNull(connAcquiredAt)
        // flag any call that has held a connection for more than 5 seconds
        if (acquiredAt != null && (System.nanoTime() - acquiredAt) > 5_000_000_000L) {
            logger.warn("Possible connection leak: ${call.request.uri}")
        }
    }
})

This caught two N+1 queries and one missing connection.close() in a rarely-hit endpoint. Exactly the kind of bugs that sleep until traffic spikes wake them up.

What actually mattered

Set HikariCP maxLifetime 30+ seconds below your shortest server-side timeout. The race condition between HikariCP and PostgreSQL/PgBouncer is the number one cause of pool exhaustion I’ve seen in production mobile backends.

Use PgBouncer transaction mode, but disable prepared statement caching at the JDBC driver level. You get 4x connection throughput without the phantom prepared statement errors. If you’re running high-concurrency mobile workloads, this isn’t optional.

Apply the pool sizing formula per app instance, not per database server, and assume zero spindles on SSDs. Most teams need 8-10 connections per instance, not 40. Fewer connections means less PostgreSQL context switching and significantly lower tail latency.

The connection pool is the narrowest pipe in your mobile backend. We spent two weeks tuning application code that turned out not to matter. The pool config took an afternoon and fixed everything. Start there.

