MVP Factory
Connection pool tuning: HikariCP defaults kill mobile backends

Krystian Wiewiór · 6 min read


TL;DR: The classic connections = (CPU cores * 2) + 1 formula assumes steady-state throughput. Mobile backends don’t have steady state. They have burst patterns driven by push notifications, app opens at commute hours, and retry storms. I’ve seen this default bring down a Ktor service handling 50K+ clients. The fix: right-sized HikariCP pools instrumented with Micrometer, PgBouncer as a connection multiplexer, and adaptive sizing driven by real metrics, not formulas.

The formula everyone trusts (and shouldn’t)

HikariCP’s wiki references the PostgreSQL formula: connections = (core_count * 2) + effective_spindle_count. For a typical 4-core cloud VM, that gives you 9-10 connections. On an 8-core instance, maybe 17.
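Spelled out in code, the formula is trivial (a sketch; `formulaPoolSize` is my own name for it):

```kotlin
// The wiki formula as code. effective_spindle_count is effectively 1
// on SSD-backed cloud VMs, which is what the numbers above assume.
fun formulaPoolSize(coreCount: Int, spindleCount: Int = 1): Int =
    coreCount * 2 + spindleCount

// formulaPoolSize(4) == 9, formulaPoolSize(8) == 17
```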

For a CRUD API with predictable web traffic, this works. For a mobile backend? It’s a time bomb.

I’ve built production systems for mobile-first products, and the traffic shape is fundamentally different. A single push notification to 50K users generates a request spike that can 10x your baseline QPS in under 3 seconds. Every one of those requests needs a database connection, and your pool of 17 connections is now a bottleneck that cascades into HTTP 503s.


Why mobile traffic breaks the model

| Traffic characteristic | Web dashboard | Mobile backend |
|---|---|---|
| Request pattern | Steady, predictable | Bursty, correlated spikes |
| Retry behavior | User refreshes page | Exponential backoff (often misconfigured) |
| Spike multiplier | 2-3x baseline | 8-15x baseline |
| Spike duration | Minutes | 3-30 seconds |
| Connection hold time | Consistent | Variable (slow networks = longer txns) |

The formula assumes CPU is the bottleneck. For mobile backends, the real bottleneck is connection wait time during bursts. Your CPUs sit at 30% while threads block waiting for a pool connection that never comes.

Instrument first, tune second

Before changing any pool size, you need visibility. HikariCP exposes metrics via Micrometer out of the box. Here’s the Ktor setup with ktor-server-metrics-micrometer:

val hikariConfig = HikariConfig().apply {
    maximumPoolSize = 20
    metricRegistry = prometheusMeterRegistry  // a Micrometer PrometheusMeterRegistry
    poolName = "mobile-api-pool"              // becomes the `pool` label on every metric
}
val dataSource = HikariDataSource(hikariConfig)
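For completeness, the `prometheusMeterRegistry` above comes from the Ktor Micrometer plugin wiring, which looks roughly like this (a minimal sketch for Ktor 2.x; the `/metrics` route path is a convention, not a requirement):

```kotlin
import io.ktor.server.application.*
import io.ktor.server.metrics.micrometer.*
import io.ktor.server.response.*
import io.ktor.server.routing.*
import io.micrometer.prometheus.PrometheusConfig
import io.micrometer.prometheus.PrometheusMeterRegistry

val prometheusMeterRegistry = PrometheusMeterRegistry(PrometheusConfig.DEFAULT)

fun Application.configureMetrics() {
    // Collects HTTP server metrics and hosts the registry HikariCP reports into.
    install(MicrometerMetrics) {
        registry = prometheusMeterRegistry
    }
    routing {
        // Scrape endpoint for Prometheus.
        get("/metrics") {
            call.respondText(prometheusMeterRegistry.scrape())
        }
    }
}
```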

Three metrics matter here:

  • hikaricp_connections_pending tracks threads waiting for a connection. If this exceeds 0 for more than 500ms, you’re in trouble.
  • hikaricp_connections_usage_seconds tracks how long connections are checked out. A rising p99 means slow queries or transaction leaks.
  • hikaricp_connections_timeout_total is the failure counter. Each increment is a request that got no connection within connectionTimeout.

Alert on pending connections, not pool utilization. A pool at 80% utilization is fine. A pool with 15 pending waiters for 2 seconds is about to cascade.
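As a Prometheus alerting rule, that policy looks roughly like this (names are illustrative; Prometheus can't evaluate a 500ms window, so a short `for` duration is the practical proxy):

```yaml
- alert: HikariConnectionStarvation
  # Fire when threads are queued for a connection, not when the pool is merely full.
  expr: hikaricp_connections_pending{pool="mobile-api-pool"} > 0
  for: 30s
  labels:
    severity: page
  annotations:
    summary: "HikariCP pool {{ $labels.pool }} has threads waiting for connections"
```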

The right pool size: data over formulas

I ran load tests simulating push-notification bursts against a Ktor service on an 8-core instance backed by PostgreSQL 15.

| Pool size | p50 latency | p99 latency | Timeout errors (per burst) | CPU usage |
|---|---|---|---|---|
| 17 (formula) | 12ms | 2,400ms | 38 | 31% |
| 30 | 11ms | 480ms | 4 | 33% |
| 50 | 11ms | 85ms | 0 | 35% |
| 100 | 11ms | 82ms | 0 | 36% |

Going from 17 to 50 connections eliminated timeouts and cut p99 by 96%. Going beyond 50 gave negligible improvement because PostgreSQL starts spending more time on connection context-switching. The sweet spot for this workload was 3x the formula, not 1x.

But you can’t just set maximumPoolSize = 50 on a PostgreSQL instance with a max_connections = 100 default and call it done. Two service replicas and you’ve exhausted your database.

PgBouncer: the second layer

This is where you need PgBouncer. It sits between your application pools and PostgreSQL, multiplexing many client connections onto fewer server connections.

[Mobile App] → [Ktor (HikariCP: 50 conns)] → [PgBouncer (pool_mode=transaction)] → [PostgreSQL (max 60 conns)]

With transaction mode, PgBouncer only holds a real PostgreSQL connection for the duration of a transaction. Between transactions, it returns the connection to its own pool. Your 50 HikariCP connections per instance don’t each need a dedicated PostgreSQL connection. They share a much smaller server-side pool.

[mobile_api]
pool_mode = transaction
default_pool_size = 30
reserve_pool_size = 10
reserve_pool_timeout = 3

The reserve_pool matters a lot for mobile backends. It gives you burst headroom: 10 extra connections that PgBouncer brings online automatically once a client has been waiting on the main pool for more than 3 seconds (reserve_pool_timeout).

Adaptive sizing: let metrics drive configuration

Static pool sizes are a compromise. What works at 2 AM won’t work during morning commute. You can build an adaptive pool sizer using the metrics you’re already collecting:

// Ktor has no @Scheduled; a coroutine on the application scope does the same job.
// MAX_CEILING and MIN_FLOOR are deployment-specific bounds you pick up front.
fun CoroutineScope.launchPoolSizer(
    dataSource: HikariDataSource,
    meterRegistry: MeterRegistry,
) = launch {
    while (isActive) {
        delay(30_000)
        val pending = meterRegistry.get("hikaricp.connections.pending")
            .gauge().value()
        val currentMax = dataSource.hikariConfigMXBean.maximumPoolSize

        val newSize = when {
            pending > 5 && currentMax < MAX_CEILING ->
                (currentMax + 10).coerceAtMost(MAX_CEILING)
            pending == 0.0 && currentMax > MIN_FLOOR ->
                (currentMax - 5).coerceAtLeast(MIN_FLOOR)
            else -> currentMax
        }
        // HikariCP applies the new max live; shrinking happens as connections are returned.
        dataSource.hikariConfigMXBean.maximumPoolSize = newSize
    }
}

This isn’t theoretical. Running this on a production Ktor service reduced our average pool size by 40% during off-peak hours while maintaining zero timeouts during bursts.

What to actually do

Instrument before tuning. Add Micrometer metrics to HikariCP and alert on connections_pending > 0 sustained for 500ms. You can’t tune what you can’t measure, and the default formula was built for workloads that look nothing like mobile traffic.

Deploy PgBouncer in transaction mode. It decouples your application pool size from your PostgreSQL connection limit, which lets you size HikariCP for burst absorption (3-5x the formula) without exhausting database connections across replicas.

Build adaptive pool sizing driven by pending-connection metrics. Static configuration is always a compromise between peak and off-peak. A 30-second feedback loop that adjusts maximumPoolSize based on actual contention gives you burst capacity without wasting resources at baseline.

Most teams get this wrong because they treat the pool size formula as settled science. It’s a starting point for one type of workload. Mobile backends are a different animal. Measure your actual traffic shape and let the data tell you what the pool size should be.


Tags: kotlin, backend, architecture, mobile, api

