
Kotlin/Native GC tuning that cut P99 latency by 60%

Krystian Wiewiór · 5 min read


TL;DR: Kotlin/Native’s memory manager uses a tracing GC with cycle collection backed by mimalloc. The defaults are tuned for mobile, not servers. Adjusting GC threshold factors, tweaking mimalloc’s segment cache, and adopting arena-style allocation patterns for parsing workloads cut P99 latency by 60% in our Ktor-native deployment. I’ll walk through what actually worked.

The memory manager most teams don’t tune

Since Kotlin 1.7.20, the new Kotlin/Native memory manager replaced the old strict freeze-based model. The new MM brings a tracing garbage collector with stop-the-world pauses and a dedicated cycle collector for breaking reference cycles. All allocations flow through mimalloc, Microsoft’s allocator built for concurrent workloads.

Most teams deploy Kotlin/Native with zero GC configuration. Fine for mobile. For a Ktor-native server handling thousands of requests per second with heavy JSON parsing or protobuf deserialization, default thresholds cause frequent, unpredictable GC pauses that wreck tail latency.

How the GC actually works

The Kotlin/Native GC operates in three phases:

  1. Mark phase — traverses the object graph from roots (stack, globals) marking reachable objects
  2. Sweep phase — reclaims unmarked memory back to mimalloc’s free lists
  3. Cycle collection — periodically detects and collects cyclic garbage that simple tracing misses

The GC triggers based on allocation thresholds: when memory allocated since the last collection exceeds threshold = lastGCLiveSet * thresholdFactor, a new GC cycle kicks in.
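The trigger rule can be sketched as plain arithmetic. This is an illustration of the formula above, not runtime code; the function names and the factor value are ours:

```kotlin
// Illustrative only: a GC cycle starts once allocations since the last
// collection exceed lastGCLiveSet * thresholdFactor.
fun gcThresholdBytes(lastGCLiveSetBytes: Long, thresholdFactor: Double): Long =
    (lastGCLiveSetBytes * thresholdFactor).toLong()

fun shouldTriggerGC(
    allocatedSinceLastGC: Long,
    lastGCLiveSetBytes: Long,
    thresholdFactor: Double
): Boolean =
    allocatedSinceLastGC >= gcThresholdBytes(lastGCLiveSetBytes, thresholdFactor)
```

With a 100MB live set and a factor of 2.0, the next collection starts once 200MB has been allocated — so a larger live set or a higher factor both push collections further apart.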

Default thresholds vs. server-tuned

| Parameter | Default | Server-Tuned | Effect |
| --- | --- | --- | --- |
| GC.thresholdAllocations | 0 (auto) | Manual override | Controls minor GC trigger frequency |
| GC.cyclicCollectorEnabled | true | true | Required for cycle detection |
| GC.autotune | true | true | Adaptive threshold scaling |
| GC.targetHeapBytes | — | Set explicitly | Gives GC a heap budget to tune against |

You configure these at application startup:

import kotlin.native.runtime.GC

fun configureGC() {
    GC.targetHeapBytes = 512L * 1024 * 1024  // 512MB heap target
    GC.autotune = true
    GC.cyclicCollectorEnabled = true
}

Setting targetHeapBytes explicitly tells the GC scheduler how much memory it can use before becoming aggressive. Without this, the GC fires conservatively — great for memory-constrained mobile, terrible for a server with gigabytes of headroom. This was the single most impactful change we made.
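One way to pick the budget is as a fraction of the machine's RAM rather than a hardcoded constant. A minimal sketch — the helper name, the 0.25 fraction, and the 64MB floor are all our assumptions, and the total-RAM figure would come from the platform (e.g. sysconf on POSIX):

```kotlin
// Sketch: derive a heap budget from available RAM instead of hardcoding it.
// The fraction and floor are assumptions, not runtime defaults.
fun heapTargetFor(totalRamBytes: Long, fraction: Double = 0.25): Long =
    (totalRamBytes * fraction).toLong()
        .coerceAtLeast(64L * 1024 * 1024)  // never go below a 64MB floor
```

At startup you would assign the result to GC.targetHeapBytes in place of the 512MB literal above.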

mimalloc: the allocator under the hood

Kotlin/Native delegates all memory allocation to mimalloc. This matters because mimalloc uses thread-local heaps and segment-based allocation, so allocation-heavy workloads (parsing JSON bodies, deserializing protobuf messages) benefit from its fast thread-local paths.

Key mimalloc environment variables that affect Kotlin/Native:

| Variable | Default | Recommended | Why |
| --- | --- | --- | --- |
| MIMALLOC_ARENA_EAGER_COMMIT | 1 | 1 | Pre-commits arena pages, avoids page faults |
| MIMALLOC_PURGE_DELAY | 10 | 50 | Delays returning memory to OS, reduces syscalls |
| MIMALLOC_ALLOW_LARGE_OS_PAGES | 0 | 1 | Uses 2MB huge pages where available |

Enabling large OS pages alone can cut TLB misses during allocation-heavy workloads. In our benchmarks running protobuf deserialization on a 16-core server, huge pages combined with increased purge delay knocked a visible chunk off P99 by reducing the cost of memory mapping operations. These are environment variable changes — zero code required, easy to A/B test.
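Applied at launch, the settings from the table look like this — the server binary name is a placeholder for your own:

```shell
# Recommended mimalloc settings from the table above;
# ./ktor-server.kexe stands in for your actual binary.
export MIMALLOC_ARENA_EAGER_COMMIT=1
export MIMALLOC_PURGE_DELAY=50
export MIMALLOC_ALLOW_LARGE_OS_PAGES=1
./ktor-server.kexe
```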

Allocation patterns that matter

The biggest gains came not from flag tuning but from changing how we allocate. This is the part most people skip, and it shouldn’t be.

Parsing a 50KB JSON body creates hundreds of short-lived objects — strings, lists, map entries. Each one hits the allocator, and the resulting garbage triggers GC sooner. The fix is boring but effective: pool and reuse objects within a request lifecycle.

class RequestScopedArena {
    private val pool = ArrayDeque<StringBuilder>(64)

    fun borrowBuilder(): StringBuilder =
        pool.removeLastOrNull() ?: StringBuilder(256)

    fun returnBuilder(sb: StringBuilder) {
        sb.clear()
        if (pool.size < 64) pool.addLast(sb)
    }
}

Reusing objects within a request pushes GC cycles further apart. In allocation-heavy Ktor endpoints doing JSON parsing, this pattern alone cut GC frequency roughly in half.
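Here is a minimal usage sketch of the arena inside a request handler. The handler and its parsing logic are stand-ins, not Ktor API; the class is repeated so the example is self-contained:

```kotlin
// Same pool as in the article, repeated for self-containment.
class RequestScopedArena {
    private val pool = ArrayDeque<StringBuilder>(64)

    fun borrowBuilder(): StringBuilder =
        pool.removeLastOrNull() ?: StringBuilder(256)

    fun returnBuilder(sb: StringBuilder) {
        sb.clear()
        if (pool.size < 64) pool.addLast(sb)
    }
}

// Hypothetical handler: borrow a builder, use it, always return it.
fun handleRequest(arena: RequestScopedArena, body: String): String {
    val sb = arena.borrowBuilder()
    try {
        sb.append("parsed:").append(body.length)  // stand-in for real parsing
        return sb.toString()
    } finally {
        arena.returnBuilder(sb)  // back to the pool, not to the GC
    }
}
```

The try/finally matters: if parsing throws, the builder still returns to the pool instead of becoming garbage.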

The freezing ghosts that still haunt you

The old memory model’s freeze() is deprecated but not gone. Some libraries still call ensureNeverFrozen() or check isFrozen. Under the new MM, freezing is a no-op — but these checks can still throw FreezingException if a dependency was built against an older Kotlin/Native version. Audit your dependency tree; the fix is to update the offending dependencies or set kotlin.native.binary.freezing=disabled in gradle.properties.

Benchmarking results

Testing a Ktor-native server handling protobuf deserialization at sustained 5,000 RPS on a 16-core machine:

| Configuration | P50 | P99 | Max GC Pause |
| --- | --- | --- | --- |
| Default GC, default mimalloc | 4ms | 85ms | 120ms |
| Tuned targetHeapBytes + autotune | 4ms | 52ms | 70ms |
| + mimalloc huge pages + purge delay | 3ms | 38ms | 55ms |
| + arena-style object pooling | 3ms | 34ms | 45ms |

All three optimizations together brought P99 from 85ms to 34ms — a 60% reduction. The heap target alone got us more than half of that improvement. I was honestly surprised how much the GC was leaving on the table with default settings.

What to do

  1. Set GC.targetHeapBytes explicitly. This is the highest-impact change. Give the GC a realistic memory budget based on your server’s available RAM and let autotune handle the rest.

  2. Tune mimalloc via environment variables. Enable large OS pages and increase purge delay. Zero-code changes that reduce allocator overhead in parsing-heavy workloads.

  3. Pool objects on hot paths. Object reuse in request-scoped pools reduces allocation pressure more than any GC flag will. Profile your allocation hotspots with MIMALLOC_SHOW_STATS=1 and target the top allocators first.
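For step 3, mimalloc can report its own allocation statistics at process exit, which is a cheap way to find hotspots before reaching for a full profiler (again, the binary name is a placeholder):

```shell
# mimalloc prints per-size-class allocation statistics on exit
# when MIMALLOC_SHOW_STATS=1; ./ktor-server.kexe is a placeholder.
MIMALLOC_SHOW_STATS=1 ./ktor-server.kexe
```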


Tags: kotlin, kmp, backend, architecture, multiplatform

