Android Baseline Profiles: the CI pipeline that cut cold start by 35%

TL;DR: Default Baseline Profiles barely scratch the surface. By writing custom Macrobenchmark startup journeys, integrating Cloud Profile delivery via Google Play, and building a CI pipeline that validates profiles across device tiers, we reduced cold-start time by 35%. This post covers AOT compilation internals, DEX layout optimization, R8 gotchas that silently invalidate profiles, and the exact pipeline setup.

Why default Baseline Profiles underperform

Most teams generate a Baseline Profile by running the default BaselineProfileGenerator test, ship it, and move on. The problem: that auto-generated profile covers only the trivial happy path, from Activity.onCreate() through the first rendered frame. It misses the Dagger/Hilt injection graph, the initial network prefetch, and every lazy-initialized singleton your app touches in the first 2 seconds.

In my experience, the default profile typically covers 40-60% of the methods executed during a real cold start. The remaining methods get interpreted or JIT-compiled at runtime, which is exactly the penalty Baseline Profiles exist to eliminate.

Custom Macrobenchmark startup journeys

The Macrobenchmark library lets you define MacrobenchmarkRule-based tests that simulate realistic startup. The trick is modeling what your actual users do in the first 5 seconds:

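import androidx.benchmark.macro.StartupMode
import androidx.benchmark.macro.StartupTimingMetric
import androidx.benchmark.macro.junit4.MacrobenchmarkRule
import androidx.test.uiautomator.By
import androidx.test.uiautomator.Direction
import androidx.test.uiautomator.Until
import org.junit.Rule
import org.junit.Test

@get:Rule
val benchmarkRule = MacrobenchmarkRule()

@Test
fun startupWithAuthAndFeed() {
    benchmarkRule.measureRepeated(
        packageName = TARGET_PACKAGE, // your application ID constant
        metrics = listOf(StartupTimingMetric()),
        iterations = 10,
        startupMode = StartupMode.COLD,
        // Return to the launcher before each iteration, outside the measured block
        setupBlock = { pressHome() },
    ) {
        startActivityAndWait()
        // Wait for Dagger graph + initial API response
        val feedList = By.res(packageName, "feed_list")
        device.wait(Until.hasObject(feedList), 5_000)
        // Scroll to trigger RecyclerView prefetch (findObject returns null if the list never appeared)
        device.findObject(feedList)?.scroll(Direction.DOWN, 2f)
    }
}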

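On its own, measureRepeated only measures. Run the same journey inside a BaselineProfileRule.collect block and the generated profile records methods across dependency injection, network deserialization, and RecyclerView layout, all hot paths the default generator misses entirely.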
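Recording a profile, as opposed to timing startup, goes through BaselineProfileRule rather than MacrobenchmarkRule. Here is a minimal sketch of a generator that replays the journey above. It assumes a Benchmark library version whose collect() accepts includeInStartupProfile, and the package name and the feed_list resource ID are placeholders, not values from our app.

```kotlin
import androidx.benchmark.macro.junit4.BaselineProfileRule
import androidx.test.ext.junit.runners.AndroidJUnit4
import androidx.test.uiautomator.By
import androidx.test.uiautomator.Direction
import androidx.test.uiautomator.Until
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith

// Hypothetical application ID; replace with the app under test.
private const val TARGET_PACKAGE = "com.example.app"

@RunWith(AndroidJUnit4::class)
class StartupProfileGenerator {

    @get:Rule
    val baselineProfileRule = BaselineProfileRule()

    @Test
    fun generateStartupProfile() {
        baselineProfileRule.collect(
            packageName = TARGET_PACKAGE,
            // Also mark the collected rules as startup rules (assumes a library
            // version that supports this parameter).
            includeInStartupProfile = true,
        ) {
            // Same journey as the benchmark: launch, wait for the feed, scroll.
            pressHome()
            startActivityAndWait()
            val feedList = By.res(packageName, "feed_list")
            device.wait(Until.hasObject(feedList), 5_000)
            device.findObject(feedList)?.scroll(Direction.DOWN, 2f)
        }
    }
}
```

Keeping the generator and the benchmark on the same journey means the startup numbers you measure describe the code paths you actually profiled.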

Profile-guided AOT compilation internals

When a Baseline Profile ships with your app, ART uses it for profile-guided AOT compilation, either at install time or later during background dexopt (bg-dexopt). The profile tells the compiler which methods and classes to pre-compile, and which classes to place together in the DEX layout for better page locality.

Compilation Mode	Methods Covered	Cold Start Impact
No profile (interpret + JIT)	0% pre-compiled	Reference point
Default Baseline Profile	~50% of startup methods	15-20% improvement
Custom journey profile	~85% of startup methods	30-40% improvement
Cloud Profile (aggregated)	~75% across user segments	25-35% improvement

Look at the gap between default and custom profiles. It’s not just about method count. DEX layout optimization depends on class loading order. When the profiler sees your full initialization graph, ART can colocate hot classes within the same memory pages, which means fewer page faults at startup. That’s where the real win hides.
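On the build side, one related switch is the startup-profile support in the Baseline Profile Gradle plugin. The fragment below is a sketch, assuming plugin and AGP versions recent enough to expose this property; only rules collected with includeInStartupProfile = true in the generator feed into it.

```kotlin
// app/build.gradle.kts (fragment)
baselineProfile {
    // Ask the build to use the startup rules when laying out the DEX files.
    dexLayoutOptimization = true
}
```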

Cloud Profile delivery via Google Play

Google Play aggregates anonymized runtime profiles from users and delivers them as Cloud Profiles to new installs. Useful, but with constraints: profiles take 1-2 weeks to propagate after a release, and they reflect the average user journey, not your optimized one.

The strategy I’d recommend is layering. Ship a custom Baseline Profile in your APK/AAB for immediate benefit, and let Cloud Profiles fill in coverage gaps over time. In build.gradle.kts:

baselineProfile {
    automaticGenerationDuringBuild = true
    saveInSrc = true
    mergeIntoMain = true
}
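That block only takes effect once the plugin and a generator module are wired in. A sketch of the surrounding setup follows; the module name :baselineprofile is a placeholder, and the profileinstaller version shown is only an example.

```kotlin
// app/build.gradle.kts (fragment)
plugins {
    id("com.android.application")
    id("androidx.baselineprofile")
}

dependencies {
    // Lets the app install its bundled profile when the store has not already done so.
    // Example version; use whatever is current for your project.
    implementation("androidx.profileinstaller:profileinstaller:1.3.1")

    // Hypothetical module that contains the profile generator tests.
    baselineProfile(project(":baselineprofile"))
}
```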

The R8 gotcha that silently breaks profiles

This one cost us weeks. R8 optimization can rename, inline, or remove methods that your Baseline Profile references. When that happens, the profile entries go stale. ART silently ignores them, you get zero benefit, and nothing in your build output tells you anything went wrong.

The fix: generate profiles after R8 processing, against the optimized APK. In your CI pipeline, the order must be:

1. Build release APK (R8 runs)
2. Install optimized APK on test device/emulator
3. Run Macrobenchmark profile generator against installed APK
4. Extract and embed the resulting profile

Reversing steps 1 and 3 is the single most common mistake I see. It produces profiles that look valid but match nothing at runtime. Maddening to debug.

CI-integrated tracing pipeline

We run this pipeline on every release branch across three device tiers: low-end (2GB RAM), mid-range, and flagship.

Pipeline Stage	Tool	Output
Profile generation	Macrobenchmark + Gradle managed devices	`baseline-prof.txt`
Profile validation	`profman --dump-classes-and-methods`	Method coverage report
Startup measurement	Macrobenchmark `StartupTimingMetric`	P50/P90 cold start (ms)
Regression gate	Custom Gradle task	Fail build if P50 regresses >5%
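The regression gate in the last row comes down to a percentile comparison. This is a minimal sketch of that logic, not our actual Gradle task: the function names and the nearest-rank percentile method are choices made for illustration, and the samples are assumed to be cold-start timings in milliseconds parsed from Macrobenchmark output.

```kotlin
import kotlin.math.ceil

/** Nearest-rank percentile of [samples]; [p] must be in 1..100. */
fun percentile(samples: List<Double>, p: Int): Double {
    require(samples.isNotEmpty()) { "no samples" }
    require(p in 1..100) { "p must be in 1..100" }
    val sorted = samples.sorted()
    val rank = ceil(p / 100.0 * sorted.size).toInt() // 1-based rank
    return sorted[rank - 1]
}

/**
 * True when the current P50 is slower than the reference P50 by more than
 * [maxRegression] (0.05 means 5%).
 */
fun regressed(
    referenceMs: List<Double>,
    currentMs: List<Double>,
    maxRegression: Double = 0.05,
): Boolean {
    val reference = percentile(referenceMs, 50)
    val current = percentile(currentMs, 50)
    return (current - reference) / reference > maxRegression
}
```

A CI task would call regressed(previousRun, thisRun) once per device tier and fail the build when it returns true. The same percentile function covers a P90 threshold.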

The validation step matters more than people think. Running profman --dump-classes-and-methods against your compiled profile lets you verify that method references actually resolve in the current DEX files. If coverage drops below your threshold, the pipeline catches it before release. Without this, you’re flying blind.
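The coverage threshold itself can be a small, separate check. The sketch below works on the human-readable profile (baseline-prof.txt style) rather than on profman output, and it assumes you already have a list of method descriptors you expect to be covered, for example taken from a trace of a real cold start. That list and both function names are hypothetical, and wildcard rules are not handled.

```kotlin
/**
 * Extracts method rules from human-readable profile lines.
 * Method rules look like `HSPLcom/example/Foo;->bar()V`; class rules
 * such as `Lcom/example/Foo;` contain no `->` and are skipped.
 */
fun methodRules(profileLines: List<String>): Set<String> =
    profileLines
        .map { it.trim() }
        .filter { it.isNotEmpty() && !it.startsWith("#") && "->" in it }
        .mapNotNull { line ->
            // Drop the leading H/S/P flags so only `Lpkg/Class;->method(args)ret` remains.
            val start = line.indexOf('L')
            if (start >= 0) line.substring(start) else null
        }
        .toSet()

/** Fraction of [expectedMethods] that appear as method rules in the profile. */
fun coverage(profileLines: List<String>, expectedMethods: Set<String>): Double {
    if (expectedMethods.isEmpty()) return 1.0
    val rules = methodRules(profileLines)
    return expectedMethods.count { it in rules }.toDouble() / expectedMethods.size
}
```

Failing the build when coverage(...) falls below a chosen threshold turns a silent loss of coverage into a visible one. It complements the profman check and does not replace it.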

What to do with all this

Write custom Macrobenchmark startup journeys that cover your real initialization graph: DI, network, first meaningful content. Default generators leave 40%+ of hot methods uncompiled.

Always generate profiles after R8 processing. Profile-first pipelines produce silently broken profiles. Validate with profman --dump-classes-and-methods in CI to catch stale method references. I cannot overstate how quiet this failure mode is.

Measure across device tiers and set regression gates. A profile that shaves 200ms on a Pixel might do almost nothing on a low-RAM device where memory pressure dominates. Enforce P50/P90 thresholds in your CI pipeline so regressions don’t slip through unnoticed.
