Android Baseline Profiles and Macrobenchmark in 2026: Measuring Real Startup Time Improvements Across Dex Layouts, Cloud Profiles, and the ART Compilation Pipeline
TL;DR
Baseline Profiles pre-compile critical code paths via ART’s AOT pipeline. Current profiles in production cut cold start times by 35-40%, and even stale or minimal profiles deliver 15-20% gains. But stale rules, missing journeys, and ProfileInstaller mismatches silently destroy those gains. This post covers the compilation pipeline, dex layout interactions, cloud profiles vs on-device optimization, and the CI workflow that catches regressions before your users do.
How Baseline Profiles actually work
Most teams treat Baseline Profiles as “add the Gradle plugin and forget it.” That set-and-forget mindset is why so many apps leave 200-400ms of cold start time on the table.
What actually happens under the hood: when you ship a Baseline Profile, you’re providing a curated list of classes and methods that ART should AOT-compile before the user ever opens your app. Without profiles, ART relies on its JIT compiler at runtime, interpreting bytecode first and only compiling hot methods after repeated execution.
The compilation pipeline has two paths:
| Path | Trigger | Compilation | Latency impact |
|---|---|---|---|
| Cloud Profiles | Play Store aggregates profiles from early adopters | AOT at install time via bg-dexopt | Eliminates first-launch JIT penalty |
| Baseline Profiles | Bundled in APK/AAB via ProfileInstaller | AOT on first idle maintenance window | Covers day-one installs before cloud data exists |
| On-device PGO | ART JIT runtime profiling | Incremental AOT during idle | Adapts to individual usage patterns over time |
Cloud profiles work well, but they lag behind new releases by days. Baseline Profiles fill that gap. They guarantee AOT compilation for critical paths from the very first install.
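Wiring this up is mostly Gradle configuration. The sketch below assumes AGP 8.x with the androidx.baselineprofile plugin and a separate `:baselineprofile` test module; the module and dependency names are placeholders for your own project:

```kotlin
// app/build.gradle.kts — minimal sketch, not a complete build file.
plugins {
    id("com.android.application")
    id("androidx.baselineprofile")
}

dependencies {
    // ProfileInstaller applies the bundled profile on devices
    // that don't yet have cloud profile data.
    implementation("androidx.profileinstaller:profileinstaller:1.3.1")

    // Consumes the profile generated by the test module at build time.
    baselineProfile(project(":baselineprofile"))
}
```

The plugin merges the generated rules into the release artifact, so the profile ships inside the AAB with no manual copying of `baseline-prof.txt`.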
Dex layout optimization: the part nobody talks about
The biggest surprise for most teams is how much dex layout matters. When R8 arranges your dex files, the order of classes affects page fault behavior during startup. Baseline Profile rules inform R8’s dex layout optimization, and referenced classes get grouped into the primary dex file’s hot section.
This isn’t theoretical. Moving profile-referenced classes into contiguous memory pages reduces disk I/O during cold start. I isolated dex layout gains from AOT compilation gains on a set of mid-range devices (Pixel 4a, Samsung A53) with slower storage:
| Configuration | Median cold start (ms) | Difference vs AOT-only | Device class |
|---|---|---|---|
| AOT only (no layout opt) | 538 | — | Mid-range |
| AOT + dex layout opt | 491 | -8.7% | Mid-range |
| AOT only (no layout opt) | 410 | — | High-end |
| AOT + dex layout opt | 395 | -3.7% | High-end |
The gains scale with storage speed. Slower NAND means more expensive page faults. R8’s startup dex layout optimization documentation confirms this: profile-referenced classes are reordered into contiguous pages to minimize I/O during class loading.
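The generated rules are human-readable, and the flags on each line are what drive both AOT compilation and dex layout. A hedged example of the format, with hypothetical class names:

```
HSPLcom/example/app/MainActivity;->onCreate(Landroid/os/Bundle;)V
HSPLcom/example/feed/FeedViewModel;-><init>(Lcom/example/feed/FeedRepository;)V
Lcom/example/app/MainActivity;
```

`H` marks a hot method, `S` a startup method, `P` a post-startup method; a bare `L` line includes the class itself. Startup-flagged entries are the ones R8 groups into the primary dex file’s hot section.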
Measuring with Macrobenchmark: concrete numbers
I recently profiled a mid-complexity Jetpack Compose app with 12 Hilt modules and ~40 screens. Cold start results:
| Configuration | Median cold start (ms) | P95 (ms) | Improvement |
|---|---|---|---|
| No profiles | 847 | 1,240 | Baseline |
| Baseline Profile (stale) | 712 | 1,080 | 16% |
| Baseline Profile (current) | 538 | 780 | 36% |
| Current + dex layout opt | 491 | 710 | 42% |
The key Macrobenchmark setup:
```kotlin
import androidx.benchmark.macro.BaselineProfileMode
import androidx.benchmark.macro.CompilationMode
import androidx.benchmark.macro.StartupMode
import androidx.benchmark.macro.StartupTimingMetric
import androidx.benchmark.macro.junit4.MacrobenchmarkRule
import androidx.test.ext.junit.runners.AndroidJUnit4
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith

private const val TARGET_PACKAGE = "com.example.app" // your applicationId

@RunWith(AndroidJUnit4::class)
class StartupBenchmark {

    @get:Rule
    val benchmarkRule = MacrobenchmarkRule()

    @Test
    fun startupCompilationBaselineProfiles() {
        benchmarkRule.measureRepeated(
            packageName = TARGET_PACKAGE,
            metrics = listOf(StartupTimingMetric()),
            iterations = 10,
            // Partial = AOT-compile only what the profile covers,
            // matching what users actually get in production.
            compilationMode = CompilationMode.Partial(
                baselineProfileMode = BaselineProfileMode.Require
            ),
            startupMode = StartupMode.COLD
        ) {
            pressHome()
            startActivityAndWait()
            // Navigate your CRITICAL user journeys here
            device.waitForIdle()
        }
    }
}
```
Notice BaselineProfileMode.Require. This fails the benchmark if profiles are missing, which is exactly what you want in CI.
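To attribute gains correctly, run the same benchmark under the other compilation modes as well. These are the real `androidx.benchmark.macro` mode classes; how you parameterize the runs is up to your test setup:

```kotlin
// Fully interpreted/JIT — your "no profiles" floor.
val none = CompilationMode.None()

// AOT only for profile-covered methods; fails if the profile is missing.
val partial = CompilationMode.Partial(
    baselineProfileMode = BaselineProfileMode.Require
)

// Everything AOT-compiled — an upper bound users never actually see,
// useful only as a reference point.
val full = CompilationMode.Full()
```

Comparing `None` against `Partial` is what produced the no-profile vs current-profile rows in the table above.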
The mistakes that make profiles useless
Stale profile rules are the most common offender. If your profile generator doesn’t exercise current navigation paths, new screens launch fully interpreted. Say your app adds a new content mode with distinct UI paths. Those screens need updated profile rules or they eat the full JIT penalty on cold start. The 16% vs 36% gap between the stale and current profile rows in the table above? Entirely explained by stale rules missing new code paths.
Then there’s incomplete journey coverage. Your profile generator must walk the paths real users take in the first 30 seconds: login, home feed, first interaction. If you only profile MainActivity.onCreate(), you cover maybe 30% of startup-critical code.
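A generator that covers those journeys looks like this sketch, built on `BaselineProfileRule`. The package name and resource IDs are hypothetical stand-ins for your app’s:

```kotlin
import androidx.benchmark.macro.junit4.BaselineProfileRule
import androidx.test.uiautomator.By
import androidx.test.uiautomator.Until
import org.junit.Rule
import org.junit.Test

class BaselineProfileGenerator {

    @get:Rule
    val rule = BaselineProfileRule()

    @Test
    fun generate() = rule.collect(packageName = "com.example.app") {
        pressHome()
        startActivityAndWait()
        // Walk the first-30-seconds journeys, not just the launch:
        // wait for the home feed, then open the first item.
        device.wait(Until.hasObject(By.res("com.example.app", "home_feed")), 5_000)
        device.findObject(By.res("com.example.app", "first_item"))?.click()
        device.waitForIdle()
    }
}
```

Every class and method ART touches inside `collect` ends up in the profile, so the block should mirror what real users do, not an artificial tour of every screen.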
The sneakiest problem is ProfileInstaller version mismatches. The androidx.profileinstaller library version must match your AGP version’s profile format. A mismatch silently skips installation. Check adb shell dumpsys package your.app | grep prof. If you see status=no after install, your profiles aren’t being applied.
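You can also verify installation programmatically with `ProfileVerifier` from the profileinstaller library (1.3.0+). A sketch, assuming you call it from a debug screen or startup telemetry; the tag and log messages are placeholders:

```kotlin
import android.util.Log
import androidx.profileinstaller.ProfileVerifier

private const val TAG = "ProfileCheck"

// Call from a background thread after startup — get() blocks.
fun logProfileStatus() {
    val status = ProfileVerifier.getCompilationStatusAsync().get()
    when {
        status.isCompiledWithProfile ->
            Log.d(TAG, "Baseline Profile compiled (AOT)")
        status.hasProfileEnqueuedForCompilation() ->
            Log.d(TAG, "Profile installed, awaiting bg-dexopt")
        else ->
            Log.w(TAG, "No profile applied, code=${status.profileInstallResultCode}")
    }
}
```

Reporting this status through analytics tells you what fraction of your install base actually runs with compiled profiles, which dumpsys on a single device cannot.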
The CI workflow that catches regressions
Most teams measure once and assume the gains persist. Profiles decay as code changes. The workflow that actually holds up:
- Generate — run Macrobenchmark profile generators on every release branch merge
- Validate — CompilationMode.Partial(baselineProfileMode = Require) fails CI if profiles are absent
- Benchmark — compare cold start P50/P95 against the previous release’s baseline
- Alert — flag regressions exceeding 10% in any metric
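The alert step reduces to a pure function over the medians and percentiles Macrobenchmark emits in its JSON output. A hypothetical sketch of the 10% gate (the data class and threshold are illustrative, not part of any library):

```kotlin
// Hypothetical regression gate over Macrobenchmark startup metrics.
data class StartupMetrics(val p50Ms: Double, val p95Ms: Double)

fun regressions(
    previous: StartupMetrics,
    current: StartupMetrics,
    threshold: Double = 0.10,
): List<String> {
    val out = mutableListOf<String>()
    if (current.p50Ms > previous.p50Ms * (1 + threshold)) {
        out += "P50 regressed: ${previous.p50Ms}ms -> ${current.p50Ms}ms"
    }
    if (current.p95Ms > previous.p95Ms * (1 + threshold)) {
        out += "P95 regressed: ${previous.p95Ms}ms -> ${current.p95Ms}ms"
    }
    return out
}

fun main() {
    val prev = StartupMetrics(p50Ms = 538.0, p95Ms = 780.0)
    val curr = StartupMetrics(p50Ms = 610.0, p95Ms = 805.0)
    // P50 is up ~13% (flagged); P95 is up ~3% (within threshold).
    println(regressions(prev, curr))
}
```

Failing the build on a non-empty list keeps the comparison honest release over release, instead of relying on someone eyeballing a dashboard.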
This runs on a dedicated Firebase Test Lab device pool with locked thermal state. Emulators produce unreliable startup numbers due to variable CPU throttling.
Takeaways
Regenerate profiles every release cycle. Stale profiles covering only old code paths are the difference between the 16% and 36% results above. Automate generation in CI, not as a manual step.
Profile complete user journeys, not just Activity launches. Cover the first 30 seconds of real usage including navigation, data loading, and first render of key composables.
Validate profile installation on-device. Use adb shell dumpsys and BaselineProfileMode.Require in benchmarks to catch silent failures from version mismatches before they reach production.