Android Baseline Profiles and Macrobenchmark in 2026: Measuring Real Startup Time Improvements Across Dex Layouts, Cloud Profiles, and the ART Compilation Pipeline
TL;DR
Baseline Profiles pre-compile critical code paths via ART’s AOT pipeline. Current profiles in production cut cold start times by 35-40%, and even stale or minimal profiles deliver 15-20% gains. But stale rules, missing journeys, and ProfileInstaller mismatches silently destroy those gains. This post covers the compilation pipeline, dex layout interactions, cloud profiles vs on-device optimization, and the CI workflow that catches regressions before your users do.
How Baseline Profiles actually work
Most teams treat Baseline Profiles as “add the Gradle plugin and forget it.” That set-and-forget mindset is why so many apps leave 200-400ms of cold start time on the table.
What actually happens under the hood: when you ship a Baseline Profile, you’re providing a curated list of classes and methods that ART should AOT-compile before the user ever opens your app. Without profiles, ART relies on its JIT compiler at runtime, interpreting bytecode first and only compiling hot methods after repeated execution.
The compilation pipeline has two paths:
| Path | Trigger | Compilation | Latency impact |
|---|---|---|---|
| Cloud Profiles | Play Store aggregates profiles from early adopters | AOT at install time via bg-dexopt | Eliminates first-launch JIT penalty |
| Baseline Profiles | Bundled in APK/AAB via ProfileInstaller | AOT on first idle maintenance window | Covers day-one installs before cloud data exists |
| On-device PGO | ART JIT runtime profiling | Incremental AOT during idle | Adapts to individual usage patterns over time |
Cloud profiles work well, but they lag behind new releases by days. Baseline Profiles fill that gap. They guarantee AOT compilation for critical paths from the very first install.
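Wiring this up is mostly Gradle configuration. The sketch below assumes AGP 8.x with the androidx.baselineprofile plugin and a separate `:baselineprofile` test module; the module and dependency names are placeholders for your own project:

```kotlin
// app/build.gradle.kts — minimal sketch, not a complete build file.
plugins {
    id("com.android.application")
    id("androidx.baselineprofile")
}

dependencies {
    // ProfileInstaller applies the bundled profile on devices
    // that don't yet have cloud profile data.
    implementation("androidx.profileinstaller:profileinstaller:1.3.1")

    // Consumes the profile generated by the test module at build time.
    baselineProfile(project(":baselineprofile"))
}
```

The plugin merges the generated rules into the release artifact, so the profile ships inside the AAB with no manual copying of `baseline-prof.txt`.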
Dex layout optimization: the part nobody talks about
The biggest surprise for most teams is how much dex layout matters. When R8 arranges your dex files, the order of classes affects page fault behavior during startup. Baseline Profile rules inform R8’s dex layout optimization, and referenced classes get grouped into the primary dex file’s hot section.
This isn’t theoretical. Moving profile-referenced classes into contiguous memory pages reduces disk I/O during cold start. I isolated dex layout gains from AOT compilation gains on a set of mid-range devices (Pixel 4a, Samsung A53) with slower storage:
| Configuration | Median cold start (ms) | Difference vs AOT-only | Device class |
|---|---|---|---|
| AOT only (no layout opt) | 538 | — | Mid-range |
| AOT + dex layout opt | 491 | -8.7% | Mid-range |
| AOT only (no layout opt) | 410 | — | High-end |
| AOT + dex layout opt | 395 | -3.7% | High-end |
The gains scale with storage speed. Slower NAND means more expensive page faults. R8’s startup dex layout optimization documentation confirms this: profile-referenced classes are reordered into contiguous pages to minimize I/O during class loading.
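The generated rules are human-readable, and the flags on each line are what drive both AOT compilation and dex layout. A hedged example of the format, with hypothetical class names:

```
HSPLcom/example/app/MainActivity;->onCreate(Landroid/os/Bundle;)V
HSPLcom/example/feed/FeedViewModel;-><init>(Lcom/example/feed/FeedRepository;)V
Lcom/example/app/MainActivity;
```

`H` marks a hot method, `S` a startup method, `P` a post-startup method; a bare `L` line includes the class itself. Startup-flagged entries are the ones R8 groups into the primary dex file’s hot section.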
Measuring with Macrobenchmark: concrete numbers
I recently profiled a mid-complexity Jetpack Compose app with 12 Hilt modules and ~40 screens. Cold start results:
| Configuration | Median cold start (ms) | P95 (ms) | Improvement |
|---|---|---|---|
| No profiles | 847 | 1,240 | Baseline |
| Baseline Profile (stale) | 712 | 1,080 | 16% |
| Baseline Profile (current) | 538 | 780 | 36% |
| Current + dex layout opt | 491 | 710 | 42% |
The key Macrobenchmark setup:
```kotlin
import androidx.benchmark.macro.BaselineProfileMode
import androidx.benchmark.macro.CompilationMode
import androidx.benchmark.macro.StartupMode
import androidx.benchmark.macro.StartupTimingMetric
import androidx.benchmark.macro.junit4.MacrobenchmarkRule
import androidx.test.ext.junit.runners.AndroidJUnit4
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith

private const val TARGET_PACKAGE = "com.example.app" // your applicationId

@RunWith(AndroidJUnit4::class)
class StartupBenchmark {

    @get:Rule
    val benchmarkRule = MacrobenchmarkRule()

    @Test
    fun startupCompilationBaselineProfiles() {
        benchmarkRule.measureRepeated(
            packageName = TARGET_PACKAGE,
            metrics = listOf(StartupTimingMetric()),
            iterations = 10,
            // Partial = AOT-compile only what the profile covers,
            // matching what users actually get in production.
            compilationMode = CompilationMode.Partial(
                baselineProfileMode = BaselineProfileMode.Require
            ),
            startupMode = StartupMode.COLD
        ) {
            pressHome()
            startActivityAndWait()
            // Navigate your CRITICAL user journeys here
            device.waitForIdle()
        }
    }
}
```
Notice BaselineProfileMode.Require. This fails the benchmark if profiles are missing, which is exactly what you want in CI.
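To attribute gains correctly, run the same benchmark under the other compilation modes as well. These are the real `androidx.benchmark.macro` mode classes; how you parameterize the runs is up to your test setup:

```kotlin
// Fully interpreted/JIT — your "no profiles" floor.
val none = CompilationMode.None()

// AOT only for profile-covered methods; fails if the profile is missing.
val partial = CompilationMode.Partial(
    baselineProfileMode = BaselineProfileMode.Require
)

// Everything AOT-compiled — an upper bound users never actually see,
// useful only as a reference point.
val full = CompilationMode.Full()
```

Comparing `None` against `Partial` is what produced the no-profile vs current-profile rows in the table above.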
The mistakes that make profiles useless
Stale profile rules are the most common offender. If your profile generator doesn’t exercise current navigation paths, new screens launch fully interpreted. Say your app adds a new content mode with distinct UI paths. Those screens need updated profile rules or they eat the full JIT penalty on cold start. The 16% vs 36% gap between the stale and current profile rows in the table above? Entirely explained by stale rules missing new code paths.
Then there’s incomplete journey coverage. Your profile generator must walk the paths real users take in the first 30 seconds: login, home feed, first interaction. If you only profile MainActivity.onCreate(), you cover maybe 30% of startup-critical code.
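A generator that covers those journeys looks like this sketch, built on `BaselineProfileRule`. The package name and resource IDs are hypothetical stand-ins for your app’s:

```kotlin
import androidx.benchmark.macro.junit4.BaselineProfileRule
import androidx.test.uiautomator.By
import androidx.test.uiautomator.Until
import org.junit.Rule
import org.junit.Test

class BaselineProfileGenerator {

    @get:Rule
    val rule = BaselineProfileRule()

    @Test
    fun generate() = rule.collect(packageName = "com.example.app") {
        pressHome()
        startActivityAndWait()
        // Walk the first-30-seconds journeys, not just the launch:
        // wait for the home feed, then open the first item.
        device.wait(Until.hasObject(By.res("com.example.app", "home_feed")), 5_000)
        device.findObject(By.res("com.example.app", "first_item"))?.click()
        device.waitForIdle()
    }
}
```

Every class and method ART touches inside `collect` ends up in the profile, so the block should mirror what real users do, not an artificial tour of every screen.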
The sneakiest problem is ProfileInstaller version mismatches. The androidx.profileinstaller library version must match your AGP version’s profile format. A mismatch silently skips installation. Check adb shell dumpsys package your.app | grep prof. If you see status=no after install, your profiles aren’t being applied.
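You can also verify installation programmatically with `ProfileVerifier` from the profileinstaller library (1.3.0+). A sketch, assuming you call it from a debug screen or startup telemetry; the tag and log messages are placeholders:

```kotlin
import android.util.Log
import androidx.profileinstaller.ProfileVerifier

private const val TAG = "ProfileCheck"

// Call from a background thread after startup — get() blocks.
fun logProfileStatus() {
    val status = ProfileVerifier.getCompilationStatusAsync().get()
    when {
        status.isCompiledWithProfile ->
            Log.d(TAG, "Baseline Profile compiled (AOT)")
        status.hasProfileEnqueuedForCompilation() ->
            Log.d(TAG, "Profile installed, awaiting bg-dexopt")
        else ->
            Log.w(TAG, "No profile applied, code=${status.profileInstallResultCode}")
    }
}
```

Reporting this status through analytics tells you what fraction of your install base actually runs with compiled profiles, which dumpsys on a single device cannot.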
The CI workflow that catches regressions
Most teams measure once and assume the gains persist. Profiles decay as code changes. The workflow that actually holds up:
- Generate — run Macrobenchmark profile generators on every release branch merge
- Validate — CompilationMode.Partial(baselineProfileMode = Require) fails CI if profiles are absent
- Benchmark — compare cold start P50/P95 against the previous release’s baseline
- Alert — flag regressions exceeding 10% in any metric
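The alert step reduces to a pure function over the medians and percentiles Macrobenchmark emits in its JSON output. A hypothetical sketch of the 10% gate (the data class and threshold are illustrative, not part of any library):

```kotlin
// Hypothetical regression gate over Macrobenchmark startup metrics.
data class StartupMetrics(val p50Ms: Double, val p95Ms: Double)

fun regressions(
    previous: StartupMetrics,
    current: StartupMetrics,
    threshold: Double = 0.10,
): List<String> {
    val out = mutableListOf<String>()
    if (current.p50Ms > previous.p50Ms * (1 + threshold)) {
        out += "P50 regressed: ${previous.p50Ms}ms -> ${current.p50Ms}ms"
    }
    if (current.p95Ms > previous.p95Ms * (1 + threshold)) {
        out += "P95 regressed: ${previous.p95Ms}ms -> ${current.p95Ms}ms"
    }
    return out
}

fun main() {
    val prev = StartupMetrics(p50Ms = 538.0, p95Ms = 780.0)
    val curr = StartupMetrics(p50Ms = 610.0, p95Ms = 805.0)
    // P50 is up ~13% (flagged); P95 is up ~3% (within threshold).
    println(regressions(prev, curr))
}
```

Failing the build on a non-empty list keeps the comparison honest release over release, instead of relying on someone eyeballing a dashboard.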
This runs on a dedicated Firebase Test Lab device pool with locked thermal state. Emulators produce unreliable startup numbers due to variable CPU throttling.
Takeaways
Regenerate profiles every release cycle. Stale profiles covering only old code paths are the difference between the 16% and 36% results above. Automate generation in CI, not as a manual step.
Profile complete user journeys, not just Activity launches. Cover the first 30 seconds of real usage including navigation, data loading, and first render of key composables.
Validate profile installation on-device. Use adb shell dumpsys and BaselineProfileMode.Require in benchmarks to catch silent failures from version mismatches before they reach production.