MVP Factory
ai startup development

Systematic ANR Diagnosis in Jetpack Compose Apps: StrictMode Gaps, Perfetto Trace Correlation, and the Lock Contention Patterns That Hide Behind Main-Safe Coroutines

Krystian Wiewiór · 5 min read

TL;DR

StrictMode does not catch lock contention on the main thread. In Jetpack Compose apps, Dispatchers.Main.immediate combined with synchronized Room DAO callbacks creates a pattern where the main thread blocks on a lock held by a background thread: technically “main-safe” code that still triggers ANRs. Perfetto’s slice-track correlation is the only reliable way to identify the exact lock holder across threads. This post covers the diagnosis workflow and a CI gate that catches these call chains before they reach production.


The problem StrictMode cannot see

StrictMode’s thread policy catches disk reads, disk writes, and network calls on the main thread (untagged sockets fall under its VM policy). But here is what most teams get wrong: it does not instrument lock acquisition. If your main thread calls a suspend function that internally acquires a synchronized lock held by a Room write transaction on Dispatchers.IO, StrictMode reports nothing. The main thread is technically not doing I/O. It is waiting on a monitor.

In production Compose-heavy apps that have already eliminated obvious StrictMode violations, this pattern accounts for roughly 15-30% of ANR clusters. I’ve seen it over and over.

The invisible call chain

Here is the typical sequence:

  1. A LaunchedEffect calls a repository method from a main-thread coroutine
  2. The repository calls a Room DAO method annotated with @Transaction
  3. Room’s generated code acquires a synchronized lock on the RoomDatabase instance
  4. A background Dispatchers.IO coroutine is already holding that lock (bulk insert, migration, or WAL checkpoint)
  5. The main thread blocks on monitor entry. Zero StrictMode output.

// Looks safe. It is not.
@Composable
fun DashboardScreen(viewModel: DashboardViewModel) {
    LaunchedEffect(Unit) {
        // Runs on the main thread: LaunchedEffect inherits the composition's coroutine context
        viewModel.refreshStats() // suspend, calls Room @Transaction
    }
}

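The suspend keyword lulls teams into thinking this is non-blocking. But a suspend function yields its thread only at suspension points, and entering a synchronized block is not one. Room’s internal synchronized block does not suspend. It blocks the calling thread.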
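
The blocking behavior can be reproduced on a plain JVM without Room or coroutines. The sketch below is an illustration only: `dbLock` and `observeCallerStateWhileLockHeld` are hypothetical names, and the "caller" thread stands in for the Android main thread. It shows that a thread waiting to enter a `synchronized` block sits in the `BLOCKED` state until the holder releases the monitor.

```kotlin
import java.util.concurrent.CountDownLatch

// Hypothetical stand-in for the monitor the post attributes to Room.
private val dbLock = Any()

/** Returns the state of a "caller" thread while another thread holds dbLock. */
fun observeCallerStateWhileLockHeld(): Thread.State {
    val lockHeld = CountDownLatch(1)
    val release = CountDownLatch(1)

    // Plays the background writer: takes the monitor and keeps it until told to release.
    val holder = Thread {
        synchronized(dbLock) {
            lockHeld.countDown()
            release.await()
        }
    }
    holder.start()
    lockHeld.await()

    // Plays the main thread: tries to enter the same monitor.
    val caller = Thread { synchronized(dbLock) { } }
    caller.start()

    // Poll (for at most two seconds) until the caller is parked on the monitor.
    val deadline = System.nanoTime() + 2_000_000_000L
    while (caller.state != Thread.State.BLOCKED && System.nanoTime() < deadline) {
        Thread.sleep(1)
    }
    val observed = caller.state

    release.countDown()
    holder.join()
    caller.join()
    return observed
}
```

On Android, the same wait on the main thread would stall frame rendering and input dispatch for as long as the holder keeps the lock.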

Perfetto slice-track correlation: the diagnosis workflow

Perfetto captures thread state transitions that Systrace and StrictMode cannot. Here is the step-by-step workflow:

| Step | Tool | What you find |
|---|---|---|
| 1. Capture trace | adb shell perfetto with sched data plus the dalvik atrace category (where ART emits monitor contention slices) | Raw thread scheduling data |
| 2. Find ANR window | Perfetto UI, search for SIG_ANR or Input dispatching timed out | Exact timestamp of ANR trigger |
| 3. Inspect main thread | Slice track, look for monitor contention slices | Lock address + blocked duration |
| 4. Cross-reference holder | Filter by lock address across all thread tracks | Background thread holding the lock |
| 5. Read holder stack | Holder thread’s slice track at same timestamp | Exact call chain (e.g., Room beginTransaction) |

The Perfetto query that matters

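SELECT slice.ts, slice.dur, thread.name AS thread_name, slice.name AS contention
FROM slice
JOIN thread_track ON slice.track_id = thread_track.id
JOIN thread USING (utid)
JOIN process USING (upid)
WHERE slice.name LIKE '%monitor contention%'
  AND thread.tid = process.pid  -- main thread; its name is usually the process name, not 'main'
  AND slice.dur > 100000000  -- >100ms, ANR-risk threshold
ORDER BY slice.dur DESC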

This query surfaces every main-thread lock wait exceeding 100ms. In one production audit, I found 11 distinct lock-contention sites that had passed StrictMode checks for months. Eleven. All invisible to the existing tooling.

Building a CI gate for ANR-risk chains

Waiting for production ANRs is expensive and demoralizing. Here is a CI gate architecture that catches these patterns statically.

Static analysis with custom lint rules

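// Custom Lint detector sketch: flag Room withTransaction calls reachable from the Main dispatcher.
// ANR_RISK_ISSUE and isReachableFromMainDispatcher are project-defined and not shown here.
// This matches explicit withTransaction calls only; @Transaction DAO methods need an annotation check.
import com.android.tools.lint.detector.api.Detector
import com.android.tools.lint.detector.api.JavaContext
import com.android.tools.lint.detector.api.SourceCodeScanner
import com.intellij.psi.PsiMethod
import org.jetbrains.uast.UCallExpression

class MainThreadTransactionDetector : Detector(), SourceCodeScanner {
    // Lint invokes visitMethodCall only for calls with these method names
    override fun getApplicableMethodNames(): List<String> = listOf("withTransaction")

    override fun visitMethodCall(context: JavaContext, node: UCallExpression, method: PsiMethod) {
        if (isReachableFromMainDispatcher(context, node)) {
            context.report(
                ANR_RISK_ISSUE, node, context.getLocation(node),
                "Room transaction reachable from Dispatchers.Main"
            )
        }
    }
}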
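
The reachability check is the hard part of a detector like this. As a rough illustration of the idea only, the sketch below runs a breadth-first search over a call graph that has been reduced to a plain map. Everything here is an assumption for illustration: `CallGraph`, `findChain`, and the function names in the usage example are hypothetical, and a real lint rule would have to build the graph from UAST rather than receive it as a map.

```kotlin
// Hypothetical call graph: function name -> functions it calls directly.
typealias CallGraph = Map<String, List<String>>

/**
 * Breadth-first search from [entry]. Returns the shortest call chain that ends
 * in one of [sinks] (for example, transaction-acquiring DAO methods), or null
 * if no sink is reachable.
 */
fun findChain(graph: CallGraph, entry: String, sinks: Set<String>): List<String>? {
    // parent[f] is the caller through which f was first reached (null for the entry).
    val parent = mutableMapOf<String, String?>(entry to null)
    val queue = ArrayDeque(listOf(entry))
    while (queue.isNotEmpty()) {
        val current = queue.removeFirst()
        if (current in sinks) {
            // Walk parent links back to the entry to rebuild the chain.
            return generateSequence(current) { parent[it] }.toList().reversed()
        }
        for (callee in graph[current].orEmpty()) {
            if (callee !in parent) { // not visited yet
                parent[callee] = current
                queue.addLast(callee)
            }
        }
    }
    return null
}

// Usage with made-up function names mirroring the chain described in this post.
val exampleGraph: CallGraph = mapOf(
    "DashboardScreen.LaunchedEffect" to listOf("DashboardViewModel.refreshStats"),
    "DashboardViewModel.refreshStats" to listOf("StatsDao.getStatsInTransaction"),
    "SyncWorker.doWork" to listOf("StatsDao.bulkInsert"),
)
val exampleChain = findChain(
    exampleGraph,
    entry = "DashboardScreen.LaunchedEffect",
    sinks = setOf("StatsDao.getStatsInTransaction"),
)
```

Returning the full chain rather than a boolean lets the lint message show how the main-thread entry point reaches the transaction, which is usually what a reviewer needs to fix it.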

The CI pipeline

| Stage | Check | Threshold |
|---|---|---|
| Lint | Custom MainThreadTransactionDetector | 0 warnings |
| Instrumented test | Macro-benchmark with Perfetto trace capture | Main-thread lock wait < 50ms |
| Trace analysis | Automated Perfetto SQL query on CI traces | 0 slices > 100ms |

The macro-benchmark stage matters most. Run realistic user flows (app cold start, navigation between Compose screens, data sync) while capturing Perfetto traces. Parse the traces with the SQL query above and fail the build if any main-thread lock contention exceeds your threshold.
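
The pass/fail step of that gate can be small. The sketch below assumes the query results have been exported as CSV with four columns in the order ts, dur, thread name, slice name, and with ts and dur in nanoseconds. That export format, along with the names `ContentionSlice`, `parseRows`, and `violations`, is an assumption for illustration and not part of Perfetto.

```kotlin
// One row of query output. Hypothetical shape; durations are in nanoseconds.
data class ContentionSlice(val tsNs: Long, val durNs: Long, val thread: String, val name: String)

/** Parses CSV rows of the form "ts,dur,thread_name,slice_name", skipping a header line. */
fun parseRows(csv: String): List<ContentionSlice> =
    csv.lineSequence()
        .map { it.trim() }
        .filter { it.isNotEmpty() && !it.startsWith("ts,") } // skip blank lines and the header
        .map { line ->
            // limit = 4 keeps any commas inside the slice name intact
            val parts = line.split(",", limit = 4)
            require(parts.size == 4) { "Malformed row: $line" }
            ContentionSlice(parts[0].toLong(), parts[1].toLong(), parts[2], parts[3])
        }
        .toList()

/** Returns the slices longer than [thresholdMs], longest first. An empty list means the gate passes. */
fun violations(slices: List<ContentionSlice>, thresholdMs: Long): List<ContentionSlice> =
    slices.filter { it.durNs > thresholdMs * 1_000_000 }
        .sortedByDescending { it.durNs }
```

A CI step would call `violations(parseRows(output), thresholdMs = 100)`, print each offending slice, and exit non-zero when the list is not empty.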


The fix

Once you identify a lock-contention site, the fix is simple: never acquire Room’s database lock from a main-thread coroutine.

// Before: ANR risk
suspend fun refreshStats() {
    val stats = dao.getStatsInTransaction() // blocks main thread on lock
    _state.value = stats
}

// After: explicit dispatcher switch before lock acquisition
suspend fun refreshStats() {
    val stats = withContext(Dispatchers.IO) {
        dao.getStatsInTransaction() // lock acquired on IO thread
    }
    _state.value = stats
}

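withContext(Dispatchers.IO) moves the synchronized block onto a worker thread that can block without causing an ANR. The main-thread coroutine suspends while it waits for the result, so the UI thread keeps processing frames and input.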

What to do with all this

Stop trusting StrictMode alone for ANR prevention. It misses lock contention entirely. Add Perfetto trace analysis to your instrumented test suite and query for monitor contention slices on the main thread exceeding 50ms.

Wrap every Room @Transaction call in withContext(Dispatchers.IO). The suspend modifier on DAO methods does not prevent the underlying synchronized block from blocking the calling thread. Be explicit about which dispatcher acquires locks.

Build your CI gate in three layers: static lint rules to catch @Transaction calls reachable from Main dispatchers, macro-benchmark traces with automated Perfetto SQL analysis, and a zero-tolerance threshold for main-thread lock waits above 100ms. Catch the pattern before your users do.

