MVP Factory

Eliminating ANRs at Scale: Android Responsiveness Guide

Krystian Wiewiór · 5 min read

Meta description: Learn systematic architecture patterns to prevent Android ANRs using Kotlin coroutines, main-thread budgets, and CI-enforced StrictMode policies.

TL;DR

ANRs (Application Not Responding) are the silent killer of Android app ratings. After leading teams through eliminating ANRs across apps serving millions of users, I can tell you: the fix is never a single line of code — it is an architecture decision. This post covers three pillars that reduced our ANR rate from 0.47% to 0.02%: structured concurrency with Kotlin coroutines, main-thread budget accounting, and custom StrictMode policies enforced in CI. Stop chasing ANRs in production. Prevent them at build time.

Why ANRs Deserve an Architectural Response

Google Play's Android vitals flags an app for bad behavior when its user-perceived ANR rate reaches 0.47% of daily active users overall, or 8% on a single device model. ANR rate also tracks uninstall rate: in my experience building production systems, I have seen a 1% ANR rate correspond to roughly a 16% increase in daily uninstalls.

The main thread has a hard budget: 5 seconds to respond to an input event, and roughly 10 seconds for a foreground BroadcastReceiver to finish. That sounds generous until you realize a single Room database query on a cold cache can eat 200–800ms, and JSON deserialization of a moderately complex payload can take 50–150ms.

| Operation | Avg. main-thread cost | ANR risk |
| --- | --- | --- |
| SharedPreferences.commit() | 10–300ms | Medium |
| Room query (cold, no index) | 200–800ms | High |
| Gson deserialization (50 fields) | 50–150ms | Medium |
| ContentProvider query (contacts) | 100–1200ms | High |
| Bitmap decode (unsampled, 12MP) | 300–600ms | High |
| Network on main thread | 500ms–timeout | Critical |

Stack these in a single onCreate(), and you are well past the 5-second threshold.
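A quick sum of the table's ranges makes the point concrete. The sketch below is illustrative arithmetic only: the `MainThreadCost` type and the function names are made up for this example, and the numbers are the ranges from the table above. The network row is left out because its upper bound is a timeout rather than a fixed figure.

```kotlin
// Hypothetical helper type for this example; the millisecond ranges come from the table above.
data class MainThreadCost(val name: String, val minMs: Long, val maxMs: Long)

val localCosts = listOf(
    MainThreadCost("SharedPreferences.commit()", 10, 300),
    MainThreadCost("Room query (cold, no index)", 200, 800),
    MainThreadCost("Gson deserialization (50 fields)", 50, 150),
    MainThreadCost("ContentProvider query (contacts)", 100, 1200),
    MainThreadCost("Bitmap decode (unsampled, 12MP)", 300, 600),
)

const val INPUT_ANR_THRESHOLD_MS = 5_000L

// Best-case and worst-case totals when every operation runs back to back.
fun totalRange(costs: List<MainThreadCost>): Pair<Long, Long> =
    costs.sumOf { it.minMs } to costs.sumOf { it.maxMs }

// How long a blocking network call may run before the worst case crosses the threshold.
fun networkHeadroomMs(costs: List<MainThreadCost>): Long =
    INPUT_ANR_THRESHOLD_MS - totalRange(costs).second
```

With these figures the five local operations total 660ms to 3,050ms. In the worst case, a blocking network call only has to run for about 1,950ms before the 5-second input threshold is crossed.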

Pillar 1: Structured Concurrency as an Architectural Constraint

Here is what most teams get wrong about coroutines: they treat them as a drop-in replacement for AsyncTask. The real power is structured concurrency — tying coroutine lifetimes to component lifetimes so you never leak work and never block the main thread waiting for orphaned jobs.

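import androidx.lifecycle.ViewModel
import androidx.lifecycle.viewModelScope
import kotlinx.coroutines.async
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.launch

class TransactionViewModel(
    private val ledgerRepo: LedgerRepository,
    private val analyticsRepo: AnalyticsRepository
) : ViewModel() {

    // Backing state, assumed here to be a StateFlow that is null until the first load
    private val _uiState = MutableStateFlow<DashboardState?>(null)
    val uiState: StateFlow<DashboardState?> = _uiState

    fun loadDashboard() {
        viewModelScope.launch {
            // Parallel decomposition: both calls start immediately.
            // viewModelScope runs on the main dispatcher, so the repository
            // functions must be main-safe suspend functions that move their
            // own blocking work to Dispatchers.IO.
            val transactions = async { ledgerRepo.getRecent(limit = 50) }
            val summary = async { analyticsRepo.getMonthlySummary() }

            // Suspend (not block) until both complete
            _uiState.value = DashboardState(
                transactions = transactions.await(),
                summary = summary.await()
            )
        }
    }
}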
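The ViewModel only keeps work off the main thread if the repository functions are main-safe, because `viewModelScope` starts coroutines on the main dispatcher. A minimal sketch of what that could look like, assuming `kotlinx.coroutines` is on the classpath. `InMemoryLedgerRepository`, the `Transaction` type, and the injected `ioDispatcher` parameter are placeholders for illustration, not the article's actual implementation.

```kotlin
import kotlinx.coroutines.CoroutineDispatcher
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext

// Hypothetical model and repository, for illustration only.
data class Transaction(val id: Int, val amountCents: Long)

class InMemoryLedgerRepository(
    // Injecting the dispatcher lets tests substitute a test dispatcher.
    private val ioDispatcher: CoroutineDispatcher = Dispatchers.IO,
) {
    // Main-safe: the caller may be on any thread; the blocking work runs on ioDispatcher.
    suspend fun getRecent(limit: Int): List<Transaction> = withContext(ioDispatcher) {
        blockingQuery(limit)
    }

    // Stands in for a blocking disk or database read.
    private fun blockingQuery(limit: Int): List<Transaction> =
        (1..limit).map { Transaction(id = it, amountCents = it * 100L) }
}
```

Because the dispatcher switch lives inside the repository, callers such as a ViewModel never have to remember to wrap the call themselves.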

The key constraint: every function that touches disk, network, or heavy computation must be a suspend function that switches to the appropriate dispatcher (Dispatchers.IO for blocking I/O, Dispatchers.Default for CPU-bound work). Enforce this with a lint rule: if a data-layer function does blocking work and is not a suspend function, the build fails.
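A full custom lint rule is beyond a short example, but the same constraint can be approximated with a plain unit test. The sketch below is a stand-in, not the lint rule itself: it relies on the Kotlin compiler giving every `suspend` function a trailing `kotlin.coroutines.Continuation` parameter on the JVM. The `SampleLedgerRepository` interface and `nonSuspendMethods` function are hypothetical.

```kotlin
import kotlin.coroutines.Continuation

// Hypothetical repository interface used only to exercise the check.
interface SampleLedgerRepository {
    suspend fun getRecent(limit: Int): List<String>
    suspend fun getMonthlyTotal(): Long
    fun cachedCount(): Int // deliberately not suspend, so the check has something to report
}

// Returns the names of declared methods that are NOT suspend functions.
// On the JVM a suspend function compiles to a method whose last parameter is a Continuation.
fun nonSuspendMethods(type: Class<*>): List<String> =
    type.declaredMethods
        .filterNot { it.isSynthetic }
        .filterNot { it.parameterTypes.lastOrNull() == Continuation::class.java }
        .map { it.name }
        .sorted()
```

A test that asserts `nonSuspendMethods(SampleLedgerRepository::class.java)` is empty would fail here because of `cachedCount`, which is the behavior you want from the gate.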

Pillar 2: Main-Thread Budget Accounting

Think of the main thread like a CPU with a fixed time-slice budget. I recommend establishing a frame budget tracker that logs every chunk of main-thread work during development:

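import android.os.Looper
import android.os.SystemClock
import android.util.Log

object MainThreadBudget {
    private const val FRAME_BUDGET_MS = 16 // 60fps target
    private const val ANR_THRESHOLD_MS = 4_000 // warn well before 5s

    // Runs block and logs its duration when it exceeds the budget.
    // Work on background threads is not counted against the main-thread budget.
    fun <T> track(tag: String, block: () -> T): T {
        val onMainThread = Looper.myLooper() == Looper.getMainLooper()
        val start = SystemClock.elapsedRealtime()
        val result = block()
        val elapsed = SystemClock.elapsedRealtime() - start
        when {
            !onMainThread -> Unit
            elapsed > ANR_THRESHOLD_MS -> Log.e("Budget", "$tag: ${elapsed}ms (ANR imminent)")
            elapsed > FRAME_BUDGET_MS -> Log.w("Budget", "$tag: ${elapsed}ms (jank)")
        }
        return result
    }
}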

In our pipeline, any Budget warning in an instrumented test run causes the build to go yellow. Any error-level log fails the build outright.
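One way to wire this into CI is to scan the captured log output for the `Budget` tag and map severities to a build status. The sketch below assumes the log was captured in logcat's brief format (`W/Budget( 1234): ...`). The `BuildStatus` enum and `classifyBudgetLog` function are hypothetical names, not part of any CI tool.

```kotlin
enum class BuildStatus { GREEN, YELLOW, RED }

// Matches lines such as "W/Budget( 1234): bindHeader: 23ms" (logcat brief format, assumed).
val budgetLine = Regex("""^([WE])/Budget\s*\(""")

fun classifyBudgetLog(lines: Sequence<String>): BuildStatus {
    var status = BuildStatus.GREEN
    for (line in lines) {
        when (budgetLine.find(line)?.groupValues?.get(1)) {
            "E" -> return BuildStatus.RED      // any error-level entry fails the build
            "W" -> status = BuildStatus.YELLOW // warnings only downgrade the build
        }
    }
    return status
}
```

A CI script could feed this from `File(logPath).useLines { classifyBudgetLog(it) }` and map `RED` to a non-zero exit code.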

Pillar 3: StrictMode as a CI Gate

Default StrictMode is useful during development but toothless in CI. Let me walk you through a custom policy that integrates with your test suite:

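import android.os.StrictMode
import java.io.File
import java.util.concurrent.Executors

// Applied in your test Application class, not production.
// penaltyListener requires API 28+. violationReportPath is a file path
// that your CI job can pull from the device after the test run.
StrictMode.setThreadPolicy(
    StrictMode.ThreadPolicy.Builder()
        .detectDiskReads()
        .detectDiskWrites()
        .detectNetwork()
        .detectCustomSlowCalls() // fires for code annotated with StrictMode.noteSlowCall()
        .penaltyListener(Executors.newSingleThreadExecutor()) { violation ->
            // Write violation to a file picked up by CI
            File(violationReportPath).appendText(
                "${violation.javaClass.simpleName}: ${violation.message}\n"
            )
        }
        .build()
)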

After every instrumented test run, a CI step parses the violation report. Zero-tolerance on network and disk writes on the main thread. Warning threshold on disk reads. This catches regressions before they ever reach a user’s device.

| Enforcement level | Policy | CI action |
| --- | --- | --- |
| Block | Network on main thread | Fail build |
| Block | Disk write on main thread | Fail build |
| Warn | Disk read on main thread | Flag for review |
| Warn | Custom slow call > 16ms | Flag for review |
| Info | Custom slow call > 8ms | Log only |
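The CI step that applies these rules can be small. The sketch below assumes the report file contains one `SimpleClassName: message` line per violation, as written by the penalty listener, and that the class names are the `android.os.strictmode` violation types (`NetworkViolation`, `DiskWriteViolation`, `DiskReadViolation`, `CustomViolation`). `GateResult` and `evaluateViolations` are hypothetical names, and the millisecond thresholds for custom slow calls are not modelled here.

```kotlin
// Hypothetical result type for the CI gate.
data class GateResult(
    val failBuild: Boolean,
    val blocking: List<String>,
    val forReview: List<String>,
)

// Violation class names that fail the build versus those that are only flagged.
val blockingTypes = setOf("NetworkViolation", "DiskWriteViolation")
val reviewTypes = setOf("DiskReadViolation", "CustomViolation")

fun evaluateViolations(reportLines: List<String>): GateResult {
    val blocking = mutableListOf<String>()
    val forReview = mutableListOf<String>()
    for (line in reportLines) {
        // Each line is expected to look like "NetworkViolation: <message>".
        when (line.substringBefore(':').trim()) {
            in blockingTypes -> blocking += line
            in reviewTypes -> forReview += line
        }
    }
    return GateResult(failBuild = blocking.isNotEmpty(), blocking = blocking, forReview = forReview)
}
```

In a CI script this could be fed with `File(path).readLines()` and turned into a non-zero exit code whenever `failBuild` is true.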

The Results

After implementing all three pillars across two production apps (combined 4M MAU), we tracked the following over 90 days:

  • ANR rate: 0.47% → 0.02%
  • Play Store rating: 3.8 → 4.4 (partially attributed)
  • CI-caught violations: 340+ blocking calls caught before merge
  • Mean main-thread frame time: 11ms (down from 22ms)

Actionable Takeaways

  1. Make suspend the default for all data-layer functions. Add a custom lint rule that flags any repository or data source method that is not a suspend function. This turns main-thread violations from runtime bugs into build-time failures, provided each suspend function also switches to the right dispatcher for its blocking work.

  2. Implement main-thread budget tracking in debug builds and instrumented tests. Set thresholds at 16ms (jank) and 4,000ms (pre-ANR). Wire the output into your CI reporting dashboard so regressions are visible immediately.

  3. Promote StrictMode from a development convenience to a CI gate. Run your full instrumented test suite with a strict thread policy, capture violations to a report file, and fail the build on any network or disk-write violation on the main thread. You will catch 90% of potential ANRs before they reach a single user.


Tags: android, kotlin, architecture, mobile, cicd

