MVP Factory

Eliminating Android ANRs in Production: Strict Mode Traps, Binder Transaction Limits, and the Background Thread Architecture That Dropped Our ANR Rate From 2.1% to 0.08%

Krystian Wiewiór · 5 min read

TL;DR

Most production ANRs come from three sources: SharedPreferences.apply() blocking during onPause(), oversized Intent extras exhausting the binder transaction buffer, and synchronous work inside BroadcastReceivers. By instrumenting a custom ANR watchdog, migrating to DataStore, chunking IPC payloads through ContentProviders, and restructuring receivers with goAsync() plus coroutines, we dropped our ANR rate from 2.1% to 0.08%.


The problem most teams miss

What most teams get wrong about ANRs: they treat them as random flukes. They’re not. ANRs are deterministic: the main thread failed to respond within a system timeout (5 seconds for input events), and every occurrence has a traceable root cause. Google Play treats a user-perceived ANR rate above 0.47% as bad behavior, and apps over that threshold can lose store ranking and visibility.

The three culprits below account for roughly 80% of all ANR occurrences in mature Android codebases.


Root cause #1: SharedPreferences.apply() during onPause()

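SharedPreferences.apply() is marketed as asynchronous. It is, until a lifecycle transition fires. ActivityThread calls QueuedWork.waitToFinish() while handling transitions such as an activity pausing or stopping and a service starting or stopping, and that call blocks the main thread until every pending apply() finishes its disk write. One slow flush at the wrong moment is enough to trip the ANR timer. This one bit us harder than anything else.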

Operation | Main thread impact | Risk during onPause()
--- | --- | ---
SP.commit() | Blocks immediately | High (obvious)
SP.apply() | Deferred write, but blocks at lifecycle | High (hidden)
DataStore.edit{} | Fully async via coroutines | None
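
To make the "hidden" row concrete, here is a toy model of the mechanism in plain Kotlin. It is not Android's QueuedWork; the class and method names (ToyQueuedWork, applyAsync) are made up for illustration. It only shows the shape of the problem: scheduling a write returns at once, but a later barrier call blocks the caller until every scheduled write has finished.

```kotlin
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.CountDownLatch
import java.util.concurrent.Executors

// Toy model only: a stand-in for the apply() / waitToFinish() interaction.
class ToyQueuedWork {
    private val diskThread = Executors.newSingleThreadExecutor()
    private val pending = ConcurrentLinkedQueue<CountDownLatch>()

    // Like apply(): schedules the write on a background thread and returns immediately.
    fun applyAsync(write: () -> Unit) {
        val done = CountDownLatch(1)
        pending.add(done)
        diskThread.execute {
            try { write() } finally { done.countDown() }
        }
    }

    // Like the lifecycle barrier: blocks the caller until every scheduled write is done.
    fun waitToFinish() {
        while (true) {
            val latch = pending.poll() ?: return
            latch.await()
        }
    }

    fun shutdown() = diskThread.shutdown()
}
```

If the caller of waitToFinish() is the main thread and the queued writes are slow, the stall lands exactly where the user is waiting for the next screen.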

Migrating to DataStore with a wrapper

Swapping directly to DataStore across a large codebase is risky. We used a wrapper interface that let us migrate file-by-file without changing call sites:

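import androidx.datastore.core.DataStore
import androidx.datastore.preferences.core.Preferences
import androidx.datastore.preferences.core.edit
import androidx.datastore.preferences.core.stringPreferencesKey
import kotlinx.coroutines.flow.first
import kotlinx.coroutines.flow.map

// Storage-agnostic contract: call sites depend on this, not on the backing store.
interface KVStore {
    suspend fun getString(key: String, default: String = ""): String
    suspend fun putString(key: String, value: String)
}

class DataStoreKVStore(
    private val dataStore: DataStore<Preferences>
) : KVStore {
    // Reads the current snapshot once; suspends instead of blocking the caller.
    override suspend fun getString(key: String, default: String): String =
        dataStore.data.map { it[stringPreferencesKey(key)] ?: default }.first()

    // edit {} applies the change transactionally off the main thread.
    override suspend fun putString(key: String, value: String) {
        dataStore.edit { it[stringPreferencesKey(key)] = value }
    }
}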

This interface let us swap implementations behind a feature flag. We migrated 34 SharedPreferences files over three sprints with no regressions.
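
The flag-based swap could be wired roughly as follows. This is a sketch under assumptions, not our production code: FlaggedKVStore, InMemoryKVStore, and runNow are hypothetical names, the in-memory store stands in for both backends, and the flag is a plain lambda rather than a real feature-flag client. The KVStore interface is repeated so the block compiles on its own.

```kotlin
import kotlin.coroutines.Continuation
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.startCoroutine

// Same shape as the KVStore interface above.
interface KVStore {
    suspend fun getString(key: String, default: String = ""): String
    suspend fun putString(key: String, value: String)
}

// Hypothetical in-memory stand-in for either backend.
class InMemoryKVStore : KVStore {
    private val values = mutableMapOf<String, String>()
    override suspend fun getString(key: String, default: String): String =
        values[key] ?: default
    override suspend fun putString(key: String, value: String) {
        values[key] = value
    }
}

// Routes every call to the legacy or the new store, re-reading the flag each time.
class FlaggedKVStore(
    private val legacy: KVStore,
    private val modern: KVStore,
    private val useModern: () -> Boolean
) : KVStore {
    private val active: KVStore get() = if (useModern()) modern else legacy
    override suspend fun getString(key: String, default: String): String =
        active.getString(key, default)
    override suspend fun putString(key: String, value: String) =
        active.putString(key, value)
}

// Tiny runner for suspend code that completes without suspending,
// which holds for the in-memory fakes used here.
fun <T> runNow(block: suspend () -> T): T {
    var outcome: Result<T>? = null
    block.startCoroutine(Continuation(EmptyCoroutineContext) { outcome = it })
    return checkNotNull(outcome) { "block suspended" }.getOrThrow()
}
```

Note that flipping the flag only changes where reads and writes go. It does not copy existing values, so a real migration also needs a one-time data move from the old file to the new store.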


Root cause #2: Binder transaction buffer exhaustion

The binder transaction buffer is capped at 1MB per process, shared across all concurrent IPC calls. Passing large bitmaps, serialized lists, or logging payloads through Intent extras silently eats into this limit. When it overflows, you get a TransactionTooLargeException or, worse, a silent ANR.

Route large payloads through a ContentProvider

For payloads exceeding 100KB, use a ContentProvider with a ParcelFileDescriptor pipe:

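import android.content.Context
import android.net.Uri
import java.io.IOException
import java.util.UUID

fun writePayloadToProvider(context: Context, data: ByteArray): Uri {
    // PayloadContentProvider is our own provider; createUri() mints a unique content URI.
    val uri = PayloadContentProvider.createUri(UUID.randomUUID().toString())
    // Fail loudly if the provider gives us no stream, rather than passing on an empty URI.
    val stream = context.contentResolver.openOutputStream(uri)
        ?: throw IOException("No output stream for $uri")
    stream.use { data.inputStream().copyTo(it, bufferSize = 8192) }
    return uri // Pass this URI in the Intent instead of the payload
}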

Payload strategy | Max safe size | ANR risk
--- | --- | ---
Intent extras (Bundle) | ~500KB practical | High above 200KB
FileProvider URI | Disk-limited | Low
ContentProvider pipe | Memory-limited | Very low
Shared ViewModel (same process) | Heap-limited | None

Root cause #3: BroadcastReceiver timeouts

BroadcastReceivers run onReceive() on the main thread with a strict 10-second timeout for foreground broadcasts (60 seconds for background). Any synchronous database query, network check, or heavy computation will trigger an ANR.

Use goAsync() with coroutines

class SyncReceiver : BroadcastReceiver() {
    override fun onReceive(context: Context, intent: Intent) {
        val pending = goAsync()
        CoroutineScope(Dispatchers.IO).launch {
            try {
                repository.performSync(intent.action)
            } finally {
                pending.finish()
            }
        }
    }
}

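goAsync() returns a PendingResult that keeps the broadcast active after onReceive() returns, so the main thread is released immediately. It does not extend the timeout: the work still has to call finish() inside the normal broadcast window, so anything that might run longer belongs in scheduled background work (for example WorkManager). Pairing goAsync() with Dispatchers.IO keeps the work off the UI thread entirely.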
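
Whatever window the system allows, the receiver is safer when the work is bounded and finish() is guaranteed to run exactly once. A minimal sketch of that idea, assuming kotlinx.coroutines is on the classpath: Finishable is a hypothetical stand-in for PendingResult so the block runs off-device, and launchWithinBudget and budgetMs are illustrative names, not part of any Android API.

```kotlin
import java.util.concurrent.atomic.AtomicInteger
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.Job
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withTimeoutOrNull

// Stand-in for BroadcastReceiver.PendingResult; only finish() matters here.
fun interface Finishable {
    fun finish()
}

// Runs work on Dispatchers.IO, cancels it once budgetMs elapses,
// and calls finish() whether the work completed, timed out, or threw.
fun CoroutineScope.launchWithinBudget(
    budgetMs: Long,
    pending: Finishable,
    work: suspend () -> Unit
): Job = launch(Dispatchers.IO) {
    try {
        // Returns null instead of throwing when the budget runs out.
        withTimeoutOrNull(budgetMs) { work() }
    } finally {
        pending.finish()
    }
}

// Demo: a simulated slow sync is cut off after 50 ms; returns the number of finish() calls.
fun runDemo(): Int {
    val finishCalls = AtomicInteger(0)
    runBlocking {
        launchWithinBudget(budgetMs = 50, pending = Finishable { finishCalls.incrementAndGet() }) {
            delay(10_000) // cancelled when the budget expires
        }.join()
    }
    return finishCalls.get()
}
```

Coroutine timeouts are cooperative: they cancel suspending calls such as delay(), but a blocking call inside the work keeps running until it returns on its own.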


Instrumenting ANR detection: the watchdog approach

Don’t wait for Play Console to tell you about ANRs. Catch them yourself with a main-thread watchdog:

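class ANRWatchdog(private val timeoutMs: Long = 5000L) : Thread("ANR-Watchdog") {
    private val ticker = AtomicLong(0)
    private val mainHandler = Handler(Looper.getMainLooper())

    override fun run() {
        while (!isInterrupted) {
            val start = ticker.get()
            // A healthy main thread runs this message and bumps the ticker.
            mainHandler.post { ticker.incrementAndGet() }
            try {
                sleep(timeoutMs)
            } catch (e: InterruptedException) {
                return // interrupt() is the stop signal; exit instead of crashing the app
            }
            // Ticker unchanged: the main thread never reached our message.
            if (ticker.get() == start) {
                reportANR(Looper.getMainLooper().thread.stackTrace) // app-specific reporter
            }
        }
    }
}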

This watchdog posts to the main looper and checks whether the message was processed within the timeout. If not, it captures the main thread’s stack trace, giving you the same data that production ANR reports provide but in your debug and staging environments. I wish we’d added this six months earlier.
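
The watchdog calls reportANR() without defining it, because the reporter depends on your logging or crash-reporting setup. One possible shape, as a sketch: formatAnrReport is a hypothetical helper, and printing to stderr is a placeholder for whatever sink you actually use.

```kotlin
// Hypothetical helper: turns the captured main-thread stack into a readable report.
fun formatAnrReport(stack: Array<StackTraceElement>, blockedForMs: Long): String =
    buildString {
        appendLine("Main thread unresponsive for at least ${blockedForMs}ms")
        for (frame in stack) {
            appendLine("    at $frame")
        }
    }

// Placeholder reporter; swap the println for your logger or crash reporter.
fun reportANR(stack: Array<StackTraceElement>, blockedForMs: Long = 5000L) {
    System.err.print(formatAnrReport(stack, blockedForMs))
}
```

Passing the timeout into the report makes it obvious in the logs which watchdog threshold fired when you run different values in debug and staging.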


The audit process

  1. Enable StrictMode in debug builds to flag disk reads/writes and network calls on the main thread
  2. Deploy the ANR watchdog to internal builds with stack trace reporting
  3. Audit all SharedPreferences usage. Grep for .apply() and .commit() calls
  4. Profile Intent extras size by logging Bundle byte size at every startActivity and sendBroadcast call
  5. Review all BroadcastReceiver subclasses. If onReceive() does more than dispatch work, it’s a risk
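
Step 1 can be as small as the following thread policy. This is a sketch of one reasonable configuration, not the only one: enableStrictMode is an illustrative name, and it assumes the caller invokes it early (for example from Application.onCreate()) and only in debug builds.

```kotlin
import android.os.StrictMode

// Assumption: called once at startup, guarded by a debug-build check at the call site.
fun enableStrictMode() {
    StrictMode.setThreadPolicy(
        StrictMode.ThreadPolicy.Builder()
            .detectDiskReads()   // flags disk reads on the main thread
            .detectDiskWrites()  // flags disk writes on the main thread
            .detectNetwork()     // flags network calls on the main thread
            .penaltyLog()        // log violations rather than crash
            .build()
    )
}
```

penaltyLog() keeps the build usable while you work through existing violations; stricter penalties make sense once the log is quiet.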

What to do right now

Replace SharedPreferences with DataStore. Use a wrapper interface to migrate incrementally. Every apply() call is a latent ANR during lifecycle transitions, and it will bite you eventually.

Enforce a 100KB ceiling on Intent extras. Route anything larger through a ContentProvider or shared ViewModel. A debug-build runtime check that logs Bundle sizes above the threshold takes an hour to write and saves weeks of debugging.
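
The threshold logic itself is tiny. A sketch in plain Kotlin, with hypothetical names (checkExtrasSize, EXTRAS_LIMIT_BYTES) and the size passed in as a number so the block runs anywhere:

```kotlin
// The 100KB ceiling from the rule above.
const val EXTRAS_LIMIT_BYTES = 100 * 1024

// Returns true when the payload is within the ceiling; otherwise warns and returns false.
fun checkExtrasSize(
    label: String,
    sizeBytes: Int,
    limitBytes: Int = EXTRAS_LIMIT_BYTES,
    warn: (String) -> Unit = { System.err.println(it) }
): Boolean {
    if (sizeBytes <= limitBytes) return true
    warn("$label: extras are $sizeBytes bytes, over the $limitBytes-byte ceiling")
    return false
}
```

On a device, one way to get sizeBytes is to write the Bundle into a Parcel from Parcel.obtain(), read dataSize(), and then recycle() the Parcel. Treat the number as an estimate of the transaction cost, not an exact figure.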

Instrument before you ship. Deploy a main-thread watchdog and StrictMode in every pre-production build. Catching ANRs in staging is dramatically cheaper than diagnosing them from Play Vitals.

ANRs aren’t mysterious. They’re engineering failures with engineering solutions. Audit systematically, instrument early, and keep the main thread clear.

