Eliminating Android ANRs in Production: Strict Mode Traps, Binder Transaction Limits, and the Background Thread Architecture That Dropped Our ANR Rate From 2.1% to 0.08%
TL;DR
Most production ANRs come from three sources: SharedPreferences.apply() blocking during onPause(), oversized Intent extras exhausting the binder transaction buffer, and synchronous work inside BroadcastReceivers. By instrumenting a custom ANR watchdog, migrating to DataStore, chunking IPC payloads through ContentProviders, and restructuring receivers with goAsync() plus coroutines, we dropped our ANR rate from 2.1% to 0.08%.
The problem most teams miss
Most teams treat ANRs as random flukes. They’re not. ANRs are deterministic: the main thread failed to respond within a system-defined window (5 seconds for input events), and every occurrence has a traceable root cause. Google’s Play Console flags apps with ANR rates above 0.47%, and rates above that threshold directly hurt your store ranking and visibility.
The three culprits below account for roughly 80% of all ANR occurrences in mature Android codebases.
Root cause #1: SharedPreferences.apply() during onPause()
SharedPreferences.apply() is marketed as asynchronous. It is, until a lifecycle transition arrives. ActivityThread calls QueuedWork.waitToFinish() when an activity pauses or stops (the exact callback depends on the platform version) and when a service handles start or stop commands, blocking the main thread until every pending apply() finishes its disk write. This one bit us harder than anything else.
| Operation | Main thread impact | Risk during onPause() |
|---|---|---|
| SP.commit() | Blocks immediately | High (obvious) |
| SP.apply() | Deferred write, but blocks at lifecycle | High (hidden) |
| DataStore.edit{} | Fully async via coroutines | None |
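The hidden cost of apply() is easier to see in a toy model. The sketch below is plain JVM Kotlin and is not the framework’s code: `ToyQueuedWork` is a hypothetical stand-in that only mimics the shape of the behavior, where apply() returns at once and a later wait call pays for every write still in flight.

```kotlin
import java.util.concurrent.CountDownLatch
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Toy stand-in for the framework's pending-write bookkeeping (NOT the real QueuedWork).
class ToyQueuedWork {
    private val diskThread = Executors.newSingleThreadExecutor()
    private val pending = mutableListOf<CountDownLatch>()

    // Like apply(): returns immediately, the slow write happens on another thread.
    fun apply(writeMillis: Long) {
        val done = CountDownLatch(1)
        synchronized(pending) { pending.add(done) }
        diskThread.execute {
            Thread.sleep(writeMillis) // simulated slow disk write
            done.countDown()
        }
    }

    // Like the lifecycle hook: the caller blocks until every pending write has landed.
    fun waitToFinish() {
        val snapshot = synchronized(pending) { pending.toList().also { pending.clear() } }
        snapshot.forEach { it.await() }
    }

    fun shutdown() {
        diskThread.shutdown()
    }
}

fun main() {
    val work = ToyQueuedWork()

    val t0 = System.nanoTime()
    repeat(3) { work.apply(writeMillis = 100) }
    val applyMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - t0)

    val t1 = System.nanoTime()
    work.waitToFinish() // stands in for the main thread during a lifecycle transition
    val waitMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - t1)

    println("apply() calls returned after $applyMs ms, waitToFinish() blocked for $waitMs ms")
    work.shutdown()
}
```

The three apply() calls return almost instantly, while waitToFinish() absorbs the full cost of the queued writes. On a device with slow or contended storage, that wait is what turns into an ANR.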
Migrating to DataStore with a wrapper
Swapping directly to DataStore across a large codebase is risky. We used a wrapper interface that let us migrate file-by-file without changing call sites:
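import androidx.datastore.core.DataStore
import androidx.datastore.preferences.core.Preferences
import androidx.datastore.preferences.core.edit
import androidx.datastore.preferences.core.stringPreferencesKey
import kotlinx.coroutines.flow.first
import kotlinx.coroutines.flow.map

// Storage-agnostic contract: call sites depend on this, not on the backing store.
interface KVStore {
    suspend fun getString(key: String, default: String = ""): String
    suspend fun putString(key: String, value: String)
}

class DataStoreKVStore(
    private val dataStore: DataStore<Preferences>
) : KVStore {
    // Suspends until the first snapshot is available instead of blocking the caller's thread.
    override suspend fun getString(key: String, default: String): String =
        dataStore.data.map { it[stringPreferencesKey(key)] ?: default }.first()

    override suspend fun putString(key: String, value: String) {
        dataStore.edit { it[stringPreferencesKey(key)] = value }
    }
}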
This interface let us swap implementations behind a feature flag. We migrated 34 SharedPreferences files over three sprints with no regressions.
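The legacy side of that swap can be sketched as follows. This is an illustration under assumptions, not our exact code: `SharedPrefsKVStore` and `kvStoreFor` are hypothetical names, and `useDataStore` stands in for whatever feature-flag system you already have. The interface is restated so the snippet is self-contained.

```kotlin
import android.content.SharedPreferences
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext

// Restated here so the sketch compiles on its own.
interface KVStore {
    suspend fun getString(key: String, default: String = ""): String
    suspend fun putString(key: String, value: String)
}

// Legacy path: same interface, still backed by SharedPreferences.
class SharedPrefsKVStore(private val prefs: SharedPreferences) : KVStore {
    override suspend fun getString(key: String, default: String): String =
        withContext(Dispatchers.IO) { prefs.getString(key, default) ?: default }

    override suspend fun putString(key: String, value: String) {
        // commit() on a background dispatcher: the write completes here, so nothing
        // stays queued for a later lifecycle transition to wait on.
        withContext(Dispatchers.IO) { prefs.edit().putString(key, value).commit() }
    }
}

// Hypothetical factory: picks the implementation per preferences file.
fun kvStoreFor(
    useDataStore: Boolean,
    legacyPrefs: SharedPreferences,
    newStore: () -> KVStore
): KVStore = if (useDataStore) newStore() else SharedPrefsKVStore(legacyPrefs)
```

Note that this only switches the read and write path. Existing values still have to be copied across, for example with DataStore’s SharedPreferencesMigration.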
Root cause #2: Binder transaction buffer exhaustion
The binder transaction buffer is capped at 1MB per process, shared across all concurrent IPC calls. Passing large bitmaps, serialized lists, or logging payloads through Intent extras silently eats into this limit. When it overflows, you get a TransactionTooLargeException or, worse, a silent ANR.
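One way to see how much of that buffer an Intent consumes is to measure its extras before dispatch. The sketch below assumes a 100KB warning threshold. `bundleSizeBytes` and `warnIfExtrasTooLarge` are hypothetical helper names, and the check is meant for debug builds only.

```kotlin
import android.os.Bundle
import android.os.Parcel
import android.util.Log

private const val MAX_EXTRAS_BYTES = 100 * 1024 // assumed warning threshold

// Serializes the Bundle into a scratch Parcel to measure its size in bytes.
fun bundleSizeBytes(bundle: Bundle): Int {
    val parcel = Parcel.obtain()
    try {
        parcel.writeBundle(bundle)
        return parcel.dataSize()
    } finally {
        parcel.recycle() // always return the Parcel to the pool
    }
}

// Hypothetical debug-only hook: call it wherever Intents are built or dispatched.
fun warnIfExtrasTooLarge(tag: String, extras: Bundle?) {
    val size = extras?.let(::bundleSizeBytes) ?: return
    if (size > MAX_EXTRAS_BYTES) {
        Log.w("IntentSize", "$tag: extras are $size bytes (limit $MAX_EXTRAS_BYTES)")
    }
}
```

Calling `warnIfExtrasTooLarge("CheckoutActivity", intent.extras)` just before `startActivity` (the tag is an arbitrary label) surfaces oversized payloads in logcat long before they reach the binder limit.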
Route large payloads through a ContentProvider
For payloads exceeding 100KB, use a ContentProvider with a ParcelFileDescriptor pipe:
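import android.content.Context
import android.net.Uri
import java.io.IOException
import java.util.UUID

// Does blocking I/O: call from a background thread.
fun writePayloadToProvider(context: Context, data: ByteArray): Uri {
    val uri = PayloadContentProvider.createUri(UUID.randomUUID().toString())
    // Fail loudly instead of returning a URI that nothing was written to
    val stream = context.contentResolver.openOutputStream(uri)
        ?: throw IOException("Provider returned no stream for $uri")
    stream.use { it.write(data) }
    return uri // Pass this URI in the Intent instead
}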
| Payload strategy | Max safe size | ANR risk |
|---|---|---|
| Intent extras (Bundle) | ~500KB practical | High above 200KB |
| FileProvider URI | Disk-limited | Low |
| ContentProvider pipe | Memory-limited | Very low |
| Shared ViewModel (same process) | Heap-limited | None |
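The receiving component then reads the payload back through the same URI. A minimal sketch, assuming the sender attached the URI as the Intent’s data; `attachPayload` and `readPayloadFromIntent` are hypothetical helper names, and the read-grant flag only matters when the receiver cannot otherwise access the provider.

```kotlin
import android.content.Context
import android.content.Intent
import android.net.Uri
import java.io.IOException

// Sender side: carry the URI as Intent data and grant temporary read access.
fun attachPayload(intent: Intent, uri: Uri): Intent =
    intent.setData(uri).addFlags(Intent.FLAG_GRANT_READ_URI_PERMISSION)

// Receiver side: stream the bytes back out of the provider.
// Does blocking I/O, so call it from a background thread.
@Throws(IOException::class)
fun readPayloadFromIntent(context: Context, intent: Intent): ByteArray {
    val uri = intent.data ?: throw IOException("Intent carries no payload URI")
    val stream = context.contentResolver.openInputStream(uri)
        ?: throw IOException("Provider returned no stream for $uri")
    return stream.use { it.readBytes() }
}
```

Only the URI crosses the binder, so the transaction stays small no matter how large the payload is.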
Root cause #3: BroadcastReceiver timeouts
BroadcastReceivers run onReceive() on the main thread with a strict 10-second timeout for foreground broadcasts (60 seconds for background). Any synchronous database query, network check, or heavy computation will trigger an ANR.
Use goAsync() with coroutines
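import android.content.BroadcastReceiver
import android.content.Context
import android.content.Intent
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.launch

class SyncReceiver : BroadcastReceiver() {
    override fun onReceive(context: Context, intent: Intent) {
        // Keep the broadcast open after onReceive() returns
        val pending = goAsync()
        CoroutineScope(Dispatchers.IO).launch {
            try {
                // `repository` is assumed to be provided elsewhere (e.g. via injection)
                repository.performSync(intent.action)
            } finally {
                pending.finish() // always release the broadcast, even on failure
            }
        }
    }
}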
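goAsync() returns a PendingResult that keeps the broadcast alive after onReceive() returns, so the main thread is released immediately. It does not extend the timeout: finish() still has to be called within the broadcast window described above, or the system raises an ANR. Pairing it with Dispatchers.IO keeps the work off the UI thread entirely.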
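Because the broadcast stays open until finish() is called, a coroutine that hangs can still end in an ANR. One way to bound that risk is to wrap the work in a timeout. The helper below is a sketch: `launchAsync` is a hypothetical extension, not a platform API, and the 8-second default is an assumed budget chosen to stay under the 10-second foreground limit.

```kotlin
import android.content.BroadcastReceiver
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.SupervisorJob
import kotlinx.coroutines.launch
import kotlinx.coroutines.withTimeoutOrNull

// Hypothetical helper: runs `block` off the main thread and always calls finish(),
// cancelling the work once `timeoutMs` has elapsed.
fun BroadcastReceiver.launchAsync(
    timeoutMs: Long = 8_000L, // assumed budget, kept below the foreground broadcast limit
    block: suspend () -> Unit
) {
    val pending = goAsync()
    CoroutineScope(SupervisorJob() + Dispatchers.IO).launch {
        try {
            // Returns null (and cancels the block) if the timeout is hit
            withTimeoutOrNull(timeoutMs) { block() }
        } finally {
            pending.finish()
        }
    }
}
```

Inside onReceive() the call site shrinks to `launchAsync { repository.performSync(intent.action) }`. Coroutine timeouts are cooperative, so this only helps when the work suspends or checks for cancellation. A blocking call that never returns will still hold the broadcast open.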
Instrumenting ANR detection: the watchdog approach
Don’t wait for Play Console to tell you about ANRs. Catch them yourself with a main-thread watchdog:
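import android.os.Handler
import android.os.Looper
import java.util.concurrent.atomic.AtomicLong

class ANRWatchdog(
    private val timeoutMs: Long = 5000L,
    private val reportANR: (Array<StackTraceElement>) -> Unit // reporting hook supplied by the caller
) : Thread("ANR-Watchdog") {
    private val ticker = AtomicLong(0)
    private val mainHandler = Handler(Looper.getMainLooper())
    override fun run() {
        while (!isInterrupted) {
            val start = ticker.get()
            mainHandler.post { ticker.incrementAndGet() }
            try {
                sleep(timeoutMs)
            } catch (e: InterruptedException) {
                return // interrupt() is the stop signal
            }
            // Heartbeat never ran: the main thread was blocked for the whole window
            if (ticker.get() == start) {
                reportANR(Looper.getMainLooper().thread.stackTrace)
            }
        }
    }
}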
This watchdog posts to the main looper and checks whether the message was processed within the timeout. If not, it captures the main thread’s stack trace, giving you the same data that production ANR reports provide but in your debug and staging environments. I wish we’d added this six months earlier.
The audit process
- Enable StrictMode in debug builds to flag disk reads/writes and network calls on the main thread
- Deploy the ANR watchdog to internal builds with stack trace reporting
- Audit all SharedPreferences usage. Grep for .apply() and .commit() calls
- Profile Intent extras size by logging Bundle byte size at every startActivity and sendBroadcast call
- Review all BroadcastReceiver subclasses. If onReceive() does more than dispatch work, it’s a risk
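The first audit step can be sketched as a small setup function. `enableStrictModeForDebug` is a hypothetical name, and passing the debug flag in as a parameter is an assumption that keeps the snippet independent of any particular build configuration.

```kotlin
import android.os.StrictMode

// Call early in Application.onCreate(); does nothing outside debug builds.
fun enableStrictModeForDebug(isDebugBuild: Boolean) {
    if (!isDebugBuild) return
    StrictMode.setThreadPolicy(
        StrictMode.ThreadPolicy.Builder()
            .detectDiskReads()   // flags reads such as SharedPreferences loading
            .detectDiskWrites()  // flags commit() and other synchronous writes
            .detectNetwork()     // flags network calls on the main thread
            .penaltyLog()        // log violations to logcat instead of crashing
            .build()
    )
}
```

Starting with penaltyLog() lets you collect violations without breaking the build for everyone. Stricter penalties can follow once the initial backlog is cleared.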
What to do right now
Replace SharedPreferences with DataStore. Use a wrapper interface to migrate incrementally. Every apply() call is a latent ANR during lifecycle transitions, and it will bite you eventually.
Enforce a 100KB ceiling on Intent extras. Route anything larger through a ContentProvider or shared ViewModel. A debug-build runtime check that logs Bundle sizes above the threshold takes an hour to write and saves weeks of debugging.
Instrument before you ship. Deploy a main-thread watchdog and StrictMode in every pre-production build. Catching ANRs in staging is dramatically cheaper than diagnosing them from Play Vitals.
ANRs aren’t mysterious. They’re engineering failures with engineering solutions. Audit systematically, instrument early, and keep the main thread clear.