Profile-Guided Optimization for Android App Startup: Baseline Profiles, Cloud Profiles, and the Dex Layout Pipeline That Cut Our Cold Start From 1.2s to 380ms
Krystian Wiewiór · 1 min read
Deep dive into how ART's ahead-of-time compilation interacts with Baseline Profiles and cloud-aggregated profiles, covering the DEX layout reordering pipeline,