Related Posts
ai startup development
KV cache quantization: Llama 3.2 3B in 2 GB on Android
Deep dive into KV cache memory management for on-device LLM inference on Android — covering per-layer INT4/INT8 mixed quantization of key-value caches, grouped-
· 6 min read
ai startup development
Profile-Guided Optimization for Android App Startup: Baseline Profiles, Cloud Profiles, and the Dex Layout Pipeline That Cut Our Cold Start From 1.2s to 380ms
Deep dive into how ART's ahead-of-time compilation interacts with Baseline Profiles and cloud-aggregated profiles, covering the DEX layout reordering pipeline,
· 1 min read
ai startup development
Apple Foundation Models SDK with Claude Code: Building Hybrid On-Device/Cloud AI Pipelines for iOS Apps in Swift
Deep dive into Apple's just-announced Foundation Models framework (from WWDC/the new SDK docs trending on HN today), showing how to architect a tiered inference
· 5 min read