MVP Factory

KV Cache Quantization for On-Device LLM Inference on Android: INT4 Attention States, Sliding Window Eviction, and the Memory Architecture That Fits a 7B Model in 4GB RAM

Krystian Wiewiór · 1 min read

TAGS: android, kotlin, mobile, architecture
