ai startup development

Speculative Decoding on Android: Running Draft-and-Verify LLM Inference On-Device with Dual GGUF Models and the Token Acceptance Pipeline That Doubles Generation Speed

Krystian Wiewiór · Apr 24, 2026 · 1 min read

Tags: android, kotlin, mobile, architecture, kmp

ARM NEON SIMD Intrinsics for Mobile Text Embedding: Building a Sub-10ms Semantic Search Pipeline That Runs Entirely On-Device

Deep dive into using ARM NEON vectorized dot-product and quantized int8 matrix multiplication to accelerate small embedding models (like E5-small or GTE-tiny) o

Jun 19, 2026 · 5 min read

Speculative Decoding on Mobile GPUs: Running Draft-Verify LLM Pipelines on Android with Vulkan Compute and Dynamic Batch Scheduling

Implement speculative decoding — where a tiny draft model proposes tokens and a larger verify model accepts/rejects them in parallel — entirely on-device using

Jun 19, 2026 · 5 min read

CRDTs for Offline-First Mobile Sync: Automerge in Kotlin Multiplatform, Vector Clocks on Constrained Devices, and the Conflict-Free Data Layer That Eliminates Your Backend Sync Service

Practical implementation of CRDT primitives (LWW-Register, G-Counter, RGA) in KMP shared code with actual Automerge-kt integration, comparing sync strategies (s

Jun 18, 2026 · 5 min read
