Stop routing docstrings to 70B models with on-device AI on Snapdragon | BRKSP90

Jun 4, 2026 by Alberto Martinez

Alberto Martinez explains how to reduce cost and latency in AI coding assistants by routing simple tasks (such as generating docstrings) to smaller on-device models running on Snapdragon NPUs, while reserving larger cloud models for complex work. The session outlines a three-tier routing architecture, quantization trade-offs, and a deployable classifier approach.
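The abstract names a three-tier router but not the tiers themselves or the classifier's features. As a minimal sketch only, assuming three hypothetical tiers (a small on-device NPU model, a mid-size cloud model, and a large cloud model such as a 70B) and a rule-based stand-in for the session's classifier, the routing decision could look like this. `Tier`, `CodingTask`, `route`, `SIMPLE_KINDS`, and the token/file thresholds are all illustrative assumptions, not taken from the talk.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    # Hypothetical tier names: the session abstract does not name its three tiers.
    ON_DEVICE = "small on-device model (NPU)"
    MID = "mid-size cloud model"
    LARGE = "large cloud model (e.g. 70B)"


@dataclass
class CodingTask:
    kind: str            # task label, e.g. "docstring", "rename", "refactor", "design"
    context_tokens: int  # approximate prompt size in tokens
    files_touched: int   # number of files the requested edit spans


# Hypothetical set of task kinds treated as simple enough for the on-device tier.
SIMPLE_KINDS = {"docstring", "rename", "comment", "format"}

# Hypothetical thresholds; a real deployment would tune these or replace the
# whole rule set with a trained classifier.
ON_DEVICE_MAX_TOKENS = 2_000
MID_MAX_TOKENS = 16_000


def route(task: CodingTask) -> Tier:
    """Return the cheapest tier that plausibly handles the task."""
    # Small, single-file, simple edits stay on the device.
    if (task.kind in SIMPLE_KINDS
            and task.context_tokens <= ON_DEVICE_MAX_TOKENS
            and task.files_touched <= 1):
        return Tier.ON_DEVICE
    # Moderate context and a few files go to the mid-size model.
    if task.context_tokens <= MID_MAX_TOKENS and task.files_touched <= 3:
        return Tier.MID
    # Everything else falls through to the largest model.
    return Tier.LARGE


if __name__ == "__main__":
    print(route(CodingTask("docstring", 400, 1)))    # Tier.ON_DEVICE
    print(route(CodingTask("refactor", 8_000, 2)))   # Tier.MID
    print(route(CodingTask("design", 40_000, 12)))   # Tier.LARGE
```

Ordering the checks from cheapest to most expensive tier means a task only escalates when it fails the cheaper tier's conditions, which mirrors the abstract's idea of reserving the large model for complex work.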