Content by Alberto Martinez (1)

Alberto Martinez explains how to reduce cost and latency in AI coding assistants by routing simple tasks (like docstrings) to smaller on-device models on Snapdragon NPUs, while reserving larger cloud models for complex work. The session outlines a three-tier routing architecture, quantization trade-offs, and a deployable classifier approach.
Videos
