Hosted by the dotnet team, this session features Bruno Capuano and Tal Wald as they demonstrate strategies for achieving fast, cost-effective AI inference using .NET, ONNX Runtime, and modern APIs.

.NET AI Community Standup: Blazing-Fast AI Inference on a Budget

Presented by: Bruno Capuano and Tal Wald
Hosted by: dotnet team

Overview

This session explores how developers can process more than 20,000 sentences per second at minimal cost by running AI inference in .NET. The presenters show how AI workloads, typically developed in Python, can be accelerated and optimized within Microsoft’s .NET ecosystem.

Highlights include:

  • Leveraging Hugging Face models in .NET workflows
  • Utilizing ONNX Runtime for high-performance inference
  • Making use of the latest .NET 9 and 10 APIs
  • Designing flexible AI libraries to avoid tight coupling with specific engines or hardware

Key Topics Covered

Migrating from Python to .NET for AI

  • Advantages of migrating AI/ML workloads to .NET
  • Compatibility with popular Python-based workflows

Hugging Face Model Integration

  • Using pre-trained models from Hugging Face repositories
  • Adapting models for .NET-based inference

High-Performance Inference with ONNX Runtime

  • Introduction to ONNX and ONNX Runtime
  • Benchmarks: achieving over 20,000 sentences/sec
  • Strategies for cost-effective deployment
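
As a rough illustration of the kind of pipeline these throughput numbers refer to (not the presenters' actual code), a minimal ONNX Runtime session in C# might look like the sketch below. The model path, input names, and output shape are placeholder assumptions for a Hugging Face sentence-embedding model exported to ONNX:

```csharp
using System;
using System.Linq;
using Microsoft.ML.OnnxRuntime;          // NuGet: Microsoft.ML.OnnxRuntime
using Microsoft.ML.OnnxRuntime.Tensors;

// Pure pooling helper, kept separate from the engine-specific code.
static class Pooling
{
    // Mean-pool flat token embeddings of shape [seq * hidden] into one
    // sentence vector of length hidden.
    public static float[] MeanPool(float[] tokens, int seq, int hidden)
    {
        var pooled = new float[hidden];
        for (int t = 0; t < seq; t++)
            for (int h = 0; h < hidden; h++)
                pooled[h] += tokens[t * hidden + h] / seq;
        return pooled;
    }
}

sealed class SentenceEmbedder : IDisposable
{
    private readonly InferenceSession _session;

    // "model.onnx" stands in for a Hugging Face model exported to ONNX;
    // the session is created once and reused across calls.
    public SentenceEmbedder(string modelPath) =>
        _session = new InferenceSession(modelPath);

    public float[] Embed(long[] inputIds, long[] attentionMask)
    {
        var shape = new[] { 1, inputIds.Length };
        var inputs = new[]
        {
            NamedOnnxValue.CreateFromTensor("input_ids",
                new DenseTensor<long>(inputIds, shape)),
            NamedOnnxValue.CreateFromTensor("attention_mask",
                new DenseTensor<long>(attentionMask, shape)),
        };
        using var results = _session.Run(inputs);
        // Assumed first output: token embeddings with shape [1, seq, hidden].
        var tokens = (DenseTensor<float>)results.First().AsTensor<float>();
        return Pooling.MeanPool(tokens.Buffer.ToArray(),
            tokens.Dimensions[1], tokens.Dimensions[2]);
    }

    public void Dispose() => _session.Dispose();
}
```

Reusing a single InferenceSession across requests and batching many sentences per Run call are the usual levers for reaching throughputs in this range.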

.NET 9 & 10 AI APIs

  • Overview of new APIs supporting AI workloads
  • Increased flexibility and abstraction for developers
  • Decoupling inference logic from underlying hardware
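
This decoupling is what the Microsoft.Extensions.AI abstractions introduced in the .NET 9 timeframe are designed to enable: application code can depend on the IEmbeddingGenerator interface rather than on a concrete engine, so an ONNX Runtime backend, a hosted API, or a local model server can be swapped without changing callers. A hedged sketch, where the Similarity helper and method names are illustrative rather than taken from the session:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;   // NuGet: Microsoft.Extensions.AI.Abstractions

// Application code depends only on the abstraction, so the backing engine
// (a local ONNX model, Azure OpenAI, Ollama, ...) is an injection-time choice.
static class Similarity
{
    public static async Task<float> CosineAsync(
        IEmbeddingGenerator<string, Embedding<float>> generator,
        string a, string b)
    {
        var embeddings = await generator.GenerateAsync(new[] { a, b });
        return Cosine(embeddings[0].Vector.Span, embeddings[1].Vector.Span);
    }

    // Plain cosine similarity over two equal-length vectors.
    public static float Cosine(ReadOnlySpan<float> x, ReadOnlySpan<float> y)
    {
        float dot = 0, nx = 0, ny = 0;
        for (int i = 0; i < x.Length; i++)
        {
            dot += x[i] * y[i];
            nx += x[i] * x[i];
            ny += y[i] * y[i];
        }
        return dot / (MathF.Sqrt(nx) * MathF.Sqrt(ny));
    }
}
```

Because CosineAsync never names a concrete generator type, switching hardware or engines is a configuration change rather than a code change.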

AI Library Architecture

  • Building libraries that interface with multiple inference backends
  • Best practices for extensibility and performance

Demos and Performance Comparisons

  • Real-world demonstrations of .NET-based AI inference
  • GPU versus CPU utilization metrics
  • Cost and performance trade-offs

From Research to Production

  • Tips for transitioning ML research prototypes to scalable, production-ready solutions
  • Managing operational costs

Conclusion

This standup provides actionable insights for .NET developers aiming to bring AI workloads to production efficiently. By combining ONNX Runtime, the new .NET APIs, and sound architectural design, developers can achieve high performance at low cost without sacrificing flexibility.