Hosted by the dotnet team, this session features Bruno Capuano and Tal Wald as they demonstrate strategies for achieving fast, cost-effective AI inference using .NET, ONNX Runtime, and modern APIs.

.NET AI Community Standup: Blazing-Fast AI Inference on a Budget

Presented by: Bruno Capuano and Tal Wald
Hosted by: dotnet team

Overview

This session explores how developers can process more than 20,000 sentences per second at minimal cost by running AI inference in .NET. The presenters show how AI workloads, typically developed in Python, can be accelerated and optimized within Microsoft’s .NET ecosystem.

Highlights include:

  • Leveraging Hugging Face models in .NET workflows
  • Utilizing ONNX Runtime for high-performance inference
  • Making use of the latest .NET 9 and 10 APIs
  • Designing flexible AI libraries to avoid tight coupling with specific engines or hardware

Key Topics Covered

Migrating from Python to .NET for AI

  • Advantages of migrating AI/ML workloads to .NET
  • Compatibility with popular Python-based workflows

Hugging Face Model Integration

  • Using pre-trained models from Hugging Face repositories
  • Adapting models for .NET-based inference

High-Performance Inference with ONNX Runtime

  • Introduction to ONNX and ONNX Runtime
  • Benchmarks: achieving over 20,000 sentences/sec
  • Strategies for cost-effective deployment
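
As a rough illustration of the kind of pipeline these throughput numbers refer to (not the presenters' actual code), a minimal ONNX Runtime session in C# might look like the sketch below. The model path, input names, and output shape are placeholder assumptions for a Hugging Face sentence-embedding model exported to ONNX:

```csharp
using System;
using System.Linq;
using Microsoft.ML.OnnxRuntime;          // NuGet: Microsoft.ML.OnnxRuntime
using Microsoft.ML.OnnxRuntime.Tensors;

// Pure pooling helper, kept separate from the engine-specific code.
static class Pooling
{
    // Mean-pool flat token embeddings of shape [seq * hidden] into one
    // sentence vector of length hidden.
    public static float[] MeanPool(float[] tokens, int seq, int hidden)
    {
        var pooled = new float[hidden];
        for (int t = 0; t < seq; t++)
            for (int h = 0; h < hidden; h++)
                pooled[h] += tokens[t * hidden + h] / seq;
        return pooled;
    }
}

sealed class SentenceEmbedder : IDisposable
{
    private readonly InferenceSession _session;

    // "model.onnx" stands in for a Hugging Face model exported to ONNX;
    // the session is created once and reused across calls.
    public SentenceEmbedder(string modelPath) =>
        _session = new InferenceSession(modelPath);

    public float[] Embed(long[] inputIds, long[] attentionMask)
    {
        var shape = new[] { 1, inputIds.Length };
        var inputs = new[]
        {
            NamedOnnxValue.CreateFromTensor("input_ids",
                new DenseTensor<long>(inputIds, shape)),
            NamedOnnxValue.CreateFromTensor("attention_mask",
                new DenseTensor<long>(attentionMask, shape)),
        };
        using var results = _session.Run(inputs);
        // Assumed first output: token embeddings with shape [1, seq, hidden].
        var tokens = (DenseTensor<float>)results.First().AsTensor<float>();
        return Pooling.MeanPool(tokens.Buffer.ToArray(),
            tokens.Dimensions[1], tokens.Dimensions[2]);
    }

    public void Dispose() => _session.Dispose();
}
```

Reusing a single InferenceSession across requests and batching many sentences per Run call are the usual levers for reaching throughputs in this range.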

.NET 9 & 10 AI APIs

  • Overview of new APIs supporting AI workloads
  • Increased flexibility and abstraction for developers
  • Decoupling inference logic from underlying hardware
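
This decoupling is what the Microsoft.Extensions.AI abstractions introduced in the .NET 9 timeframe are designed to enable: application code can depend on the IEmbeddingGenerator interface rather than on a concrete engine, so an ONNX Runtime backend, a hosted API, or a local model server can be swapped without changing callers. A hedged sketch, where the Similarity helper and method names are illustrative rather than taken from the session:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;   // NuGet: Microsoft.Extensions.AI.Abstractions

// Application code depends only on the abstraction, so the backing engine
// (a local ONNX model, Azure OpenAI, Ollama, ...) is an injection-time choice.
static class Similarity
{
    public static async Task<float> CosineAsync(
        IEmbeddingGenerator<string, Embedding<float>> generator,
        string a, string b)
    {
        var embeddings = await generator.GenerateAsync(new[] { a, b });
        return Cosine(embeddings[0].Vector.Span, embeddings[1].Vector.Span);
    }

    // Plain cosine similarity over two equal-length vectors.
    public static float Cosine(ReadOnlySpan<float> x, ReadOnlySpan<float> y)
    {
        float dot = 0, nx = 0, ny = 0;
        for (int i = 0; i < x.Length; i++)
        {
            dot += x[i] * y[i];
            nx += x[i] * x[i];
            ny += y[i] * y[i];
        }
        return dot / (MathF.Sqrt(nx) * MathF.Sqrt(ny));
    }
}
```

Because CosineAsync never names a concrete generator type, switching hardware or engines is a configuration change rather than a code change.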

AI Library Architecture

  • Building libraries that interface with multiple inference backends
  • Best practices for extensibility and performance

Demos and Performance Comparisons

  • Real-world demonstrations of .NET-based AI inference
  • GPU versus CPU utilization metrics
  • Cost and performance trade-offs

From Research to Production

  • Tips for transitioning ML research prototypes to scalable, production-ready solutions
  • Managing operational costs

Conclusion

This standup provides actionable insights for .NET developers aiming to bring AI workloads to production efficiently. By combining ONNX Runtime, the new .NET APIs, and sound architectural design, developers can achieve high performance at low cost without sacrificing flexibility.