Running GPT-OSS Locally in C# Using Ollama and Microsoft.Extensions.AI
Bruno Capuano demonstrates how developers can run GPT-OSS locally using C#, Ollama, and Microsoft.Extensions.AI libraries to create fast, private, offline-capable AI features.
Bruno Capuano presents a developer-focused guide for setting up and running OpenAI’s GPT-OSS model locally with C#, utilizing the Microsoft.Extensions.AI abstraction layer and Ollama for local inference. This empowers developers to build private, offline-capable AI applications without cloud dependencies.
Why GPT-OSS Matters
- Open-weight model: a powerful LLM whose weights are openly available for developers to download and run directly
- Local execution: Models like gpt-oss-20b run comfortably on systems with 16GB RAM—no cloud required
- Versatility: Supports coding, math, and tool use scenarios
- Privacy & cost: Keeps all data on your machine, reducing privacy risks and cloud costs
Prerequisites
- A PC or Mac with at least 16GB of RAM and a suitable GPU (Apple Silicon is supported)
- .NET 8 SDK or later
- Ollama installed and running
- The gpt-oss:20b model pulled via the command below (an optional reachability check follows this list):
ollama pull gpt-oss:20b
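Before writing any C#, it can help to confirm that the local Ollama server is reachable and the model is downloaded. The snippet below is a minimal, optional sketch (not part of the original walkthrough); it calls Ollama's /api/tags endpoint, which lists locally pulled models:

using System.Net.Http;

// Optional sanity check: ask the local Ollama server which models are available.
// Ollama listens on http://localhost:11434 by default; /api/tags lists pulled models.
using var http = new HttpClient();
var tags = await http.GetStringAsync("http://localhost:11434/api/tags");
Console.WriteLine(tags); // the output should mention "gpt-oss:20b" if the pull succeeded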
Unified AI with Microsoft.Extensions.AI
Microsoft.Extensions.AI libraries unify AI access for .NET—whether using Ollama, Azure AI, or OpenAI—so you can switch providers without rewriting your core logic. You’ll use these abstractions with OllamaSharp for local GPT-OSS inference.
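As a rough illustration of that portability (a sketch built on these abstractions, not code from the original post), the request below depends only on IChatClient; targeting a different provider would change only how the client is constructed:

using Microsoft.Extensions.AI;
using OllamaSharp;

// The only provider-specific line: build an IChatClient backed by local Ollama.
// Swapping to Azure AI or OpenAI would replace this construction, nothing else.
IChatClient client = new OllamaApiClient(new Uri("http://localhost:11434/"), "gpt-oss:20b");

// Everything below is provider-agnostic Microsoft.Extensions.AI code.
var response = await client.GetResponseAsync("Explain open-weight models in one sentence.");
Console.WriteLine(response.Text);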
Step-by-Step: Build a Local C# Chatbot with GPT-OSS
1. Create a New Console App
dotnet new console -n OllamaGPTOSS
cd OllamaGPTOSS
2. Add NuGet Packages
dotnet add package Microsoft.Extensions.AI
dotnet add package OllamaSharp
Note: Microsoft.Extensions.AI.Ollama is deprecated; use OllamaSharp instead.
3. Implement Rolling Chat in C#
Replace Program.cs with the following (simplified for clarity):
using Microsoft.Extensions.AI;
using OllamaSharp;

// Point an IChatClient at the local Ollama server and the gpt-oss:20b model.
IChatClient chatClient = new OllamaApiClient(new Uri("http://localhost:11434/"), "gpt-oss:20b");

// Keep the whole conversation so each turn is sent with full context.
List<ChatMessage> chatHistory = new();

Console.WriteLine("GPT-OSS Chat - Type 'exit' to quit");

while (true)
{
    Console.Write("You: ");
    var userInput = Console.ReadLine();
    if (userInput?.ToLower() == "exit") break;
    if (string.IsNullOrWhiteSpace(userInput)) continue;

    chatHistory.Add(new ChatMessage(ChatRole.User, userInput));

    Console.Write("Assistant: ");
    var assistantResponse = "";

    // Stream the reply to the console as tokens arrive from the model.
    await foreach (var update in chatClient.GetStreamingResponseAsync(chatHistory))
    {
        Console.Write(update.Text);
        assistantResponse += update.Text;
    }

    // Record the assistant's reply so the next turn keeps the full context.
    chatHistory.Add(new ChatMessage(ChatRole.Assistant, assistantResponse));
    Console.WriteLine();
}
4. Run Your Application
Ensure Ollama is running in the background (start it with ollama serve if it isn't already), then launch the app:
dotnet run
Your console app will stream responses from GPT-OSS locally.
Beyond Chat: Build Agentic Apps
- Use AIFunction-based tool calling to let your GPT-OSS model invoke C# methods and APIs (see the sketch after this list)
- Next steps: build document summarizers, code generation assistants, or enrich local RAG patterns
- All data and processing remains private and offline
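As a starting point for tool use, here is a minimal sketch of function invocation wired into the same Ollama-backed client. It assumes gpt-oss handles tool calls through Ollama, and GetWeather is a hypothetical method used purely for illustration:

using System.ComponentModel;
using Microsoft.Extensions.AI;
using OllamaSharp;

// Hypothetical tool: a plain C# method the model may choose to call.
// The [Description] text tells the model what the function does.
[Description("Gets the current weather for a city.")]
static string GetWeather(string city) => $"It is sunny in {city}.";

// Wrap the Ollama client in a pipeline that automatically executes
// any tool calls the model requests during the conversation.
IChatClient client = new OllamaApiClient(new Uri("http://localhost:11434/"), "gpt-oss:20b")
    .AsBuilder()
    .UseFunctionInvocation()
    .Build();

var options = new ChatOptions { Tools = [AIFunctionFactory.Create(GetWeather)] };
var response = await client.GetResponseAsync("What's the weather in Toronto?", options);
Console.WriteLine(response.Text);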
Upcoming: Foundry Local Integration
A follow-up article will guide developers through setting up GPT-OSS with Foundry Local for Windows-native GPU acceleration. This will include specific configuration tips and example code mirroring the stream/chat pattern shown here. See Foundry Local details in the Windows Developer Blog.
Summary and Recommendations
- Set up a local LLM using .NET and Ollama with GPT-OSS-20b
- Utilize Microsoft.Extensions.AI for portable, provider-agnostic AI architecture
- Leverage function calling and agentic patterns—all running privately
- Extend this base to build richer, offline, intelligent .NET applications
Bruno Capuano encourages you to get started, explore advanced features, and contribute to the growing ecosystem of decentralized AI tools for .NET.
This post appeared first on the Microsoft .NET Blog.