Microsoft Build 2026, Day 1: Seven New MAI Models and What They Mean for Developers
June 3, 2026 · 8 min read
Tags: Microsoft Build, AI, MAI, LLM, Ollama, GitHub Copilot, Azure AI Foundry, Developer Tools
I've been following Microsoft Build 2026 since the Day 1 keynote, and the theme was impossible to miss: Satya Nadella framed this as the "agent-first" era — AI that plans, executes, and manages multi-step work instead of waiting for a prompt. There was a lot on stage (the Microsoft Agent Framework 1.0 hitting GA, Microsoft IQ as a context layer across Copilot and Foundry, a native GitHub Copilot desktop app, Jensen Huang showing up to talk RTX Spark dev boxes, even a Majorana 2 quantum update).
But the thing that actually got me excited as a developer was quieter: Microsoft launched seven of its own in-house MAI models. Not OpenAI models served through Azure — Microsoft's own. That's a real strategic shift, and it changes the cost and lock-in math for those of us building on this stack.
Let me break down all seven from a developer's point of view, then tackle the question I immediately had: can I run any of these locally with Ollama?
The Seven MAI Models
Here's the full lineup announced on Day 1:
| Model | Architecture | Context | Best For |
|---|---|---|---|
| MAI-Thinking-1 | 35B active / ~1T total (sparse MoE) | 256K tokens | Complex multi-step reasoning, math, code architecture |
| MAI-Code-1-Flash | 5B active / 137B total (sparse MoE) | 256K tokens | Inline code gen, high-throughput coding workflows |
| MAI-Image-2.5 | — | — | Text-to-image + image-to-image editing |
| MAI-Image-2.5 Flash | — | — | Faster, cheaper image generation |
| MAI-Transcribe-1.5 | — | — | Speech-to-text across 43 languages |
| MAI-Voice-2 | — | — | Text-to-speech in 15+ languages |
| MAI-Voice-2 Flash | — | — | Low-latency voice synthesis |
(Specs are drawn from Microsoft's official model pages, the Microsoft Foundry blog, and the published model cards; Microsoft hasn't published full architecture or context details for the multimodal models, hence the dashes.)
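To put the two "active / total" figures from the table in perspective, here is a quick back-of-the-envelope calculation of how much of each model runs per token. It treats "~1T total" as exactly 1000B, which is my rounding and not a published figure.

```python
# (active, total) parameter counts in billions, taken from the table above.
# "~1T total" is approximated as 1000B; that rounding is an assumption.
MODELS = {
    "MAI-Thinking-1": (35, 1000),
    "MAI-Code-1-Flash": (5, 137),
}


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the model's parameters used for a single token."""
    return active_b / total_b


for name, (active_b, total_b) in MODELS.items():
    share = active_fraction(active_b, total_b)
    print(f"{name}: {share:.1%} of parameters active per token")
```

Both land at roughly 3.5 to 3.6 percent, which is what "sparse" means in practice here: per-token compute scales with the active slice, not the full parameter count.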
MAI-Thinking-1 — the flagship reasoner
This is the headline. A sparse Mixture-of-Experts model with ~35B active parameters out of roughly a trillion total, and a 256K-token context window. What stood out to me: Microsoft says it was trained without OpenAI data — this is genuinely their own reasoning model. It's tuned for long-context reasoning, multi-step instructions, and code generation, which makes it the natural backend for the agentic workflows the rest of the keynote was about.
MAI-Code-1-Flash — the one I'll use first
Don't let the "Flash" name fool you into thinking it's tiny. Per its model card it's a sparse Mixture-of-Experts with 137B total parameters but only ~5B active per token, plus a 256K context window — so you get MoE-scale quality at small-model speed. It's purpose-built for GitHub Copilot and VS Code and is already rolling out across the Free, Pro, Pro+, and Max plans (starting in VS Code, expanding over the coming weeks). GitHub's pricing page lists it at $0.75/M input tokens, $4.50/M output, and a dramatically cheaper $0.075/M for cached input. For a coding model, that's aggressive.
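To make those prices concrete, here is a rough cost estimator using only the three per-million-token rates quoted above. The agent-loop numbers (1,000 calls, 20K input tokens, 500 output tokens, 90% cache hits) are hypothetical, and how a provider actually counts cached tokens is an assumption on my part.

```python
# USD per million tokens for MAI-Code-1-Flash, as quoted in this post.
INPUT_PER_M = 0.75
OUTPUT_PER_M = 4.50
CACHED_INPUT_PER_M = 0.075


def request_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request.

    cached_tokens is the part of input_tokens billed at the cached rate.
    Real cache accounting may differ by provider; this is a simplification.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * INPUT_PER_M
        + cached_tokens * CACHED_INPUT_PER_M
        + output_tokens * OUTPUT_PER_M
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical agent loop: 1,000 calls, each with 20K input and 500 output tokens.
    calls = 1_000
    no_cache = calls * request_cost(20_000, 500)
    mostly_cached = calls * request_cost(20_000, 500, cached_tokens=18_000)
    print(f"no caching:       ${no_cache:.2f}")
    print(f"90% input cached: ${mostly_cached:.2f}")
```

Under those made-up numbers the loop costs about $17.25 without caching and about $5.10 with it, which is why the cached-input rate matters so much for agents that resend the same context on every call.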
The multimodal five
- MAI-Image-2.5 — adds image-to-image editing on top of text-to-image, with a Flash variant for faster, cheaper generation. (Early press cited strong Arena-leaderboard placements, but I'd treat the exact rankings as unconfirmed until Microsoft publishes them.)
- MAI-Transcribe-1.5 — speech-to-text across 43 languages, now with content biasing and improved accuracy.
- MAI-Voice-2 (+ Flash) — multilingual TTS with voice cloning across 15+ languages.
The image, voice, and transcription models are generally available now in Microsoft Foundry; MAI-Thinking-1 is in private preview.
The "Flash" pattern repeating across image, voice, and code tells you Microsoft is thinking hard about the cost-per-call of agent systems that fire thousands of model calls in a loop.
Why this matters for developers
Three things jumped out:
- Less OpenAI dependency. CNBC framed the launch exactly this way — Microsoft reducing reliance on OpenAI while lowering costs for developers. If you're building on Azure, you now have a first-party option that Microsoft controls end to end.
- They're not Azure-exclusive. This surprised me. The MAI models are slated to be available through third-party inference providers — Fireworks AI, Baseten, and OpenRouter — not just Microsoft Foundry. That's a deliberately open distribution play.
- Purpose-built tiers. A dedicated reasoning model, a dedicated fast coder, and Flash variants everywhere. This is a lineup designed for agents, not chatbots.
How to actually call them
Day 1 access paths, depending on the model:
- MAI-Thinking-1: private preview in Microsoft Foundry, with Chat Completions API support; rolling out to Fireworks AI / Baseten / OpenRouter.
- MAI-Code-1-Flash: live in GitHub Copilot (VS Code), expanding across the Free/Pro/Pro+/Max plans; also via the inference providers.
- MAI-Image / Voice / Transcribe: through Microsoft Foundry.
Because they expose a Chat Completions–compatible API, calling MAI-Thinking-1 through OpenRouter looks like every other OpenAI-style client:
from openai import OpenAI

# Point the stock OpenAI client at OpenRouter instead of api.openai.com.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",
)

# Standard Chat Completions request; only the model ID is MAI-specific.
response = client.chat.completions.create(
    model="microsoft/mai-thinking-1",
    messages=[
        {"role": "system", "content": "You are a senior backend engineer."},
        {"role": "user", "content": "Design a retry strategy for a flaky payment API."},
    ],
)

print(response.choices[0].message.content)
The OpenAI-compatible surface is the real story here — you don't have to learn a new SDK to try them.
The big question: can I run MAI models on Ollama?
This is where I got excited and then had to be honest with myself. I love running models locally with Ollama — no API key, no per-token bill, no data leaving my machine. So my first instinct was "great, when's the ollama pull mai-thinking-1?"
Short answer: you can't — at least not today.
Ollama runs open-weight models: the weights are published, you download them, and it runs them on your machine (typically as quantized builds) with GPU offload. Across every Day 1 source I read, none of the seven MAI models was announced as open-weight or downloadable. They're distributed through cloud/API channels only: Microsoft Foundry, GitHub Copilot, and the commercial inference providers (Fireworks AI, Baseten, OpenRouter). There's no .gguf and no weights on a model hub, so there's nothing for Ollama to pull.
So if "use it with Ollama" means pull the weights and run MAI-Thinking-1 offline on my own GPU — that's a no for now.
But there are two realistic paths if local matters to you:
Option 1 — Run Microsoft's Phi models locally instead
If what you want is a Microsoft model running fully on your machine, that already exists. The Phi family is open-weight and in the Ollama library today:
# Microsoft's current flagship small model (~14B)
ollama pull phi4
# The 3.8B mini — punches well above its weight on modest hardware
ollama pull phi4-mini
ollama run phi4 "Refactor this function to use the retry pattern."
Phi isn't MAI, but it's Microsoft's open line, and for a lot of local coding and reasoning tasks phi4 is genuinely good. Microsoft even has a .NET + Ollama guide if you're on the C# stack like I often am.
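If you'd rather call the local model from a script than from the CLI, here is a minimal sketch against Ollama's REST API using only the standard library. It assumes Ollama is running on its default port (11434) and that phi4 has already been pulled; the helper names are mine.

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot text generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt: str, model: str = "phi4") -> dict:
    """Build a non-streaming generate request for a locally pulled model."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(prompt: str, model: str = "phi4") -> str:
    """Send the prompt to the local Ollama server and return the completion text."""
    body = json.dumps(build_payload(prompt, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        # With stream=False, Ollama returns one JSON object whose
        # "response" field holds the generated text.
        return json.loads(response.read())["response"]


# Usage (requires a running Ollama server with phi4 pulled):
# print(generate("Refactor this function to use the retry pattern."))
```

Nothing leaves the machine in this path, which is the whole point of keeping Phi around for the private inner loop.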
Option 2 — Point Ollama-style tooling at MAI over the network
Ollama and most local-first tools speak the OpenAI-compatible API. So even though the weights are remote, you can keep your existing local-dev workflow and just swap the base_url to OpenRouter to hit microsoft/mai-thinking-1. You lose the "offline, free, private" part — but you keep the ergonomics. For a lot of agent prototyping, that's the pragmatic middle ground: Phi locally for the cheap/private inner loop, MAI over the API when you need the heavyweight reasoner.
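Here is a rough sketch of that split, assuming the openai Python package is installed and using Ollama's OpenAI-compatible endpoint at http://localhost:11434/v1. The Backend class, the pick_backend and ask helpers, and the OPENROUTER_API_KEY environment variable name are all illustrative choices of mine, not anything Microsoft or OpenRouter prescribes.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    base_url: str
    api_key: str
    model: str


def pick_backend(heavy: bool) -> Backend:
    """Local Phi for routine calls, remote MAI-Thinking-1 for heavy reasoning."""
    if heavy:
        return Backend(
            base_url="https://openrouter.ai/api/v1",
            # Hypothetical env var name; use whatever your setup defines.
            api_key=os.environ.get("OPENROUTER_API_KEY", ""),
            model="microsoft/mai-thinking-1",
        )
    return Backend(
        base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
        api_key="ollama",  # the client requires a value; Ollama ignores it
        model="phi4",
    )


def ask(prompt: str, heavy: bool = False) -> str:
    """Send one user message to whichever backend pick_backend selects."""
    # Imported lazily so pick_backend stays usable without the package.
    from openai import OpenAI

    backend = pick_backend(heavy)
    client = OpenAI(base_url=backend.base_url, api_key=backend.api_key)
    response = client.chat.completions.create(
        model=backend.model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


# Usage:
# ask("Rename these variables for clarity.")               # local phi4
# ask("Design a sharding plan for this schema.", heavy=True)  # remote MAI
```

The only things that change between the two paths are the base URL, the key, and the model name, so the rest of an agent's code doesn't need to know which one it is talking to.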
My takeaway
Build 2026 Day 1 was an agent-first keynote, but the MAI model launch is the part with the longest tail for developers. Microsoft now has its own reasoning model, its own fast coder living in Copilot, and a full multimodal set — and they're shipping it beyond Azure through OpenRouter and friends.
The one thing I'll keep watching: will any MAI model ever get open weights? Until it does, Ollama users are stuck on Phi for truly local Microsoft inference. Given how committed Microsoft is to Phi being open and how closed MAI looks right now, I'd bet these stay in two separate lanes — open Phi for the edge and your laptop, proprietary MAI for the cloud.
For now: I'm going to wire MAI-Code-1-Flash into my Copilot setup, run phi4 locally for offline work, and keep an OpenRouter key around for when I want to throw a hard problem at MAI-Thinking-1's 256K context.
Onto Day 2.