Bobby Encoded

© 2026 Bobby Jose

How to Point Claude Code at a Local LLM with Ollama

May 20, 2026 · 8 min read

Claude Code, Ollama, LLM, Local AI, Developer Tools, AI, Productivity

I run Claude Code all day, but I kept wanting an offline, zero-cost option for the small stuff — quick edits, formatting, experiments on a plane. The good news: as of mid-2026 this got genuinely easy. Ollama now speaks Anthropic's Messages API natively, so you can point Claude Code straight at a local model with no translation proxy in between.

If you tried this a year ago, you probably wrestled with LiteLLM or a community shim to translate between Anthropic's /v1/messages format and Ollama's OpenAI-style API. You don't need any of that anymore.

Here's the full setup, and — just as important — an honest take on where local models hold up and where they don't.

How it works

Claude Code talks to whatever lives at ANTHROPIC_BASE_URL, expecting the Anthropic Messages API (/v1/messages). Recent Ollama builds implement exactly that endpoint. So the whole trick is:

  1. Run a model in Ollama (it serves on http://localhost:11434).
  2. Tell Claude Code to send its requests there instead of to api.anthropic.com.
  3. Pick the local model by name.

That's it. No proxy, no gateway.

Prerequisites

You'll need a reasonably recent Ollama (the Anthropic-compatible endpoint landed in the 0.14 series — anything newer is fine). Check yours:

ollama --version
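
If you want to script that check, here is a minimal sketch. It assumes ollama --version prints a dotted version number somewhere in its output and that your sort supports -V (version sort). The version_ge helper is a made-up name, and the 0.14.0 floor is simply the series mentioned above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when version $1 is >= version $2.
# Relies on `sort -V` (version sort) being available.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

required="0.14.0"  # floor taken from the text above

if command -v ollama >/dev/null 2>&1; then
  # Assumption: the output contains a version such as 0.14.2 somewhere.
  installed="$(ollama --version 2>/dev/null | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1)"
  if [ -n "$installed" ] && version_ge "$installed" "$required"; then
    echo "ollama $installed: new enough"
  else
    echo "ollama ${installed:-unknown}: upgrade to $required or later"
  fi
else
  echo "ollama not found on PATH"
fi
```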

Pull a coding-capable model. I'd start with Qwen2.5-Coder — the 14B if you have the RAM, the 7B if you don't:

ollama pull qwen2.5-coder:14b

The fast path

Newer Ollama ships a launcher that wires everything up for you:

ollama launch claude --model qwen2.5-coder:14b

This sets the environment and starts Claude Code against the local model in one command. If that works for you, you're done.

The manual setup (what's actually happening)

If you'd rather understand the moving parts — or the launcher isn't available — set the environment variables and run Claude Code yourself. The most robust form sets everything via env vars (including the model), so nothing depends on flag placement:

ANTHROPIC_BASE_URL=http://localhost:11434 \
ANTHROPIC_AUTH_TOKEN=ollama \
ANTHROPIC_MODEL=qwen2.5-coder:14b \
ANTHROPIC_DEFAULT_HAIKU_MODEL=qwen2.5-coder:14b \
claude

What each one does:

  • ANTHROPIC_BASE_URL — routes Claude Code's requests to your local Ollama server instead of api.anthropic.com.
  • ANTHROPIC_AUTH_TOKEN — any non-empty value; it's sent as a Bearer token, which Ollama ignores. (Crucially, this is what beats a logged-in subscription — see the gotcha below.)
  • ANTHROPIC_MODEL — the main model. Set it here rather than via the --model flag (more on why in a second).
  • ANTHROPIC_DEFAULT_HAIKU_MODEL — the small "background" model Claude Code uses for housekeeping calls (session titles, summaries). If you don't point this at a local model too, those calls try to reach an Anthropic-only Haiku name and quietly fail. Set it to any model Ollama has.

Two traps will bite you if you skip the env-var form, and I hit both:

Trap 1: a Claude subscription hijacks the override

If you're signed into Claude Code with a Pro or Max subscription, there's a precedence trap waiting for you. Claude Code's auth order is, roughly: cloud provider creds → ANTHROPIC_AUTH_TOKEN → ANTHROPIC_API_KEY → subscription OAuth (last). Your bearer token should win — but here's the catch: setting ANTHROPIC_API_KEY="" (an empty string) is treated as "not set," and Claude Code quietly falls all the way through to your subscription. It then sends your request to Anthropic — which has never heard of qwen2.5-coder:14b — and you get:

There's an issue with the selected model (qwen2.5-coder:14b). It may not exist or you may not have access to it.

The tell is the session header: if it says "… · Claude Max", the override didn't take and you're still talking to Anthropic. The fix is simple — don't set ANTHROPIC_API_KEY at all, and let ANTHROPIC_AUTH_TOKEN do its job. When it's working, that same header flips to "… · API Usage Billing."

Confirm which endpoint you're actually on with /status inside the session: you want to see http://localhost:11434 and a custom bearer token, not "Claude Max." If it still shows the subscription, you've got ANTHROPIC_API_KEY exported somewhere in your shell — echo $ANTHROPIC_API_KEY, then unset it and relaunch.
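
One wrinkle with the echo check: it prints a blank line both when the variable is unset and when it is set to an empty string, which is exactly the case described above. A small sketch that tells the two apart, without ever printing the key itself, could look like this. The api_key_state function is a made-up helper, not part of Claude Code.

```shell
# Hypothetical helper: prints the state of ANTHROPIC_API_KEY
# ("unset", "empty", or "set") without revealing its value.
api_key_state() {
  if [ -z "${ANTHROPIC_API_KEY+x}" ]; then
    echo "unset"   # the variable does not exist in this shell
  elif [ -z "$ANTHROPIC_API_KEY" ]; then
    echo "empty"   # the variable exists but holds an empty string
  else
    echo "set"
  fi
}

case "$(api_key_state)" in
  unset)
    echo "ANTHROPIC_API_KEY is not set; nothing to do"
    ;;
  *)
    echo "ANTHROPIC_API_KEY is $(api_key_state); unsetting it for this shell"
    unset ANTHROPIC_API_KEY
    ;;
esac
```

The unset only affects the current shell. If the variable comes back in a new terminal, look for an export line in your ~/.zshrc or ~/.bashrc.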

Trap 2: the --model flag gets silently dropped

The natural way to pick the model is claude --model qwen2.5-coder:14b. The problem: that command is long, and when it wraps across two lines in your terminal, it's easy to hit Enter before the --model part. Claude Code then launches with its default model (whatever Anthropic's current default is, e.g. Opus) pointed at Ollama, and you get the "model may not exist" error all over again, this time complaining about claude-opus-4-8. The header gives it away: it shows the default model name, not your Qwen one.

This is exactly why I set ANTHROPIC_MODEL as an environment variable instead of using the flag — env vars are parsed as part of the single command regardless of how the line wraps, so the model selection can't get separated and lost.

Verify the endpoint before you trust it

Before wiring up the CLI, I like to confirm Ollama is actually answering in the Anthropic format. One curl does it:

curl -s http://localhost:11434/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: ollama" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "qwen2.5-coder:14b",
    "max_tokens": 64,
    "messages": [{"role": "user", "content": "Say: ollama-anthropic-ok"}]
  }'

If you get back JSON with "type":"message", a content array, and a usage block, the Anthropic-compatible endpoint is live and Claude Code will be able to talk to it.
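
To turn that eyeball check into something scriptable, a rough sketch follows. It is a plain-text match on the three markers named above, not a real JSON parser, and looks_like_anthropic_message is a made-up helper name. The sample body is hand-written in the Anthropic Messages shape for illustration; it is not captured Ollama output.

```shell
# Hypothetical helper: reads a response body on stdin and succeeds only if it
# contains "type":"message", a content array, and a usage object.
looks_like_anthropic_message() {
  body="$(cat)"
  printf '%s' "$body" | grep -Eq '"type"[[:space:]]*:[[:space:]]*"message"' &&
  printf '%s' "$body" | grep -Eq '"content"[[:space:]]*:[[:space:]]*\[' &&
  printf '%s' "$body" | grep -Eq '"usage"[[:space:]]*:[[:space:]]*\{'
}

# Hand-written sample for illustration only (not real server output).
sample='{"type":"message","role":"assistant","content":[{"type":"text","text":"ollama-anthropic-ok"}],"usage":{"input_tokens":12,"output_tokens":6}}'

if printf '%s' "$sample" | looks_like_anthropic_message; then
  echo "looks like an Anthropic-style message"
else
  echo "unexpected response shape"
fi
```

In practice you would pipe the curl command above into the function instead of the sample string.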

A one-word switch (my favorite setup)

I don't want Ollama to be my default — I want my normal Claude subscription for real work and a quick way to drop into a local model when I'm offline or experimenting. An alias does exactly that. Add this to your ~/.zshrc (or ~/.bashrc):

alias claude-local='ANTHROPIC_BASE_URL=http://localhost:11434 ANTHROPIC_AUTH_TOKEN=ollama ANTHROPIC_MODEL=qwen2.5-coder:14b ANTHROPIC_DEFAULT_HAIKU_MODEL=qwen2.5-coder:14b claude'

Reload (source ~/.zshrc) and now:

  • claude-local → a session on your local Ollama model
  • claude → your normal Anthropic / subscription account, untouched

Because the alias bundles the whole env-var prefix into one token, you sidestep both traps above — nothing wraps, nothing gets dropped. To switch models later, just edit the two model names in the alias.
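
If editing the alias every time feels clunky, a shell function is one alternative. This is a sketch under my own naming: claude_local and the CLAUDE_LOCAL_MODEL variable are hypothetical, and the default simply mirrors the model used throughout this post. Everything you pass to the function is forwarded to claude unchanged.

```shell
# Hypothetical function: same environment as the alias, but the model comes
# from CLAUDE_LOCAL_MODEL (a made-up variable) with qwen2.5-coder:14b as fallback.
claude_local() {
  model="${CLAUDE_LOCAL_MODEL:-qwen2.5-coder:14b}"
  # The assignments below apply only to this one claude invocation.
  ANTHROPIC_BASE_URL=http://localhost:11434 \
  ANTHROPIC_AUTH_TOKEN=ollama \
  ANTHROPIC_MODEL="$model" \
  ANTHROPIC_DEFAULT_HAIKU_MODEL="$model" \
  claude "$@"
}
```

With that in your shell config, claude_local starts a session on the default model, and CLAUDE_LOCAL_MODEL=qwen2.5-coder:7b claude_local switches to the 7B for a single run.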

Making it the default (if you really want to)

If you'd rather have every claude invocation hit Ollama, drop the same variables into the env block of your Claude Code settings.json (for user-level settings, typically ~/.claude/settings.json) instead:

{
  "env": {
    "ANTHROPIC_BASE_URL": "http://localhost:11434",
    "ANTHROPIC_AUTH_TOKEN": "ollama",
    "ANTHROPIC_MODEL": "qwen2.5-coder:14b",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "qwen2.5-coder:14b"
  }
}

I'd avoid this unless local is genuinely your primary — the alias keeps the choice in your hands per session.

The honest part: where local models fall short

Here's where I have to be straight with you, because the setup being easy doesn't mean the experience matches Claude. Claude Code is an agentic tool — it leans hard on tool-use and function-calling to run shell commands, edit files, and chain multi-step work. That's exactly where small local models struggle:

  • Tool-call fidelity is the #1 problem. Smaller models (7B–14B) frequently botch the JSON for a tool call, drift from the tool schema, or fail to recover after an error mid-loop. When that happens, the agentic loop stalls or does the wrong thing.
  • Context budget. The agentic loop spends a lot of context on planning and tool results. Give the model at least 32K, ideally 64K+ tokens of context, or it'll lose the thread on anything non-trivial.
  • Reasoning depth. A local 7B–14B model is fine for a rename, a formatting pass, or a single-file tweak. Multi-file refactors, gnarly debugging, and architecture decisions are where you'll feel the gap, and where hallucinated APIs start creeping in.
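
On the context point, one way to get a larger window is to derive a variant of the model with a Modelfile that sets num_ctx. The sketch below assumes the Anthropic-compatible endpoint honors that parameter the same way Ollama's other endpoints do, which I have not verified. The write_modelfile helper and the qwen2.5-coder-32k name are made up for illustration.

```shell
# Hypothetical helper: writes a Modelfile that derives a larger-context variant
# from a base model. Arguments: output directory, base model, context tokens.
write_modelfile() {
  mkdir -p "$1"
  printf 'FROM %s\nPARAMETER num_ctx %s\n' "$2" "$3" > "$1/Modelfile"
  echo "$1/Modelfile"
}

workdir="$(mktemp -d)"
modelfile="$(write_modelfile "$workdir" "qwen2.5-coder:14b" 32768)"
cat "$modelfile"

# "qwen2.5-coder-32k" is an illustrative name, not a published model.
if command -v ollama >/dev/null 2>&1; then
  ollama create qwen2.5-coder-32k -f "$modelfile" || echo "ollama create failed"
else
  echo "ollama not found; Modelfile left at $modelfile"
fi
```

You would then use the new name wherever this post uses qwen2.5-coder:14b. Keep in mind that a larger context window uses more memory, so the 14B at 32K may not fit where it did at the default.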

There's a nice parallel to the open-vs-closed model debate I wrote about after Build: models like Qwen and Llama are open-weight, so they can run locally at all — but the cost of that freedom is exactly this agentic-reliability gap.

When I actually use this

So is it worth it? Yes — for the right jobs:

  • ✅ Offline work — flights, spotty connections, air-gapped machines
  • ✅ Learning and experimentation — see how the agentic loop behaves without burning API budget
  • ✅ Cost-sensitive, low-stakes tasks — formatting, simple edits, boilerplate
  • ❌ Production coding workflows — complex refactors and debugging, where reliability matters, still belong on Claude Opus/Sonnet

My setup ends up being a hybrid: a local model for the cheap, offline inner loop, and Claude proper for the heavy lifting. The fact that it's now a four-variable change to flip between them is the real win.

Give it a try on your next offline afternoon — and keep your expectations calibrated to the model size.
