I Built a 300-Line Coding Agent for Local Models
June 3, 2026
AI, LLM, Ollama, Python, Agents, Local AI, Developer Tools
In my last couple of posts I wired Claude Code up to a local model through Ollama, then watched it take two minutes to answer on my M1 Max. The model wasn't the whole story — the harness around it was doing a lot of heavy lifting that a local model couldn't keep up with.
That left me with an itch: what is the harness, really? The thing that turns "a model that outputs text" into "an agent that reads my files, edits them, runs my tests, and fixes its own mistakes." Is that magic, or is it something I could build?
Turns out it's mostly not magic. I built a working one in about 300 lines of dependency-free Python. But the interesting part wasn't the code I planned — it was the two bugs that stood between "prints JSON at me" and "actually edits files." That's the part worth writing down.
The harness is just a loop
Strip away the UI and the polish, and an agentic coding tool is a single loop:
1. Send the model the conversation + a list of tools it can call.
2. If it replies with a tool call, run that tool and feed the result back.
3. If it replies with plain text, it's done — print it and stop.
4. Repeat.
That's the whole concept. In Python, against Ollama's /api/chat, the core is genuinely this small:
def run_agent(task):
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": task},
    ]
    for _ in range(MAX_STEPS):
        msg = chat(messages)  # call Ollama
        messages.append(msg)
        tool_calls = msg.get("tool_calls") or []
        if not tool_calls:
            print(msg.get("content"))  # no tool call => task done
            return
        for call in tool_calls:
            name = call["function"]["name"]
            args = call["function"]["arguments"]
            result = execute_tool(name, args)
            messages.append({"role": "tool", "tool_name": name, "content": str(result)})
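The chat() helper is not shown in the loop above. Here is a minimal sketch of what it could look like using only the standard library, assuming Ollama is listening on its default local port and that non-streaming replies are wanted. The names OLLAMA_URL, MODEL, and build_chat_request, along with the tools and timeout parameters, are illustrative assumptions rather than the project's actual code.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # assumed: Ollama's default local address
MODEL = "qwen2.5-coder:7b"

def build_chat_request(messages, tools, model=MODEL, url=OLLAMA_URL):
    """Build (but do not send) the POST request for one chat turn."""
    payload = {
        "model": model,
        "messages": messages,
        "tools": tools,    # JSON-schema descriptions of the callable tools
        "stream": False,   # ask for one complete JSON reply instead of chunks
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat(messages, tools=(), timeout=300):
    """Send the conversation and return the assistant's message dict."""
    req = build_chat_request(messages, list(tools))
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["message"]
```

Splitting request construction from sending keeps the payload easy to inspect without a running server. The generous timeout is a guess at what a slow local model needs.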
I gave it five tools — list_dir, read_file, write_file, edit_file, and run_bash — each just a Python function with a little JSON schema so the model knows it exists. Mutating tools ask for confirmation before they run (unless I pass --auto). And that's basically it.
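To make "a Python function with a little JSON schema" concrete, here is a hedged sketch of two of the five tools plus a dispatcher with a confirmation gate. The schema follows the function-calling shape Ollama accepts for its tools field. MUTATING, WRITE_FILE_SCHEMA, and the auto and ask parameters are assumptions made for illustration, and the real project may structure this differently.

```python
from pathlib import Path

def read_file(path):
    """Return the text of a file."""
    return Path(path).read_text(encoding="utf-8")

def write_file(path, content):
    """Create or overwrite a file and report how much was written."""
    Path(path).write_text(content, encoding="utf-8")
    return f"wrote {len(content)} characters to {path}"

# Name -> function lookup used for dispatch.
TOOL_FUNCS = {"read_file": read_file, "write_file": write_file}

# Tools that change the filesystem and so need a yes/no first (assumed set).
MUTATING = {"write_file"}

# The schema sent to the model so it knows the tool exists.
WRITE_FILE_SCHEMA = {
    "type": "function",
    "function": {
        "name": "write_file",
        "description": "Create or overwrite a file with the given content.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string"},
                "content": {"type": "string"},
            },
            "required": ["path", "content"],
        },
    },
}

def execute_tool(name, args, auto=False, ask=input):
    """Run one tool call; errors come back as text so the model can react."""
    func = TOOL_FUNCS.get(name)
    if func is None:
        return f"error: unknown tool {name!r}"
    if name in MUTATING and not auto:
        if ask(f"run {name}({args})? [y/N] ").strip().lower() != "y":
            return "error: user declined this call"
    try:
        return func(**args)
    except Exception as exc:  # bad arguments, missing file, and so on
        return f"error: {exc}"
```

Returning errors as strings instead of raising means a failed call becomes one more tool result in the conversation, which gives the model a chance to correct itself on the next step.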
I pointed it at qwen2.5-coder:7b, gave it a softball — "create hello.py that prints a message, then run it" — and hit go.
Bug #1: the model wouldn't actually call the tool
Here's what came back:
{"name": "write_file", "arguments": {"path": "hello.py", "content": "print('Hello from tiny-agent')"}}
No file was created. The loop just... ended.
Look closely: that's a perfect tool call — but it came back as the message's text content, not in the structured tool_calls field my loop was checking. The model knew exactly what it wanted to do; it just described the call instead of making it. My harness saw "no tool calls," assumed the task was finished, and quit.
This is the thing everyone warns you about with smaller local models, and now I'd seen it firsthand: tool-call fidelity. A frontier model reliably uses the structured tool-calling channel. A 7B sometimes just writes the JSON into the chat like it's talking to you.
The fix is the kind of unglamorous "repair" code that real harnesses are full of: if the model didn't make a structured call, scan its text for a JSON object that names one of my tools, and run it anyway.
import json

def recover_text_toolcalls(content):
    """Small models often emit tool calls as TEXT. Find them so the loop can run them."""
    calls, dec, i = [], json.JSONDecoder(), 0
    while True:
        brace = content.find("{", i)
        if brace == -1:
            break
        try:
            obj, end = dec.raw_decode(content, brace)
        except json.JSONDecodeError:
            i = brace + 1  # not valid JSON at this brace; try the next one
            continue
        i = end  # resume scanning after the object just decoded
        if isinstance(obj, dict) and obj.get("name") in TOOL_FUNCS:
            calls.append({"function": {"name": obj["name"],
                                       "arguments": obj.get("arguments", {})}})
    return calls
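The recovery function only finds calls. The loop still has to decide when to use it. One way that wiring might look is sketched below, with a hypothetical helper named pick_tool_calls. Its recover parameter stands in for recover_text_toolcalls so the snippet runs on its own.

```python
def pick_tool_calls(msg, recover):
    """Prefer structured tool calls; otherwise try to recover them from text.

    Returns (calls, was_recovered). An empty list means the model gave a
    plain-text answer and the loop can stop.
    """
    structured = msg.get("tool_calls") or []
    if structured:
        return structured, False
    recovered = recover(msg.get("content") or "")
    return recovered, bool(recovered)

# Inside the agent loop this would replace the direct tool_calls lookup:
#   tool_calls, was_recovered = pick_tool_calls(msg, recover_text_toolcalls)
#   if was_recovered:
#       print(f"(recovered {len(tool_calls)} tool call(s) from text)")
```

Checking the structured field first keeps the fallback out of the way when the model behaves, so the text scan only runs on replies that would otherwise end the loop.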
Bug #2: it called two tools at once, as text
I re-ran it. Progress — and a new failure:
{"name": "write_file", "arguments": {...}}
{"name": "run_bash", "arguments": {"command": "python3 hello.py"}}
Two tool calls, both as text, in one blob separated by a blank line. My first recovery attempt only knew how to parse a single JSON object — json.loads() on the whole thing choked on the second one and gave up. Still no file.
That's why the recovery function above doesn't use json.loads() — it uses JSONDecoder().raw_decode() in a loop, walking the text and pulling out every JSON object it finds, not just the first. One small change, but it's the difference between handling the happy path and handling what the model actually does.
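The difference is easy to reproduce in isolation. The sketch below feeds the same two-object blob to both approaches. The blob is a stand-in with made-up file content, not the model's actual output, and loads_whole and decode_each are illustrative names.

```python
import json

blob = (
    '{"name": "write_file", "arguments": {"path": "hello.py", "content": "print(1)"}}\n'
    "\n"
    '{"name": "run_bash", "arguments": {"command": "python3 hello.py"}}'
)

def loads_whole(text):
    """First attempt: treat the whole reply as a single JSON document."""
    try:
        return [json.loads(text)]
    except json.JSONDecodeError:
        return []  # the second object triggers an "Extra data" error

def decode_each(text):
    """Walk the text and decode every top-level JSON object in it."""
    decoder, pos, found = json.JSONDecoder(), 0, []
    while (start := text.find("{", pos)) != -1:
        try:
            obj, pos = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pos = start + 1  # not valid JSON here; try the next brace
            continue
        found.append(obj)
    return found

print(loads_whole(blob))                        # → []
print([o["name"] for o in decode_each(blob)])   # → ['write_file', 'run_bash']
```

raw_decode returns both the parsed object and the index where it stopped, which is what lets the scan resume after each object instead of failing on everything that follows the first one.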
Third time
(recovered 1 tool call from text)
→ write_file(path="hello.py", content="print('Hello from tiny-agent')")
(recovered 1 tool call from text)
→ run_bash(command="python3 hello.py")
File hello.py created and executed successfully. Output: Hello from tiny-agent
It read the task, wrote a real file, ran it, saw the output, and reported back — the full loop, driven by a 7B model on my laptop, no frontier model anywhere in the picture. hello.py was sitting on disk with exactly the right contents.
What I actually learned
The loop I set out to build took maybe an hour. The two bugs took longer — and they're the whole point. The gap between a toy agent and a useful one isn't the clever orchestration; it's the boring defensive code that copes with a model doing almost the right thing in not quite the right format. recover_text_toolcalls() is the single most important function in the project, and it's pure damage control.
It also reframed why Claude Code felt so slow on a local model. Claude Code is built for a model that nails the structured channel every time and reasons over a huge context. Point it at a 7B and you're asking a tool tuned for a Formula 1 car to run on a go-kart engine. My tiny harness "works" with the 7B precisely because it expects the go-kart and pads every corner.
Where it goes next
This is an MVP, not a Claude Code replacement, and I'm fine with that. The obvious next steps:
- Streaming output instead of waiting for a whole turn
- A real diff view before applying edits
- A grep/search tool so it can work in bigger codebases
- Memory across runs
If you want to poke at it, it's ~300 lines of standard-library Python — no dependencies beyond a running Ollama. Honestly, building it is the fastest way I know to stop seeing these tools as magic. They're a loop, five functions, and a surprising amount of duct tape.
Code: github.com/josenbobby/tiny-agent · Part of my local-AI series, after How to Point Claude Code at a Local LLM with Ollama.