I Used an AI Agent to Build an MCP Server in 5 Minutes. Here's Exactly How.
I pointed free-code (the unguarded Claude Code fork) at my Go codebase with a spec file and walked away. 5 minutes later: compiled binary, 11 tests passing, 51.6% coverage.
I had a Go project called vaos-kernel. JWT issuance, Ed25519 signing, hash-chained audit logs, credential management. Solid crypto primitives, well-tested, doing what it was supposed to do. But nobody could use it from their AI toolchain without writing Go.
I needed to wrap the whole thing as an MCP server — Model Context Protocol, the standard that lets Claude Code (and other agents) call external tools over stdio. One binary, expose the functions, done. Any Claude Code user could then issue credentials, verify signatures, and query audit chains without touching Go.
I did not want to write it by hand.
The builder decision
I had four agents available to me. Here's how they stacked up:
| Agent | Model | Setup Cost | Guardrails | Planning | Notes |
|---|---|---|---|---|---|
| Zoe (OpenClaw) | GPT-4o | High — Windows VM, 222-zettel KB brain | Heavy | Zettelkasten retrieval | My knowledge worker. Not a code sprinter. |
| Daemon | Claude Opus (via Elixir runtime) | Medium — 38 tools, quality gates, Elixir ecosystem | Structured | Loop sessions with fitness scoring | My own agent runtime. Overkill for a single-file task. |
| Codex CLI | GPT-4o / o3 | Low | Moderate | Basic | OpenAI's agent. Decent but not the strongest model for code. |
| free-code | Claude Opus | Zero | None | ULTRAPLAN + VERIFICATION_AGENT | Unguarded Claude Code fork. No permission prompts. No refusals. |
The task was straightforward: read a spec, generate two files, make the tests pass. I didn't need Zoe's knowledge graph. I didn't need Daemon's 38-tool orchestration layer. I needed raw code generation from the strongest available model with zero friction.
Why free-code won
Three reasons.
First, the model. Claude Opus is the strongest code generation model I've tested across my agent fleet. For a task that's pure implementation — read spec, produce code, verify — model quality is the only variable that matters.
Second, zero guardrails. vaos-kernel deals with cryptographic signing, JWT tokens, Ed25519 key generation. Most guarded AI tools get nervous around security-adjacent code. They'll refuse to generate private key handling, or add warnings that slow everything down. free-code doesn't have that problem. It's the unguarded fork — it does what you tell it.
Third, ULTRAPLAN. free-code has a built-in multi-step task decomposition system. Give it a complex spec and it breaks the work into phases, executes sequentially, then runs a VERIFICATION_AGENT pass to check its own output. For a "read spec, implement, test" workflow, that's exactly the loop I wanted.
The setup
free-code's original repo got DMCA'd. I found a surviving fork on GitHub, cloned it, and built locally. It runs on Bun (the JavaScript runtime), so I installed that on my Mac Mini. Built the binary, pointed it at my z.ai proxy for API auth, and that was it. Maybe 10 minutes of setup, most of it waiting for Bun to compile.
The spec was the real work
Here's the part people skip when they talk about AI coding: the spec.
I wrote a CLAUDE.md file and dropped it in the vaos-kernel repo root. It wasn't a vague prompt. It was every function signature in the codebase. Every struct definition. The exact MCP tool names I wanted, the exact JSON schemas for inputs and outputs, the exact error handling patterns.
I spent 30 minutes reading through the existing Go code. I traced how the Dependencies struct wires everything together — how JWTService depends on KeyManager, how AuditLogger chains hashes, how CredentialStore serializes to JSON. I mapped which functions were public, which took context arguments, which returned errors I'd need to surface through MCP.
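The wiring I traced looks roughly like this. The field names follow the real codebase; the stub types and constructor are illustrative, showing only the dependency shape and initialization order:

```go
package main

import "fmt"

// Stub types: the real implementations live in vaos-kernel; only the
// dependency relationships matter for this sketch.
type KeyManager struct{}
type JWTService struct{ Keys *KeyManager }
type AuditLogger struct{}
type CredentialStore struct{}

// Dependencies mirrors the wiring described above: JWTService depends on
// KeyManager, so KeyManager must be constructed first.
type Dependencies struct {
	Keys  *KeyManager
	JWT   *JWTService
	Audit *AuditLogger
	Creds *CredentialStore
}

func newDependencies() *Dependencies {
	keys := &KeyManager{} // first: everything that signs needs key material
	return &Dependencies{
		Keys:  keys,
		JWT:   &JWTService{Keys: keys}, // shares the same KeyManager instance
		Audit: &AuditLogger{},
		Creds: &CredentialStore{},
	}
}

func main() {
	deps := newDependencies()
	fmt.Println(deps.JWT.Keys == deps.Keys) // prints true: one shared key manager
}
```

Getting this construction order into the spec is exactly the kind of detail that keeps an unattended agent from guessing.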
That 30 minutes of human investigation produced a spec that left almost nothing ambiguous. The tool definitions looked like this:
Tool: request_credential
Input: { agent_id: string, action: string, resource: string, description: string (optional) }
Output: { token: string, intent_fingerprint: string, attestation: string, signature: string, expires_in_seconds: 60 }
Implementation: build IntentRequest, hash with Hasher.HashIntent(), issue JWT with Issuer.Issue(agentID, fingerprint), sign with Signer.Sign()
Every tool had that level of detail. The Dependencies struct, the initialization order, the error wrapping pattern, the test structure I expected. All of it explicit.
The execution
free-code -p "Read CLAUDE.md and execute every step" --dangerously-skip-permissions
I typed that and walked away to make coffee.
The --dangerously-skip-permissions flag is what makes free-code useful for this kind of work. No "are you sure you want to write to this file?" prompts. No "this code handles cryptographic keys, are you comfortable proceeding?" interruptions. It reads the spec, plans the work, writes the code, runs the tests, and fixes what breaks. Unattended.
The result
I came back 5 minutes later. Two new files in the repo:
- cmd/mcp/main.go — 12KB, 450 lines. The full MCP server implementation.
- cmd/mcp/main_test.go — 15KB, 525 lines. Comprehensive test suite.
I ran the tests myself to verify:
=== RUN TestServerInitialization
--- PASS: TestServerInitialization (0.00s)
=== RUN TestBootstrapDefaultAgent
--- PASS: TestBootstrapDefaultAgent (0.00s)
=== RUN TestHandleRequestCredential
--- PASS: TestHandleRequestCredential (0.00s)
=== RUN TestHandleVerifyCredential
--- PASS: TestHandleVerifyCredential (0.00s)
=== RUN TestHandleRecordAudit
--- PASS: TestHandleRecordAudit (0.00s)
=== RUN TestHandleVerifyChain
--- PASS: TestHandleVerifyChain (0.00s)
=== RUN TestHandleGetPublicKey
--- PASS: TestHandleGetPublicKey (0.00s)
=== RUN TestHandleRegisterAgent
--- PASS: TestHandleRegisterAgent (0.00s)
=== RUN TestHandleRegisterAgentDefaultType
--- PASS: TestHandleRegisterAgentDefaultType (0.00s)
=== RUN TestMainSignsKeyPersistence
--- PASS: TestMainSignsKeyPersistence (0.00s)
=== RUN TestEnvVarJWTSecret
--- PASS: TestEnvVarJWTSecret (0.00s)
PASS
coverage: 51.6% of statements
ok vaos-kernel/cmd/mcp 0.156s
11 tests. All passing. 51.6% coverage. go vet clean. Compiled binary: 8.7MB.
The server handles the full MCP lifecycle — initialize, tool listing, tool execution, proper JSON-RPC error codes. It exposes 6 tools: credential issuance, credential verification, audit recording, audit chain verification, Ed25519 public key retrieval, and agent registration.
I reviewed the generated code. It followed the Dependencies struct pattern from the existing codebase. It used the same error wrapping conventions. It initialized services in the right order. It wasn't perfect — I'd refactor a few things for production — but it was correct, and it compiled and passed tests on the first run I saw.
What I actually learned
The 5-minute number is real but misleading if you don't include context. Here's the actual time breakdown:
- 30 minutes: Reading the codebase, understanding wiring, writing the spec
- 10 minutes: Setting up free-code (one-time cost)
- 5 minutes: Agent execution
- 10 minutes: Reviewing generated code, running tests independently, verifying behavior
Total wall clock: about 55 minutes. But 30 of those were the spec, 10 were setup I'll never repeat, and 10 were review I'd do regardless.
The lesson is simple and I keep re-learning it: the spec is the work. An AI agent with the strongest model and zero friction is worthless without a precise spec. A precise spec with a mediocre agent still produces something useful. The quality flows from the spec, not the tool.
I spent 30 minutes reading every function signature in my own codebase. That's not overhead. That's the job. The agent just types faster than I do.
Try it yourself
vaos-kernel is open source. The MCP server is live. If you're running Claude Code (or any MCP-compatible client), you can point it at the compiled binary and start issuing credentials, signing data, and querying audit chains.
The spec I wrote is in the repo as CLAUDE.md. If you want to see what "good enough for an AI agent to execute unattended" looks like, that's your reference.
Build the binary, add it to your MCP config, and see what a hash-chained audit log feels like from inside your editor.
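For reference, a typical MCP config entry looks something like this. The server name and binary path are illustrative, and the config file's exact name and location depend on your client:

```json
{
  "mcpServers": {
    "vaos-kernel": {
      "command": "/path/to/vaos-kernel-mcp",
      "args": []
    }
  }
}
```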