The Autonomous Research Lab Nobody Asked Me to Build
April 4, 2026 / 5 min read / Straughter Guthrie

Three machines on a LAN, 92K lines of Elixir, and a set of tools that accidentally became an autonomous research pipeline producing real artifacts.

Tags: research-lab, autonomous, vaos, genesis-mission, IETF

I did not set out to build a research lab. I set out to solve problems. The lab is what happened when the solutions started talking to each other.

There is no university behind this. No team. No grant. No degree. There are three machines on a local network, about 92,000 lines of Elixir, a Windows box with a 3090, and a set of tools that were each built for a specific job but which, taken together, do something none of them were designed to do: they conduct research autonomously.

This post is a technical walkthrough of what that looks like in practice, what it actually produced, and where it falls short.

The Stack

The system runs across three physical machines: a MacBook (local dev), a Mac Mini (daemon runtime), and a Windows workstation called Draco with an NVIDIA 3090 GPU. Nothing runs in the cloud. Everything talks over LAN.

Investigate — Adversarial Epistemic Engine

The first tool that mattered was Investigate. It is a dual-prompt evidence evaluation system that runs on the Mac Mini. You give it a research question and it spawns two adversarial threads: one argues FOR the hypothesis, one argues AGAINST. Both are forced to cite sources.

The key design choice is the evidence hierarchy. Systematic reviews and meta-analyses carry 3x weight. RCTs carry 2x. Observational studies get 1.5x. Expert opinion gets 1x. Evidence lands in one of two stores: the grounded store (empirically supported, multiple independent sources) or the belief store (plausible but weakly supported). These two stores do not mix during synthesis.
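The weighting and two-store routing can be sketched in a few lines. This is a minimal illustration of the scheme described above, not Investigate's actual code; the field names and the two-independent-sources threshold for the grounded store are assumptions.

```python
# Evidence weights from the hierarchy described in the post.
EVIDENCE_WEIGHTS = {
    "systematic_review": 3.0,
    "rct": 2.0,
    "observational": 1.5,
    "expert_opinion": 1.0,
}

def classify(evidence):
    """Route each item into the grounded or belief store.

    Assumption: "multiple independent sources" means two or more.
    The stores stay separate, mirroring the no-mixing rule at synthesis.
    """
    grounded, belief = [], []
    for item in evidence:
        score = EVIDENCE_WEIGHTS[item["type"]] * item["independent_sources"]
        target = grounded if item["independent_sources"] >= 2 else belief
        target.append({**item, "score": score})
    return grounded, belief
```

The point of the separation is that a pile of weakly supported claims can never sum its way into the grounded store: routing happens per item, before any aggregation.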

A single investigation takes 15 to 20 minutes to run. That is not a limitation — that is the time it takes to actually read, weigh, and cross-reference sources rather than summarizing abstracts.

paper2code — From PDF to Working Implementation

paper2code takes any research paper PDF and produces a citation-anchored Python implementation. Every function, every constant, every design choice traces back to a specific section number in the source paper.

The part that matters is the ambiguity audit. After generating the implementation, the system walks through every gap between what the paper specifies and what the code assumes.

Six papers in, six working implementations out. Not six summaries. Six codebases where you can grep for # Section 3.2 and find the paper passage that justifies that line.
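What a citation-anchored file looks like in practice, sketched with an invented constant and function; the section numbers and values here are illustrative, not taken from any of the six papers.

```python
# Hypothetical paper2code output: every constant and design choice
# carries a comment naming the paper section that justifies it.
DECAY_RATE = 0.97  # Section 3.2: "we anneal the rate by a factor of 0.97 per epoch"

def anneal(rate, epoch):
    # Section 3.2, Eq. 4: exponential annealing schedule
    return rate * DECAY_RATE ** epoch
```

Grepping the codebase for "# Section" surfaces every such anchor, which is what makes the ambiguity audit mechanical rather than a matter of memory.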

The Daemon — 92K Lines of Runtime

The daemon is an Elixir application running on the Mac Mini — 92,000 lines of code, 3,210 tests. It handles tool orchestration, sandbox execution, and the quality layer that sits between every agent action and its output.

The quality layer implements ALCOA+ audit trails from FDA 21 CFR Part 11. Every agent decision gets logged to a decision journal. Every tool execution gets a receipt. Every knowledge query gets a ledger entry.
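The shape of such a journal can be sketched as an append-only, hash-chained log. The real daemon is Elixir and its schema is not public; the field names and chaining scheme below are assumptions about one plausible design.

```python
import hashlib
import json
import time

class DecisionJournal:
    """Append-only decision journal with tamper-evident receipts (sketch)."""

    def __init__(self):
        self.entries = []
        self._prev_hash = "0" * 64  # genesis value for the hash chain

    def record(self, actor, action, rationale):
        entry = {
            "actor": actor,
            "action": action,
            "rationale": rationale,
            "ts": time.time(),
            "prev": self._prev_hash,  # chaining makes after-the-fact edits detectable
        }
        receipt = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()
        ).hexdigest()
        entry["receipt"] = receipt
        self.entries.append(entry)
        self._prev_hash = receipt
        return receipt
```

Each receipt commits to the previous one, so rewriting an old entry invalidates every receipt after it, which is the property an auditable trail needs.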

This sounds like overkill for a personal research stack. It is not. When you are submitting public comments to NIST and corresponding with IETF draft authors, having an auditable trail of how you arrived at your conclusions is the difference between being taken seriously and being dismissed.

Zoe — Multi-Agent Builder on Draco

Zoe lives on the Windows box and orchestrates builds. It dispatches work to whichever model is appropriate — Codex for code generation, Claude for reasoning, Gemini for long-context synthesis — through a pipeline with explicit phase gates: idea, researched, planned, implementing, in PR, done.
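The phase gates can be expressed as a simple linear state machine. The gate order comes straight from the post; the no-skipping rule is an assumption about how Zoe might enforce transitions.

```python
# Phase gates in the order listed in the post.
PHASES = ["idea", "researched", "planned", "implementing", "in_pr", "done"]

def advance(current):
    """Move a build to its next phase; phases may not be skipped."""
    i = PHASES.index(current)  # raises ValueError on an unknown phase
    if i == len(PHASES) - 1:
        raise ValueError("build is already done")
    return PHASES[i + 1]
```

Explicit gates mean a model can be swapped mid-build without losing track of where the work stands: the phase, not the model, owns the state.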

Two meta-agents sit above the build pipeline. Learn watches completed builds for patterns. Geneticist runs mutation-style experiments on agent configurations. A third agent, Dreamer, generates research directions autonomously based on gaps in the knowledge base.

The Knowledge Layer

Letta, running in Docker on the 3090, acts as a subconscious agent that watches sessions and accumulates cross-session context. The Zettelkasten on Draco is a structured knowledge base of 281+ interconnected notes with semantic search via embeddings.

The Zettelkasten is where investigations land after they complete. Notes link to related notes. Over time, the graph reveals clusters and gaps. The gaps feed back into Dreamer.
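The retrieval step of semantic search over the notes is just cosine similarity over precomputed embeddings. The post does not specify the embedding model or storage, so this sketch shows only the ranking.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, notes, k=3):
    """Return the ids of the k notes most similar to the query embedding."""
    ranked = sorted(notes, key=lambda n: cosine(query_vec, n["vec"]), reverse=True)
    return [n["id"] for n in ranked[:k]]
```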

Denario — Publication Pipeline

Denario takes code and findings and produces academic papers — proper LaTeX with citations, methodology sections, and reproducible results.

What This Actually Produced

"Autonomous research lab" means nothing without artifacts.

vaos-kernel is a Go implementation of cryptographic agent identity based on Goswami's IETF draft-goswami-agentic-jwt-00. It benchmarks at 50,774 requests per second on commodity hardware with 0.5% overhead. It is the only independent implementation of Goswami's draft specification. Goswami confirmed this and has expressed interest in collaboration. The source code is public.

A public comment was submitted to NIST NCCoE on March 21, 2026, referencing the vaos-kernel implementation.

Six research papers were produced through the Denario pipeline. All include working code.

None of this was planned as a coherent research program. vaos-kernel was built in a single day — March 17, 2026, 9 AM to 8 PM. The NIST connection was discovered three days later at 4 AM while looking for something else entirely.

Honest Comparison: AutoResearchClaw

AutoResearchClaw, built by Prof. Huaxiu Yao's group at UNC and Stanford, is a 23-stage automated research pipeline with 10,000+ GitHub stars. It is better packaged, better documented, and more accessible than anything I have built.

AutoResearchClaw produces papers. This stack produces papers and working implementations and benchmarks and standards submissions and audit trails.

The honest assessment: AutoResearchClaw is a better product. This is a better research lab.

The Recursive Loop

This system improves itself through its own epistemic gaps. When Investigate finds a question it cannot resolve, that gap becomes a research direction. Dreamer picks it up. paper2code finds relevant literature. The daemon orchestrates a build. The output feeds back into the Zettelkasten, which surfaces new gaps.
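Reduced to stubs, the loop has this shape. The function names mirror the tools in the post, but their real interfaces are unknown; this shows structure only.

```python
def research_cycle(knowledge_base, investigate, dreamer, build):
    """One pass of the gap-driven loop (schematic).

    investigate: finds unresolved questions in the knowledge base
    dreamer:     turns a gap into a research direction
    build:       daemon-orchestrated build producing an artifact
    """
    gaps = investigate(knowledge_base)
    directions = [dreamer(gap) for gap in gaps]
    for direction in directions:
        artifact = build(direction)
        knowledge_base.append(artifact)  # feeds back into the Zettelkasten
    return knowledge_base
```

Each pass grows the knowledge base, and the next call to investigate runs over the enlarged graph, which is where the new gaps come from.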

This is structurally similar to what @ryunuck describes as RLEI — Reinforcement Learning from Epistemic Incompleteness. In RLEI, a model's own uncertainty drives the training signal. Here, the system's evidence gaps drive the research agenda. Same principle, different substrate: architecture instead of weights.

Genesis Mission

Trump's November 2025 Executive Order on the Genesis Mission calls for AI-driven autonomous laboratories: adversarial hypothesis testing, automated research-to-implementation pipelines, auditable research workflows, and self-improving systems.

I did not build this stack to satisfy an executive order. But it does.

The gap between policy aspiration and working implementation is usually measured in years and millions of dollars. Sometimes it is measured in three machines on a LAN and a person who needed to solve problems.

What Is Missing

The system is not packaged. You cannot pip install it. Reproducing it requires an Elixir runtime, a Windows box, a GPU, and patience.

Orchestration between machines is partially manual. The evidence hierarchy weights are hand-tuned and unvalidated for CS literature. The Zettelkasten has 281 notes — useful but not comprehensive.

The Point

The point is not that this is better than what funded labs build. It is that this exists at all. An autonomous research pipeline that produces real artifacts — benchmarked code, standards submissions, IETF correspondence, auditable decision trails — running on consumer hardware in a home office.

The tools were not designed as a system. They were designed as solutions. The system emerged because solutions that work tend to find each other.

Goswami wrote the recipe. I built the kitchen.

The VAOS daemon, vaos-kernel, and supporting tools are open source at vaos.sh and github.com/jmanhype/vaos-kernel.