Signal Classification: Why Your Agent Framework Routes Wrong

Most agent frameworks use flat routing. Signal classification matches input complexity to reasoning depth — saving cost and improving quality.

signal-theory · routing · agent-architecture · vaos


Most agent frameworks route requests the same way: throw the input at an LLM, ask it to pick a tool, execute. This works until it doesn't. The failure mode is predictable — ambiguous inputs get misrouted, simple inputs burn expensive inference, and complex inputs get shallow treatment. The root cause is that these frameworks skip classification entirely. They route without understanding what they're routing.

This post describes the signal classification approach we use in VAOS, why it exists, and where it falls short.

The problem with flat routing

LangChain's router chains, AutoGen's conversation patterns, CrewAI's task delegation — they all share a structural flaw. Routing decisions happen at the same layer as execution. A user says "hello" and the system spins up a full LLM call to decide that "hello" is a greeting. A user submits a 2000-word research brief and the system makes the same single LLM call to figure out what to do with it.

Flat routing treats all inputs as equally complex. They are not.

Keyword matching is the other common approach. It's fast but brittle. "Summarize this document" triggers the summarization tool. "Can you summarize why this architecture fails?" triggers the same tool when it should trigger multi-step analysis. The string "summarize" is not the signal. The intent behind it is.

Signals, not strings

Claude Shannon's information theory gives us a better model. Every communication has a signal and noise. The signal carries the actual information; the noise is everything that obscures it. Agent inputs work the same way. A user request carries intent buried in natural language ambiguity, context assumptions, and channel-specific conventions.

In VAOS, every input gets decomposed into a 5-tuple before any routing decision happens:

{content, priority, channel, intent, context}

This decomposition happens before the system decides how to respond. Classification is not routing. Classification informs routing.
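As a sketch, the 5-tuple maps naturally onto an Elixir struct. The field names come from the post; the default values, atoms, and the `decompose/1` helper are illustrative assumptions, not the VAOS implementation:

```elixir
# Hypothetical sketch of the 5-tuple as a struct; defaults are assumptions.
defmodule Signal do
  defstruct content: nil,
            priority: :normal,   # e.g. :low | :normal | :high
            channel: :chat,      # e.g. :chat | :webhook | :api
            intent: :unknown,    # filled in by classification, not by the caller
            context: %{}         # caller-supplied metadata

  # Decompose a raw input into the tuple before any routing decision.
  def decompose(raw, channel \\ :chat) do
    %Signal{content: raw, channel: channel, context: %{received_at: DateTime.utc_now()}}
  end
end
```

The point of the shape is that `intent` starts out `:unknown` — routing never sees a signal whose intent has not been explicitly resolved.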

3-tier routing table

Once classified, signals get matched to one of three processing tiers. The tiers exist because the cost of reasoning should be proportional to the complexity of the input.

| Tier | Name | Latency | LLM Calls | Cost per Request | Examples |
|------|------|---------|-----------|------------------|----------|
| 1 | Reflexive | < 100ms | 0 | $0.00 | Greetings, status checks, help commands, known-answer lookups |
| 2 | Analytical | 1-5s | 1 | $0.01-0.08 | Code review, data queries, summarization, single-tool tasks |
| 3 | Deliberative | 30s-20min | 3-15+ | $0.50-2.00 | Research investigations, multi-source analysis, complex planning |
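Tier selection itself is then trivial once intent is known. A minimal sketch, assuming classification has already produced an intent atom (the specific atoms here are hypothetical, not the production rule set):

```elixir
# Illustrative mapping from classified intent to processing tier.
# The intent atoms are assumptions; only the three tiers come from the post.
defmodule TierRouter do
  @reflexive [:greeting, :status_check, :help, :known_lookup]
  @deliberative [:research, :multi_source_analysis, :planning]

  def tier(intent) when intent in @reflexive, do: 1
  def tier(intent) when intent in @deliberative, do: 3
  def tier(_other), do: 2   # single-LLM-call default
end
```

Note that Tier 2 is the catch-all: anything not positively identified as reflexive or deliberative gets the single-inference treatment.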

Tier 1 (Reflexive) handles signals that have known responses. These are pattern-matched in compiled bytecode using goldrush — an Erlang event processing library that compiles match patterns to BEAM bytecode. No LLM call. No network round-trip. A greeting gets a greeting back in under 100ms. Status checks hit a local ETS table and return. These requests represent roughly 40% of traffic in production and cost nothing to serve.
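This is not the goldrush API — as a stand-in, plain Elixir function heads show the same idea, since multi-clause heads also compile to BEAM pattern-matching instructions rather than runtime string interpretation:

```elixir
# Stand-in for compiled reflexive dispatch. Function-head matching is
# resolved in BEAM bytecode; no LLM call, no network round-trip.
defmodule Reflexive do
  def respond("hello" <> _), do: {:ok, "Hello! How can I help?"}
  def respond("status" <> _), do: {:ok, lookup_status()}
  def respond(_other), do: :escalate   # fall through to Tier 2 classification

  # Stands in for the local ETS read described in the text.
  defp lookup_status, do: "all systems nominal"
end
```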

Tier 2 (Analytical) is the single-inference tier. One LLM call with tool selection — the same pattern most frameworks use for everything. The difference is that by the time a signal reaches Tier 2, the system already knows the intent, has selected the appropriate model, and has pre-loaded relevant context. The LLM does not waste tokens figuring out what the user wants. It executes.

Tier 3 (Deliberative) is where the system earns its keep. Multi-step reasoning with epistemic state tracking — the system maintains explicit records of what it knows, what it has inferred, and what confidence level applies to each conclusion. A research investigation might involve 8 tool calls across 3 different agents, with intermediate results validated before proceeding. This is expensive and slow. It should be, because the problems it handles are genuinely hard.
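A sketch of what an epistemic record might look like — the field names, basis atoms, and confidence threshold are assumptions, not the VAOS data model:

```elixir
# Hypothetical epistemic record for Tier 3 state tracking.
defmodule Epistemic do
  defstruct claim: nil,
            basis: :observed,   # :observed | :inferred | :assumed
            confidence: 0.0,    # 0.0..1.0
            sources: []

  # Gate between steps: proceed only when every supporting claim
  # clears the confidence threshold.
  def validated?(records, threshold \\ 0.8) do
    Enum.all?(records, fn r -> r.confidence >= threshold end)
  end
end
```

The validation gate is what makes Tier 3 slow on purpose: a low-confidence inference blocks the next step until more evidence is gathered.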

Why classification before routing matters

Three reasons: cost, latency, and quality.

Cost. If 40% of your traffic is Tier 1 and you route everything through an LLM, you are spending $0.01-0.08 per request on inputs that need $0.00. At 100,000 requests per day, that is 40,000 needless LLM calls — $400-3,200 per day in wasted inference. Classification costs approximately $0 for pattern-matched signals and a fraction of a cent for the 5-tuple decomposition on ambiguous inputs.
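Recomputing that waste in integer cents keeps the arithmetic exact (40% of 100,000 daily requests, at $0.01-0.08 per needless LLM call):

```elixir
# Back-of-envelope waste from flat-routing Tier 1 traffic through an LLM.
requests_per_day = 100_000
tier1_requests = div(requests_per_day * 40, 100)   # 40% of traffic
{lo_cents, hi_cents} = {1, 8}                      # $0.01-0.08 per LLM call

waste_lo = div(tier1_requests * lo_cents, 100)     # dollars per day
waste_hi = div(tier1_requests * hi_cents, 100)
IO.puts("wasted: $#{waste_lo}-#{waste_hi} per day")
# prints "wasted: $400-3200 per day"
```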

Latency. A reflexive response in 80ms versus a 3-second round-trip to an LLM API is the difference between an interface that feels instant and one that feels sluggish. Users notice. More importantly, downstream systems notice — when an agent is part of a pipeline, every millisecond at the routing layer compounds.

Quality. This is the less obvious benefit. Matching signal complexity to reasoning depth prevents two failure modes. Over-thinking: a Tier 3 deliberative process applied to "what's the current status?" produces a rambling, over-qualified answer when the user wants a number. Under-thinking: a single LLM call applied to "investigate why our deployment pipeline fails intermittently on ARM nodes" produces a shallow, often incorrect response that misses the systemic issue.

Implementation details

Classifications are cached in ETS with a SHA256 key derived from the 5-tuple and a 10-minute TTL. Repeated or similar signals skip reclassification entirely. The cache hit rate in production sits around 35%, which shaves meaningful latency off bursty traffic patterns.
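One plausible implementation of that key derivation — an assumption about what `Signal.hash/1` does, not the production code. `:erlang.term_to_binary/1` produces identical binaries for equal small terms, which makes the hash deterministic for cache lookups:

```elixir
# Hypothetical SHA256 cache key over the whole 5-tuple.
defmodule SignalHash do
  def hash({_content, _priority, _channel, _intent, _context} = tuple) do
    bin = :erlang.term_to_binary(tuple)
    :crypto.hash(:sha256, bin) |> Base.encode16(case: :lower)
  end
end
```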

Event routing uses goldrush, which compiles pattern-match specifications to BEAM bytecode at module load time. This means Tier 1 routing is a function call, not an interpretation step. Pattern updates require recompilation, which takes ~2ms and happens without dropping in-flight requests — standard OTP hot-code-loading behavior.

The classification function itself runs as a GenServer with a supervision tree. If classification fails or times out (hard cap at 200ms), the signal defaults to Tier 2. This is a deliberate choice — Tier 2 is the safest fallback because it involves a single LLM call that can self-correct.

defmodule VAOS.Signal.Classifier do
  @ttl_ms :timer.minutes(10)

  # Signal.hash/1 and do_classify/1 are defined elsewhere in the real module.
  def classify(%Signal{} = signal) do
    cache_key = Signal.hash(signal)
    # Read the clock once up front: an arbitrary function call is not
    # allowed inside a guard, but a bound variable is.
    now = System.monotonic_time(:millisecond)

    case :ets.lookup(:signal_cache, cache_key) do
      # Cache hit that is still inside the 10-minute TTL.
      [{^cache_key, classification, ts}] when ts + @ttl_ms > now ->
        {:ok, classification}

      # Miss or stale entry: reclassify and refresh the cache.
      _ ->
        classification = do_classify(signal)
        :ets.insert(:signal_cache, {cache_key, classification, now})
        {:ok, classification}
    end
  end
end
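The 200ms hard cap with the Tier 2 fallback can be sketched with `Task.yield/2`. The function name and return shapes here are illustrative; only the timeout and the fall-to-Tier-2 behavior come from the text:

```elixir
# Hypothetical timeout wrapper: classify under a hard cap, default to Tier 2.
defmodule Fallback do
  @timeout_ms 200

  def classify_with_fallback(signal, classify_fun) do
    task = Task.async(fn -> classify_fun.(signal) end)

    # yield/2 returns nil on timeout; shutdown/1 reaps a late result if any.
    case Task.yield(task, @timeout_ms) || Task.shutdown(task) do
      {:ok, classification} -> classification
      _timeout_or_crash -> %{tier: 2, reason: :fallback}
    end
  end
end
```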

Limitations

Classification accuracy is provider-dependent. Using GLM-4.7 via Zhipu for ambiguous-input classification, we see approximately 90% accuracy. The remaining 10% are edge cases where intent is genuinely ambiguous — "tell me about the system" could be informational (Tier 2) or investigative (Tier 3) depending on context the user did not provide. We default down (Tier 2) rather than up, because the cost of over-classifying is higher than the cost of under-classifying.

There is no automated learning from misclassifications. When the system routes wrong, a human adds a rule override. This is manual and does not scale. Building a feedback loop that adjusts classification weights from user corrections is on the roadmap but not shipped.

Channel detection is entirely rule-based. A webhook is always treated as a webhook. There is no adaptation for webhook sources that behave more like chat (Slack webhooks) versus those that behave like API calls (GitHub webhooks). This works today because the channel set is small. It will not work at 20 channels.

The classification layer itself adds approximately 50ms of overhead to every request. At 100 requests per second, this is acceptable — 50ms against a Tier 2 median of 2.5 seconds is noise. At 1 request per second (a typical chatbot), it is invisible. At 10,000 requests per second, you would need to shard the ETS table and run classification across multiple nodes. We have not hit that scale.

How this compares

LangChain router chains select a destination chain based on input. There is no signal decomposition, no tiering, no cost awareness. Every routed request hits an LLM. This is flat routing with extra abstractions.

AutoGen conversation patterns route implicitly through agent roles and conversation flow. Routing logic is embedded in the conversation structure rather than explicit. This works well for predefined multi-agent workflows but poorly for open-ended inputs where the right workflow is not known in advance.

CrewAI task delegation assigns tasks to agents based on role descriptions. Routing is role-based, not signal-based. A "researcher" agent gets research tasks. This assumes the task type is known before routing, which is exactly the problem classification solves.

Signal classification is not a universal improvement over these approaches. It adds complexity. If your system handles a narrow set of well-defined tasks, flat routing is simpler and sufficient. The 5-tuple decomposition pays off when inputs are varied, ambiguous, and arrive at volumes where cost and latency differences between tiers compound into real numbers.


VAOS is built on Elixir/OTP. The signal classification system described here runs in production. The code samples are simplified from the actual implementation.