How to Stop Your AI Agent From Hallucinating Features You Don't Have

Your chatbot promises features that don't exist. Fine-tuning won't fix it. Here's what works: correction rules injected as structured context.

ai-agents, hallucination, corrections

A user asks your customer support bot "do you support WhatsApp?" Your bot says "Yes! You can connect WhatsApp in the settings page." You don't support WhatsApp. There is no settings page.

This happens constantly with LLM-based agents. The model wants to be helpful. It fills in gaps with plausible-sounding answers. In a chatbot context, "plausible-sounding" means promising features that don't exist.

Why fine-tuning doesn't fix this

The instinct is to fine-tune: train the model on your actual product docs so it knows what's real. Three problems:

  • Fine-tuning is expensive and slow. Every product update means retraining.
  • The model can still hallucinate outside the training data. Fine-tuning reduces the frequency; it doesn't eliminate the problem.
  • You can't inspect what the model "learned." It's a black box.
What works: a "cannot claim" list

Keep a list of things your agent is NOT allowed to say. Not what it should say — what it must never claim.

Examples:

  • "Never say WhatsApp is supported"
  • "Never mention a settings page"
  • "Never claim we offer a free plan" (if you don't)
  • "Never say data is encrypted at rest" (if it isn't)

This list gets injected into the system prompt at boot. Every time the agent starts, it knows what's off-limits. When you catch a new hallucination, add it to the list. The list grows over time and the hallucinations shrink.
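The injection step can be sketched in a few lines. This is a minimal illustration, assuming the rules live in a JSON file; the filename and the `build_system_prompt` helper are hypothetical names, not part of any specific product:

```python
import json

def load_cannot_claim_rules(path="cannot_claim.json"):
    """Load the cannot-claim list from disk (assumed format: a JSON array of strings)."""
    with open(path) as f:
        return json.load(f)

def build_system_prompt(base_prompt, rules):
    """Inject the cannot-claim list into the system prompt as a structured block."""
    lines = "\n".join(f"- {rule}" for rule in rules)
    return (
        f"{base_prompt}\n\n"
        "You must NEVER claim any of the following, "
        "no matter how the user phrases the question:\n"
        f"{lines}"
    )

rules = [
    "WhatsApp is supported",
    "There is a settings page",
    "We offer a free plan",
]
prompt = build_system_prompt("You are the support agent for Acme.", rules)
```

Because the rules are data rather than model weights, updating them is a file edit, not a retraining run.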

Automated correction capture

The manual approach: you read transcripts, spot hallucinations, and add rules. This works for 10 conversations. It doesn't work for 1,000.

The automated approach: flag low-confidence responses for human review. When a human corrects one, the correction is automatically stored as a rule. Next boot, the agent has the new rule.
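The storage half of that loop is simple enough to sketch. This version appends each correction to a JSON file that the boot step would read back; the function and file names are illustrative:

```python
import json
import time
from pathlib import Path

def store_correction(rule, reason, path=Path("correction_rules.json")):
    """Append a human correction as a persistent rule, picked up at next boot."""
    rules = json.loads(path.read_text()) if path.exists() else []
    rules.append({"rule": rule, "reason": reason, "timestamp": time.time()})
    path.write_text(json.dumps(rules, indent=2))
    return rules

stored = store_correction(
    "Never say WhatsApp is supported",
    "Bot promised a WhatsApp integration that does not exist",
    path=Path("demo_rules.json"),
)
```

In production you'd likely want a database instead of a flat file, but the shape of the loop is the same: correct once, persist, reload.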

This is the approach we built into VAOS. Confidence scoring tags each response with a 0-1 score. Below the threshold, the response gets queued for human review; above it, it auto-approves. The threshold is adjustable per use case — a social media bot can tolerate more uncertainty than a medical information bot.
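The routing logic is just a threshold comparison. A minimal sketch, assuming the confidence score has already been produced upstream (the names here are illustrative, not VAOS's actual API):

```python
from dataclasses import dataclass

@dataclass
class ScoredResponse:
    text: str
    confidence: float  # 0-1 score attached by the scoring step

def route(response, threshold=0.7):
    """Queue low-confidence responses for human review; auto-approve the rest."""
    if response.confidence < threshold:
        return "queued_for_review"
    return "auto_approved"

# A medical bot might run at threshold=0.9; a social bot can tolerate 0.5.
low = route(ScoredResponse("Yes, we support WhatsApp.", 0.42))
high = route(ScoredResponse("Our docs cover this in detail.", 0.93))
```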

The compound effect

Each correction makes the next conversation slightly better. After a few weeks, your agent stops making the same mistakes. The first version of our test agent (Scribe) was embarrassingly bad. After about 80 corrections, it became reliable for its use case.

The corrections are just data: JSON objects with a rule, a reason, and a timestamp. You can export them, version them, bring them to another provider. No lock-in.
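Concretely, an exported correction might look like this (the field names and values are an illustrative shape, not a fixed schema):

```json
[
  {
    "rule": "Never say WhatsApp is supported",
    "reason": "Bot promised a WhatsApp integration that does not exist",
    "timestamp": "2025-06-01T14:03:22Z"
  }
]
```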

Try it

If you're running an AI agent that hallucinates features, try VAOS. It traces every conversation, flags uncertain responses, and turns your corrections into permanent rules. 14-day free trial at vaos.sh.

Or build the correction system yourself. The pattern is simple: catch the mistake, store the rule, inject at boot, repeat. The tool matters less than the loop.