In February 2024, a Canadian tribunal ordered Air Canada to honor a refund policy that did not exist. The airline’s customer service chatbot had told a customer that he could book a full-price ticket and claim a bereavement refund afterward, complete with a plausible-sounding claims process, and the customer had relied on it. The airline’s actual policy did not allow retroactive claims. The tribunal ruled that Air Canada was liable for its chatbot’s fabrication, even though no human at the airline had ever approved or even seen the policy.
In 2023, a New York lawyer submitted a legal brief containing six fabricated case citations — all generated by ChatGPT, all completely nonexistent. The cases had real-sounding names, plausible docket numbers, and fictional holdings that supported his argument perfectly. He was sanctioned by the court.
These are not bugs. They are the predictable consequence of how large language models work. Understanding why hallucinations happen — at a technical level, not just a hand-wave — is essential for anyone who uses AI tools for anything that matters.
The explanation starts with a fact that most coverage gets wrong. Language models do not store facts in a database and retrieve them. They predict the next token — the next word, or piece of a word — based on probability distributions learned during training.
When you ask a model “What is the capital of Australia?” it does not look up “Australia → capital → Canberra” in a table. It computes a probability distribution over all possible next tokens. “Canberra” has the highest probability because it appeared most often in the correct context during training. But “Sydney” has a non-trivial probability too, either because some web pages wrongly name Sydney as the capital or because “Sydney” simply appears near “Australia” far more often than “Canberra” does.
For well-attested facts, the correct answer usually wins the probability race. For less common facts, niche topics, or questions that require combining multiple pieces of information, the probabilities become muddier — and the model may generate a confident-sounding response that is statistically likely but factually wrong.
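The probability race described above can be made concrete with a toy example. This is a minimal sketch, not how any particular model is implemented: the four candidate tokens and their scores are invented for illustration, and a real model scores tens of thousands of tokens at every step.

```python
import math

def softmax(logits):
    """Convert raw scores (logits) into a probability distribution."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}

# Hypothetical scores for the token after "The capital of Australia is".
# These numbers are made up; they are not taken from any real model.
logits = {"Canberra": 5.0, "Sydney": 3.5, "Melbourne": 2.0, "Vienna": -1.0}

probs = softmax(logits)
for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{token:10s} {p:.3f}")

# Greedy decoding picks the single most probable token.
best = max(probs, key=probs.get)
print("chosen:", best)
```

With these made-up scores the correct answer wins, but “Sydney” still holds a bit under a fifth of the probability mass. If the model samples from the distribution instead of always taking the top token, the wrong answer will sometimes come out.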
1. Knowledge gaps. The model was never trained on the relevant information, or the information appeared too rarely in training data to form a strong pattern. Rather than outputting “I don’t know” — which requires the model to have accurate self-knowledge about its own training data — it generates the most plausible continuation. This is how you get fabricated citations: the model knows what a legal citation looks like (format, structure, style) but does not have the specific case in its training data, so it generates one that fits the pattern.
2. Conflicting training data. The internet is full of contradictions. The same question may have different answers across hundreds of web pages. The model learns a blended distribution that may not match any single authoritative source. This is especially problematic for topics where popular belief differs from expert consensus.
3. Sycophancy pressure. Models are trained via reinforcement learning from human feedback (RLHF) to be helpful. Human raters consistently prefer confident, complete answers over hedged or uncertain ones. This training signal creates a systematic pressure to produce an answer — any answer — rather than expressing uncertainty. The model that says “I’m not sure” gets downvoted. The model that invents a plausible answer gets upvoted. The incentives are misaligned.
4. Compositionality failures. Many hallucinations occur when the model needs to combine two or more individually correct facts into a novel conclusion. Each fact may be accurate in isolation, but the combination is wrong. “Person A won the Nobel Prize in 2019” (true) + “Person B won it in 2020” (true) → “Person A and B shared the prize” (false). The model is interpolating in a space where interpolation does not preserve truth.
5. Temporal confusion. Models have a training data cutoff and no reliable mechanism for distinguishing “I was trained on this information” from “this is current.” A model trained on data through early 2025 may state that a company’s CEO is someone who was replaced six months ago — not because it is guessing, but because that was the correct answer when it last saw data about it.
Not all hallucinations are equally dangerous, and they require different detection strategies.
| Type | What It Looks Like | Real Example | Detection Method |
|---|---|---|---|
| Factual error | A specific, verifiable claim that is wrong | "The Great Wall of China is visible from space" (it is not, per NASA astronauts) | Cross-reference against authoritative source |
| Fabrication | Inventing something that does not exist at all | Lawyer's brief with 6 nonexistent case citations (Mata v. Avianca, 2023) | Verify existence of cited source (check the database, follow the URL) |
| Outdated information | Stating something that was true during training but has since changed | "Twitter's CEO is Parag Agrawal" (replaced by Elon Musk, then Linda Yaccarino) | Check date-sensitivity of the claim; use web search for current info |
| Logical error | Reasoning that sounds coherent but contains a logical flaw | "If A > B and B > C, then C > A" — presented in flowing prose that obscures the reversal | Trace the reasoning step by step; check if the conclusion follows from premises |
| Attribution error | Assigning a real quote, idea, or achievement to the wrong person | Attributing "The definition of insanity is doing the same thing..." to Einstein (no evidence he said it) | Verify attribution against primary sources, not other secondary sources |
| Confabulated detail | Adding plausible but invented specifics to a mostly-correct response | Correct description of a historical event with an invented date or casualty figure | Be suspicious of precise numbers in contexts where the model has no reason to know them |
The most dangerous hallucinations are fabrications and confabulated details — because they are the hardest to detect. A fabricated citation looks exactly like a real one. An invented statistic embedded in an otherwise accurate paragraph does not trigger alarm bells unless you specifically check it.
Benchmarking hallucination rates is methodologically difficult — it depends on the domain, the question type, and how you define “hallucination.” But several large-scale evaluations provide useful data points:
The trend is one of clear improvement: hallucination rates are dropping with each model generation. But “improving” and “solved” are different things. A 5% hallucination rate sounds low until you consider that users may ask hundreds of factual questions per week. At that rate, 200 questions a week means roughly ten wrong answers, and nothing marks which ten.
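The arithmetic behind that point is worth spelling out. The sketch below uses the 5% figure from the text and assumes, as a simplification, that each answer is wrong independently of the others. Real errors cluster by topic, so treat the output as a rough guide.

```python
def expected_errors(rate, questions):
    """Average number of wrong answers at a given per-answer error rate."""
    return rate * questions

def prob_at_least_one_error(rate, questions):
    """Chance that at least one answer is wrong, assuming independent errors."""
    return 1 - (1 - rate) ** questions

rate = 0.05  # the 5% rate used as an example in the text
for n in (10, 50, 200):
    print(
        f"{n:4d} questions: "
        f"about {expected_errors(rate, n):.1f} wrong on average, "
        f"{prob_at_least_one_error(rate, n):.0%} chance of at least one"
    )
```

Even a short session of ten questions has a substantial chance of containing one wrong answer under these assumptions.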
The industry’s approach to hallucinations is not a single solution but a stack of complementary techniques, each addressing a different part of the problem.
The first layer is retrieval-augmented generation (RAG). Instead of asking the model to recall facts from memory, retrieve relevant documents at query time and provide them in the prompt. The model’s job shifts from “remember the answer” to “find the answer in this text”, a much easier task that dramatically reduces factual hallucinations.
RAG is the single most effective hallucination mitigation technique in production systems today. Perplexity’s entire product is essentially a RAG system: it searches the web, retrieves relevant pages, and asks the model to synthesize an answer from those pages. Microsoft’s Copilot does the same with enterprise data.
Limitation: RAG only works when the relevant information exists in the retrieval corpus. If the document set does not contain the answer, the model may still hallucinate — sometimes incorporating irrelevant retrieved text in misleading ways.
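The retrieve-then-prompt pattern can be sketched in a few lines. Everything here is an assumption made for illustration: the three-sentence corpus is invented, the function names are hypothetical, and word overlap stands in for the embedding search a production retriever would use. The model call itself is left out; the sketch stops at the prompt that would be sent.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, k=2):
    """Rank documents by word overlap with the query (a crude stand-in retriever)."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(doc)), doc) for doc in documents]
    scored = [pair for pair in scored if pair[0] > 0]  # drop zero-overlap documents
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]

def build_prompt(query, passages):
    """Build a grounded prompt, or return None when nothing was retrieved."""
    if not passages:
        return None  # the caller should decline to answer instead of guessing
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the numbered passages below. "
        "If they do not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {query}"
    )

# Invented mini-corpus for illustration.
corpus = [
    "Canberra is the capital city of Australia.",
    "Sydney is the most populous city in Australia.",
    "The Great Barrier Reef lies off the coast of Queensland.",
]

question = "What is the capital of Australia?"
passages = retrieve(question, corpus)
prompt = build_prompt(question, passages)
print(prompt)
```

Note that the second retrieved passage is about Sydney. It shares common words with the question but does not answer it, which is a small instance of the limitation above: retrieval can hand the model text that is related but misleading.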
Model providers are training models to express uncertainty rather than confabulate. Anthropic’s constitutional AI approach includes principles like “If you’re not sure, say so.” OpenAI has introduced training objectives that reward models for refusing to answer when they lack sufficient knowledge.
The results are measurable: Claude 3.5 Sonnet’s refusal rate on questions outside its knowledge is roughly 3x higher than Claude 2’s was — and its accuracy on questions it does answer has increased correspondingly.
Limitation: There is a fundamental tension between helpfulness and accuracy. A model that refuses too often is useless. A model that refuses too rarely hallucinates. Calibrating this tradeoff is an ongoing challenge.
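The tradeoff can be seen in a toy calculation. The sketch assumes, optimistically, that every answer comes with a usable confidence score, and the eight records below are invented for illustration. Raising the refusal threshold buys accuracy at the cost of answering fewer questions.

```python
# Each record is (model confidence, whether the answer was correct).
# Invented data, for illustration only.
records = [
    (0.95, True), (0.90, True), (0.80, True), (0.70, False),
    (0.60, True), (0.40, False), (0.30, False), (0.20, False),
]

def coverage_and_accuracy(records, threshold):
    """Answer only when confidence >= threshold; report coverage and accuracy."""
    answered = [correct for confidence, correct in records if confidence >= threshold]
    coverage = len(answered) / len(records)
    accuracy = sum(answered) / len(answered) if answered else None
    return coverage, accuracy

for threshold in (0.0, 0.5, 0.75):
    coverage, accuracy = coverage_and_accuracy(records, threshold)
    print(f"threshold {threshold:.2f}: answers {coverage:.0%} of questions, "
          f"{accuracy:.0%} of those correct")
```

On this data, never refusing answers everything but is right only half the time, while the strictest threshold is always right but answers fewer than half the questions.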
Systems like Perplexity, Bing Copilot, and Google’s AI Overviews now attach citations to specific claims. This does two things: it makes hallucinations easier for users to detect (you can check the source), and it changes the model’s generation behavior (models that are required to cite sources tend to stick closer to those sources).
Limitation: The model can cite a real source while misrepresenting what it says. Citation is necessary but not sufficient for accuracy.
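One narrow piece of this check can be automated: confirming that a passage the model quotes actually appears in the cited text. The sketch below is deliberately simple and the source sentence is invented. It catches altered quotes but not paraphrases that distort meaning, so it complements reading the source and does not replace it.

```python
import re

def normalize(text):
    """Lowercase and collapse whitespace so formatting differences do not matter."""
    return re.sub(r"\s+", " ", text.lower()).strip()

def quote_is_in_source(quote, source_text):
    """True if the quoted passage appears verbatim (after normalization) in the source."""
    return normalize(quote) in normalize(source_text)

# Invented source text standing in for a retrieved page.
source = (
    "Refund requests must be submitted within 30 days of purchase.\n"
    "Late requests are not accepted."
)

# Two quotes a model might attribute to that page; the second changes one number.
claims = [
    "refund requests must be submitted within 30 days",
    "refund requests must be submitted within 90 days",
]

results = {claim: quote_is_in_source(claim, source) for claim in claims}
for claim, supported in results.items():
    print("supported  " if supported else "NOT IN SOURCE", "|", claim)
```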
Rather than asking the model to recall that 247 times 389 equals 96,083, let it use a calculator. Rather than asking it to remember the current stock price of Apple, let it use a search API. Tool use removes entire categories of hallucination by replacing recall with lookup.
Modern AI systems increasingly use tools by default: ChatGPT calls a search API for current information, uses a Python interpreter for math, and accesses file systems for document analysis. Each correct tool call removes one opportunity to hallucinate.
Limitation: The model must correctly decide when to use a tool and how to interpret the results. Models sometimes hallucinate tool calls (calling a function that does not exist) or misinterpret tool output.
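A small dispatcher shows both the benefit and the failure mode. This is a sketch under stated assumptions: the tool registry and the dictionary format for tool calls are hypothetical and do not reflect any vendor's API. The point is that the system, not the model, decides which tools exist, so a hallucinated tool name is rejected instead of executed.

```python
import operator

# Hypothetical registry: the only tools the model is allowed to call.
TOOLS = {
    "multiply": operator.mul,
    "add": operator.add,
}

def run_tool_call(call):
    """Execute a model-requested tool call, rejecting unknown tools and bad arguments."""
    name = call.get("name")
    args = call.get("args", [])
    if name not in TOOLS:
        return {"ok": False, "error": f"unknown tool: {name!r}"}
    try:
        return {"ok": True, "result": TOOLS[name](*args)}
    except TypeError as exc:
        return {"ok": False, "error": f"bad arguments for {name}: {exc}"}

# A valid call: the arithmetic is computed, not recalled.
print(run_tool_call({"name": "multiply", "args": [247, 389]}))

# A hallucinated call: the model asks for a tool that was never defined.
print(run_tool_call({"name": "square_root", "args": [16]}))
```

The dispatcher cannot catch the other failure mentioned above, which is a model misreading a correct tool result. That still needs checking downstream.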
The frontier of hallucination research is teaching models to know what they know. A well-calibrated model would express high confidence only when it is likely to be correct and low confidence when it is guessing. Current models are poorly calibrated — they express similar confidence whether they are right or wrong.
Some approaches being explored: training models to output probability estimates alongside claims, using ensemble methods (multiple models vote on an answer, and disagreement signals uncertainty), and meta-cognitive probing (asking the model to evaluate its own confidence before committing to an answer).
Limitation: This is the least mature layer. No production system has achieved reliable confidence calibration yet.
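The ensemble idea is the easiest of these to illustrate. In the sketch below, the answers are hard-coded stand-ins for responses from several models or several samples of one model, and the 0.75 agreement threshold is an arbitrary choice, not a recommended value.

```python
from collections import Counter

def vote(answers):
    """Return the most common answer and the fraction of voters that gave it."""
    if not answers:
        raise ValueError("need at least one answer")
    normalized = [answer.strip().lower() for answer in answers]
    top, count = Counter(normalized).most_common(1)[0]
    return top, count / len(normalized)

def answer_or_abstain(answers, min_agreement=0.75):
    """Commit to the majority answer only when agreement is high enough."""
    top, agreement = vote(answers)
    if agreement < min_agreement:
        return None, agreement  # disagreement is treated as a signal of uncertainty
    return top, agreement

# Stand-ins for what four models (or four samples) might return.
print(answer_or_abstain(["Canberra", "canberra ", "Canberra", "Sydney"]))
print(answer_or_abstain(["1987", "1989", "1991"]))
```

Agreement is evidence, not proof. Models trained on overlapping data can confidently agree on the same wrong answer, which is one reason this layer remains immature.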
If you are using AI tools for anything that matters, here are the specific verification practices that work:
1. Treat AI output like a junior employee’s first draft. Helpful, usually directionally correct, but requires senior review before it goes anywhere. This mental model prevents both over-trust and dismissal.
2. Verify any specific number, date, name, or citation. Hallucinations cluster disproportionately in specific claims. The prose around those claims is usually fine — it is the precise facts that go wrong. If the AI says “a 2024 study by researchers at MIT found that…” verify that the study exists, it was from MIT, and it was in 2024.
3. Be especially skeptical of the impressive. When an AI response includes a surprisingly perfect quote, an uncannily relevant statistic, or a citation that supports your argument exactly, your alarm bells should ring loudest. The model is optimized to produce satisfying responses. The most satisfying response to your question is also the most likely to be fabricated.
4. Use the “ask twice differently” test. If you suspect a claim might be hallucinated, rephrase your question and ask again, or ask a different model. If both give the same answer, it is more likely to be correct, though models can share the same mistake. If they give different answers, at least one is wrong.
5. Check the edges. Hallucinations are more frequent for: recent events (near or after the training cutoff), niche or obscure topics (less training data), specific numbers and dates (harder for probability-based prediction), and questions about the model itself (models are unreliable narrators of their own capabilities).
6. Demand sources, then check them. Asking the model to “cite your source” often produces a real-looking but nonexistent URL, so treat any source the model offers as a lead to follow up, not as proof. Take the claim to Perplexity or Google and verify it independently. Do not trust the model’s self-citations.
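Some of this checklist can be turned into a first-pass filter. The sketch below flags sentences that contain the kinds of specifics where hallucinations cluster: digits, URLs, quoted text, and references to studies. The patterns and the sample draft are assumptions made for illustration. The filter only tells you where to look; it verifies nothing on its own.

```python
import re

# Signals that a sentence makes a specific, checkable claim.
CHECKABLE = re.compile(
    r"https?://\S+"              # URLs
    r"|\d"                       # any digit: dates, counts, percentages
    r"|\"[^\"]+\""               # text inside straight double quotes
    r"|\b(?:study|according to)\b",
    re.IGNORECASE,
)

def sentences_to_verify(text):
    """Split text into sentences and keep the ones containing checkable specifics."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if CHECKABLE.search(s)]

# Invented AI output, for illustration only.
draft = (
    "Hallucinations are a known issue. "
    "A 2024 study found a 12% error rate. "
    "Review matters."
)

for sentence in sentences_to_verify(draft):
    print("VERIFY:", sentence)
```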
Here is the uncomfortable truth that the AI industry has not yet resolved: the same properties that make language models useful are the properties that make them hallucinate.
If a model could only output information it was certain about, it would be useless for creative tasks, brainstorming, hypothetical reasoning, and most of the things people actually use AI for. The ability to generate novel combinations of ideas — to say things that were never in the training data — is both the superpower and the failure mode.
A model that never hallucinated would be a search engine. We already have those.
The path forward is not eliminating hallucinations entirely — it is building systems and practices that manage the tradeoff. RAG for factual grounding. Tool use for verifiable claims. Confidence calibration for transparency. Human review for high-stakes decisions.
Hallucinations are getting less frequent with each model generation. But the rate will never reach zero, because reaching zero would require the model to have a perfect internal representation of all truth, and that is not what these systems are. They are pattern-completion engines that are extraordinarily good at approximating knowledge but fundamentally incapable of the kind of ground-truth verification that humans do when they check a fact.
The right response is not to stop using AI. It is to use AI the way you would use any powerful tool that sometimes fails: with verification, with appropriate skepticism, and with awareness of where the failure modes are.
Trust, but verify. And know exactly what you are verifying.