Back
Vectara, "Hallucination Leaderboard" (https://github.com/vectara/hallucination-leaderboard)
Web Credibility Rating
3/5
Good (3): Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.
Rating inherited from publication venue: GitHub
Relevant to AI safety discussions around reliability and trustworthiness of LLMs; hallucination is a key failure mode affecting safe deployment in high-stakes contexts.
Metadata
Importance: 62/100 | Type: tool page
Summary
A public leaderboard that benchmarks large language models on their tendency to hallucinate or introduce factual inconsistencies when summarizing documents. It provides a standardized evaluation framework comparing models on 'groundedness' — how faithfully they summarize source material without fabricating information. The leaderboard is regularly updated as new models are released.
Key Points
- Evaluates LLMs on hallucination rate using a summarization task, measuring how often models introduce facts not present in the source document.
- Uses a dataset of news articles to test whether model outputs are faithful to source content, providing a consistent cross-model benchmark.
- Ranks major commercial and open-source models (e.g., GPT-4, Claude, Llama) by hallucination frequency, enabling direct comparison.
- Highlights that even top-tier models hallucinate at non-trivial rates, underscoring reliability concerns for real-world deployment.
- Serves as a practical tool for practitioners selecting models where factual accuracy and trustworthiness are critical requirements.
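The three headline columns on the leaderboard (hallucination rate, factual consistency rate, answer rate) fit together in a simple way: consistency is judged only on the summaries a model actually produced, and hallucination rate is the complement of the consistency rate. A minimal sketch of that relationship, with illustrative names and data (this is not Vectara's actual scoring code, which uses the HHEM model to judge each summary):

```python
def leaderboard_metrics(results):
    """Compute the three leaderboard rates from per-document judgments.

    results: list of (answered, consistent) booleans, one per source
    document. A summary counts toward consistency only if the model
    answered (i.e., produced a summary rather than refusing).
    """
    judged = [consistent for answered, consistent in results if answered]
    answer_rate = 100.0 * len(judged) / len(results)
    consistency_rate = 100.0 * sum(judged) / len(judged)
    hallucination_rate = 100.0 - consistency_rate  # the two rates sum to 100 %
    return hallucination_rate, consistency_rate, answer_rate

# Illustrative run: 10 documents, all answered, 1 judged inconsistent.
results = [(True, True)] * 9 + [(True, False)]
h, c, a = leaderboard_metrics(results)
# h = 10.0, c = 90.0, a = 100.0
```

Note that because refused answers are excluded from the consistency denominator, a model with a low answer rate (e.g., Phi-4 or snowflake-arctic-instruct in the table below) is being scored on a smaller, self-selected subset of documents.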
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Large Language Models | Capability | 60.0 |
Cached Content Preview
HTTP 200 · Fetched Apr 9, 2026 · 24 KB
GitHub - vectara/hallucination-leaderboard: Leaderboard Comparing LLM Performance at Producing Hallucinations when Summarizing Short Documents
Hallucination Leaderboard
Public LLM leaderboard computed using Vectara's Hallucination Evaluation Model, also known as HHEM. This evaluates how often an LLM introduces hallucinations when summarizing a document. We plan to update this regularly as our model and the LLMs get updated over time.
Feel free to check out the interactive hallucination leaderboard on Hugging Face.
If you are interested in previous versions of this leaderboard:
The first version, based on HHEM-1.0, is available here.
The most recent version, based on the previous dataset, is available here.
In loving memory of Simon Mark Hughes ...
Last updated on March 20, 2026
| Model | Hallucination Rate | Factual Consistency Rate | Answer Rate | Average Summary Length (Words) |
|---|---|---|---|---|
| antgroup/finix_s1_32b | 1.8 % | 98.2 % | 99.5 % | 172.4 |
| openai/gpt-5.4-nano-2026-03-17 | 3.1 % | 96.9 % | 100.0 % | 144.4 |
| google/gemini-2.5-flash-lite | 3.3 % | 96.7 % | 99.5 % | 95.7 |
| microsoft/Phi-4 | 3.7 % | 96.3 % | 80.7 % | 120.9 |
| meta-llama/Llama-3.3-70B-Instruct-Turbo | 4.1 % | 95.9 % | 99.5 % | 64.6 |
| snowflake/snowflake-arctic-instruct | 4.3 % | 95.7 % | 62.7 % | 81.4 |
| google/gemma-3-12b-it | 4.4 % | 95.6 % | 97.4 % | 89.7 |
| mistralai/mistral-large-2411 | 4.5 % | 95.5 % | 99.9 % | 85.0 |
| qwen/qwen3-8b | 4.8 % | 95.2 % | 99.9 % | 83.6 |
| amazon/nova-pro-v1:0 | 5.1 % | 94.9 % | 99.3 % | 66.2 |
| amazon/nova-2-lite-v1:0 | 5.1 % | 94.9 % | 99.6 % | 94.1 |
| mistralai/mistral-small-2501 | 5.1 % | 94.9 % | 97.9 % | 98.8 |
| ibm-granite/granite-4.0-h-small | 5.2 % | 94.8 % | 100.0 % | 107.4 |
| ai21labs/jamba-mini-2 | 5.3 % | 94.7 % | 99.6 % | 109 |
... (truncated, 24 KB total)
Resource ID: b44f883dc65dd0e9 | Stable ID: OWJmYTEwNj