Back
The Perils and Promises of Fact-Checking with Large Language Models
Web · frontiersin.org · frontiersin.org/journals/artificial-intelligence/articles...
Relevant to AI safety discussions around LLM reliability and deployment in high-stakes information contexts; highlights evaluation challenges and risks of over-trusting LLMs for truth verification tasks.
Metadata
Importance: 42/100 · journal article · primary source
Summary
This paper evaluates LLM agents (GPT-3 and GPT-4) for automated fact-checking, finding that contextual information retrieval significantly enhances performance, but accuracy remains inconsistent across query languages and claim types. The study highlights both the promise and limitations of using LLMs to combat misinformation at scale.
Key Points
- GPT-4 outperforms GPT-3 in fact-checking tasks, but accuracy varies significantly by query language and claim veracity.
- LLM agents equipped with contextual retrieval (RAG-style) show markedly improved fact-checking capabilities over base models (see the sketch after this list).
- Agents explain their reasoning and cite sources, improving transparency but not eliminating inconsistent accuracy.
- Automated fact-checking is increasingly critical because misinformation spreads faster than human fact-checkers can respond.
- The study calls for deeper research into failure modes of LLM fact-checking agents before deployment in information ecosystems.
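The agent pipeline the paper describes (phrase a query, retrieve contextual evidence, then decide while explaining and citing) is easy to picture as code. Below is a minimal, hypothetical sketch in Python: `call_llm` and `search_web` are placeholder stand-ins, not APIs or prompts from the paper, and the three numbered steps mirror the stages the authors evaluate.

```python
# Hedged sketch of a RAG-style fact-checking agent, per the pipeline
# summarized above. call_llm and search_web are hypothetical stubs,
# not the paper's actual implementation.
from dataclasses import dataclass


@dataclass
class Verdict:
    label: str          # e.g. "true", "false", "half-true"
    reasoning: str      # the agent's explanation
    sources: list[str]  # URLs cited from the retrieved context


def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around an LLM API (e.g. GPT-3 or GPT-4)."""
    raise NotImplementedError


def search_web(query: str, k: int = 5) -> list[dict]:
    """Hypothetical retriever returning {'url': ..., 'text': ...} snippets."""
    raise NotImplementedError


def fact_check(claim: str) -> Verdict:
    # 1. The agent phrases its own search query for the claim.
    query = call_llm(f"Write a web search query to verify this claim:\n{claim}")
    # 2. Retrieve contextual evidence (the step the paper finds boosts accuracy).
    snippets = search_web(query)
    context = "\n\n".join(f"[{s['url']}]\n{s['text']}" for s in snippets)
    # 3. Decide, explain the reasoning, and cite sources from the context.
    answer = call_llm(
        "Using only the evidence below, label the claim true/false/half-true.\n"
        "Explain your reasoning and cite the URLs you relied on.\n\n"
        f"Claim: {claim}\n\nEvidence:\n{context}"
    )
    # Parsing the structured answer is omitted in this sketch.
    return Verdict(label=answer.splitlines()[0], reasoning=answer,
                   sources=[s["url"] for s in snippets])
```

The results hinge on step 2: the paper contrasts "with context" and "without context" conditions (Tables 2 and 3 below), and accuracy improves markedly when the retrieval step is present.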
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| AI-Era Epistemic Infrastructure | Approach | 59.0 |
Cached Content Preview
HTTP 200 · Fetched Apr 9, 2026 · 60 KB
Frontiers | The perils and promises of fact-checking with large language models
ORIGINAL RESEARCH article
Front. Artif. Intell. , 07 February 2024
Sec. Natural Language Processing
Volume 7 - 2024 | https://doi.org/10.3389/frai.2024.1341697
Published in Frontiers in Artificial Intelligence
Natural Language Processing
4.7 impact factor · 7.3 CiteScore
Part of the Research Topic "Countering Fake News and Hate Speech on Social Media Platforms" (6 articles · 84k views)
Editor & Reviewers
Edited by
Rafael Berlanga, University of Jaume I, Spain
Reviewed by
Ismael Sanz, University of Jaume I, Spain
Kevin Matthe Caramancion, Naval Postgraduate School, United States
Outline
Figures and Tables
Figures 1–7 (captions not included in the cached preview)
Table 1. Comparison of accuracy of all conditions on the PolitiFact dataset.
Table 2. Performance on the multilingual dataset without context.
Table 3. Performance on the multilingual dataset with context.
The perils and promises of fact-checking with large language models
Dorian Quelle 1,2 † *
Alexandre Bovet 1,2 †
1. Department of Mathematical Modeling and Machine Learning, University of Zurich, Zurich, Switzerland
2. Digital Society Initiative, University of Zurich, Zurich, Switzerland
Abstract
Automated fact-checking, using machine learning to verify claims, has grown vital as misinformation spreads beyond human fact-checking capacity. Large language models (LLMs) like GPT-4 are increasingly trusted to write academic papers, lawsuits, and news articles and to verify information, emphasizing their role in discerning truth from falsehood and the importance of being able to verify their outputs. Understanding the capacities and limitations of LLMs in fact-checking tasks is therefore essential for ensuring the health of our information ecosystem. Here, we evaluate the use of LLM agents in fact-checking by having them phrase queries, retrieve contextual data, and make decisions. Importantly, in our framework, agents explain their reasoning and cite the relevant sources from the retrieved context. Our results show the enhanced prowess of LLMs when equipped with contextual information. GPT-4 outperforms GPT-3, but accuracy varies based on query language and claim veracity. While LLMs show promise in fact-checking, caution is essential due to inconsistent accuracy. Our investigation
... (truncated, 60 KB total)
Resource ID: dd6f2b62bdf62bd8 | Stable ID: sid_VFycAa3LJJ