Back
The Perils and Promises of Fact-Checking with Large Language Models
Web · frontiersin.org · frontiersin.org/journals/artificial-intelligence/articles...
Relevant to AI safety discussions around LLM reliability and deployment in high-stakes information contexts; highlights evaluation challenges and risks of over-trusting LLMs for truth verification tasks.
Metadata
Importance: 42/100 · journal article · primary source
Summary
This paper evaluates LLM agents (GPT-3 and GPT-4) for automated fact-checking, finding that contextual information retrieval significantly enhances performance, but accuracy remains inconsistent across query languages and claim types. The study highlights both the promise and limitations of using LLMs to combat misinformation at scale.
Key Points
- GPT-4 outperforms GPT-3 in fact-checking tasks, but accuracy varies significantly by query language and claim veracity.
- LLM agents equipped with contextual retrieval (RAG-style) show markedly improved fact-checking capabilities over base models (see the sketch after this list).
- Agents explain their reasoning and cite sources, improving transparency but not eliminating inconsistent accuracy.
- Automated fact-checking is increasingly critical because misinformation spreads faster than human fact-checkers can respond.
- The study calls for deeper research into failure modes of LLM fact-checking agents before deployment in information ecosystems.
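The agent pipeline the paper describes (phrase a query, retrieve contextual evidence, then decide while explaining and citing) is easy to picture as code. Below is a minimal, hypothetical sketch in Python: `call_llm` and `search_web` are placeholder stand-ins, not APIs or prompts from the paper, and the three numbered steps mirror the stages the authors evaluate.

```python
# Hedged sketch of a RAG-style fact-checking agent, per the pipeline
# summarized above. call_llm and search_web are hypothetical stubs,
# not the paper's actual implementation.
from dataclasses import dataclass


@dataclass
class Verdict:
    label: str          # e.g. "true", "false", "half-true"
    reasoning: str      # the agent's explanation
    sources: list[str]  # URLs cited from the retrieved context


def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around an LLM API (e.g. GPT-3 or GPT-4)."""
    raise NotImplementedError


def search_web(query: str, k: int = 5) -> list[dict]:
    """Hypothetical retriever returning {'url': ..., 'text': ...} snippets."""
    raise NotImplementedError


def fact_check(claim: str) -> Verdict:
    # 1. The agent phrases its own search query for the claim.
    query = call_llm(f"Write a web search query to verify this claim:\n{claim}")
    # 2. Retrieve contextual evidence (the step the paper finds boosts accuracy).
    snippets = search_web(query)
    context = "\n\n".join(f"[{s['url']}]\n{s['text']}" for s in snippets)
    # 3. Decide, explain the reasoning, and cite sources from the context.
    answer = call_llm(
        "Using only the evidence below, label the claim true/false/half-true.\n"
        "Explain your reasoning and cite the URLs you relied on.\n\n"
        f"Claim: {claim}\n\nEvidence:\n{context}"
    )
    # Parsing the structured answer is omitted in this sketch.
    return Verdict(label=answer.splitlines()[0], reasoning=answer,
                   sources=[s["url"] for s in snippets])
```

The results hinge on step 2: the paper contrasts "with context" and "without context" conditions (Tables 2 and 3 below), and accuracy improves markedly when the retrieval step is present.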
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| AI-Era Epistemic Infrastructure | Approach | 59.0 |
Cached Content Preview
HTTP 200 · Fetched Apr 9, 2026 · 60 KB
Frontiers | The perils and promises of fact-checking with large language models
ORIGINAL RESEARCH article
Front. Artif. Intell. , 07 February 2024
Sec. Natural Language Processing
Volume 7 - 2024 | https://doi.org/10.3389/frai.2024.1341697
Published in Frontiers in Artificial Intelligence
Natural Language Processing
4.7 impact factor · 7.3 CiteScore
Part of the Research Topic "Countering Fake News and Hate Speech on Social Media Platforms" (6 articles · 84k views)
Editor & Reviewers
Edited by
Rafael Berlanga, University of Jaume I, Spain
Reviewed by
Ismael Sanz, University of Jaume I, Spain
Kevin Matthe Caramancion, Naval Postgraduate School, United States
Outline
Figures and Tables
Figures 1–7 (captions not included in the cached preview)
Table 1. Comparison of accuracy of all conditions on the PolitiFact dataset.
Table 2. Performance on the multilingual dataset without context.
Table 3. Performance on the multilingual dataset with context.
The perils and promises of fact-checking with large language models
Dorian Quelle 1,2 † *
Alexandre Bovet 1,2 †
1. Department of Mathematical Modeling and Machine Learning, University of Zurich, Zurich, Switzerland
2. Digital Society Initiative, University of Zurich, Zurich, Switzerland
Abstract
Automated fact-checking, using machine learning to verify claims, has grown vital as misinformation spreads beyond human fact-checking capacity. Large language models (LLMs) like GPT-4 are increasingly trusted to write academic papers, lawsuits, and news articles and to verify information, emphasizing their role in discerning truth from falsehood and the importance of being able to verify their outputs. Understanding the capacities and limitations of LLMs in fact-checking tasks is therefore essential for ensuring the health of our information ecosystem. Here, we evaluate the use of LLM agents in fact-checking by having them phrase queries, retrieve contextual data, and make decisions. Importantly, in our framework, agents explain their reasoning and cite the relevant sources from the retrieved context. Our results show the enhanced prowess of LLMs when equipped with contextual information. GPT-4 outperforms GPT-3, but accuracy varies based on query language and claim veracity. While LLMs show promise in fact-checking, caution is essential due to inconsistent accuracy. Our investigation
... (truncated, 60 KB total)
Resource ID: dd6f2b62bdf62bd8 | Stable ID: sid_VFycAa3LJJ