Research shows human detection of AI-generated text is near random chance
Web Credibility Rating
High quality. Established institution or organization with editorial oversight and accountability.
Rating inherited from publication venue: ScienceDirect
Relevant to AI deployment and governance debates around academic integrity: the study shows that current AI-text detection, whether performed by humans or by automated detectors, is unreliable, with implications for how AI capabilities are assessed and managed in educational settings.
Metadata
Summary
A survey experiment with 63 university lecturers found that both humans and AI detectors perform only slightly better than random chance at identifying AI-generated academic texts, with humans achieving 57% recognition for AI texts and 64% for human texts. Professional-level AI writing was correctly identified by fewer than 20% of participants, raising serious concerns about academic integrity and the reliability of current AI detection methods.
Key Points
- Human evaluators and AI detectors both identified AI-generated text only marginally above chance (~57% for AI texts vs. 50% baseline).
- Professional-level AI-generated texts were nearly undetectable, with less than 20% of lecturers correctly classifying them.
- No statistically significant difference was found between human and machine detection performance.
- Prior teaching experience slightly improved detection accuracy, but subjective text quality judgments were unaffected by actual authorship.
- Findings suggest traditional written academic assessments are increasingly vulnerable to undetected AI use, warranting reassessment of evaluation formats.
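To put "marginally above chance" in context, a normal-approximation proportion test shows how a 57% recognition rate compares to the 50% baseline. The sample size below (63 lecturers each judging 4 AI excerpts, 252 judgments in total) is a hypothetical illustration for the sketch, not a figure reported in the paper:

```python
from math import sqrt
from statistics import NormalDist

def prop_z_test(p_hat: float, p0: float, n: int):
    """One-sided z-test for a proportion: is p_hat above the chance rate p0?"""
    se = sqrt(p0 * (1 - p0) / n)       # standard error under the null hypothesis
    z = (p_hat - p0) / se              # z statistic
    p_value = 1 - NormalDist().cdf(z)  # one-sided p-value
    return z, p_value

# Hypothetical sample: 63 lecturers x 4 AI excerpts -> 252 judgments
z, p = prop_z_test(0.57, 0.50, 252)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Even under this generous assumption the effect is small in practical terms: a detector that is wrong on roughly four in ten AI texts offers little protection in grading, which is the paper's core concern.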
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| AI-Powered Consensus Manufacturing | Risk | 64.0 |
Cached Content Preview
Do humans identify AI-generated text better than machines? Evidence based on excerpts from German theses - ScienceDirect
International Review of Economics Education
Volume 49, June 2025, 100321
Alexandra Fiedler, Jörg Döpke
https://doi.org/10.1016/j.iree.2025.100321 (open access, under a Creative Commons license)
Highlights
• A survey of 63 lecturers revealed that only half of the AI-generated texts could be recognized as such.
• Humans recognize AI texts slightly better than AI detectors.
• The higher the level of AI-generated texts, the more difficult it is to distinguish them from human texts.
• Human assessment of text quality does not depend on whether the text is actually from an AI.
Abstract
We investigate whether human experts can identify AI-generated academic texts more accurately than current machine-based detectors. In a survey experiment at a German university of applied sciences, 63 lecturers in engineering, economics, and social sciences evaluated short excerpts (200–300 words) of both human-written and AI-generated texts. The texts varied by discipline and by the writing level (student vs. professional) of the AI-generated content. The results show that both human evaluators and AI detectors correctly identified AI-generated texts only slightly better than chance, with humans achieving a recognition rate of 57 % for AI texts and 64 % for human-generated texts. There was no statistically significant difference between human and machine performance. Notably, professional-level AI texts were the most difficult to identify, with less than 20 % of respondents correctly classifying them. Regression analyses suggest that prior teaching experience slightly improves recognition accuracy, while subjective judgments of text quality were not influenced by actual or presumed authorship. These findings suggest that current written examination practices are increasingly vulnerable to undetected AI use. Both human judgment and existing AI detectors show high error rates, especially for high-quality AI-generated content. We conclude that a reconsideration of traditional assessment formats in academia is warranted.
JEL classification: A2, I23, C88
Keywords: Artificial intelligence; Written examinations; Grading
Acknowledgments: The authors thank three anonymous referees, Uli Fritsche, Ulrich Schindler, Tim Köhler, Lars Tegtmeier, and Gabi Waldhof for helpful comments on previous versions of this paper. We also thank
... (truncated, 3 KB total) | Stable ID: sid_1F1uPa8EUg