What Percentage of New Content Is AI-Generated? (Ahrefs Research)
web
This Ahrefs study offers empirical data on AI-generated content proliferation online, relevant to concerns about model training data quality, epistemic integrity, and the broader societal impact of generative AI deployment.
Metadata
Importance: 42/100 | blog post | analysis
Summary
Ahrefs conducted research estimating the proportion of newly published web content that is AI-generated, using large-scale crawl data. The study provides empirical data on the rapid growth of AI-generated text across the internet, with implications for content quality, search ecosystems, and information integrity.
Key Points
- Empirical analysis of web crawl data to estimate the prevalence of AI-generated content in newly published pages.
- Findings suggest a significant and growing share of new web content is AI-generated (74.2% of the 900,000 pages sampled in April 2025 contained some AI-generated content), raising concerns about content quality.
- Has implications for the reliability of training data for future AI models (model collapse / data contamination risk).
- Relevant to discussions about disinformation, epistemic pollution, and the changing nature of online information.
- Provides a quantitative baseline for tracking the spread of AI-generated content over time.
Cited by 1 page
| Page | Type | Quality |
|---|---|---|
| Epistemic Collapse | Risk | 49.0 |
Cached Content Preview
HTTP 200 | Fetched Apr 9, 2026 | 7 KB
74% of New Webpages Include AI Content (Study of 900k Pages)
Ryan Law
Ryan Law is the Director of Content Marketing at Ahrefs. Ryan has 13 years experience as a writer, content strategist, team lead, marketing director, VP, CMO, and agency founder. He's helped dozens of companies improve their content marketing and SEO, including Google, Zapier, GoDaddy, Clearbit, and Algolia. He's also a novelist and the creator of two content marketing courses.
We analyzed 900,000 newly created web pages in April 2025 and found that 74.2% of them contained AI-generated content.
At Ahrefs, our machine learning team has built an AI content detector (codenamed bot_or_not). We’re about to release the AI content detector for Ahrefs customers to use, so we decided to put it through its paces with a question we’ve been dying to answer:
What percentage of new content is AI-generated?
Sidenote. Thanks to our data scientist Xibeijia Guan and CMO Tim Soulo for conducting this research. We’re about to release our AI content detector as part of the Page Inspect tool in Site Explorer.
Findings
We used bot_or_not to analyze 900,000 English-language web pages that were newly detected by our web crawler in April 2025. We analyzed one page per domain (so we tested content from 900,000 different domains). Each page was categorized according to the percentage of the page our model detected as being AI-generated.
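The sampling and bucketing described above could be sketched roughly as follows. This is an illustration only, not the bot_or_not implementation: the function names are hypothetical, hostnames stand in for domains, and the category cut-offs (exactly 0% for “pure human”, exactly 100% for “pure AI”) are assumptions, since this excerpt does not say where the boundaries lie.

```python
from urllib.parse import urlparse


def one_page_per_domain(urls):
    """Keep only the first URL seen for each hostname.

    Simplification: hostnames are used as a stand-in for domains, so
    "www.example.com" and "example.com" would count as two domains.
    """
    seen = set()
    sample = []
    for url in urls:
        host = urlparse(url).hostname
        if host is None or host in seen:
            continue
        seen.add(host)
        sample.append(url)
    return sample


def categorize(ai_fraction):
    """Bucket a page by the fraction of it detected as AI-generated.

    The cut-offs below are hypothetical, chosen only for illustration.
    """
    if not 0.0 <= ai_fraction <= 1.0:
        raise ValueError("ai_fraction must be between 0 and 1")
    if ai_fraction == 0.0:
        return "pure human"
    if ai_fraction == 1.0:
        return "pure AI"
    return "mixed"
```

Sampling one page per domain keeps a single prolific site from dominating the result, which is presumably why the study reports 900,000 distinct domains.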
Here’s what our content detector found:
2.5% of pages were categorized as “pure AI.”
25.8% were categorized as “pure human.”
71.7% were categorized as a mix of the two.
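As a quick check, the three shares above sum to 100%, and the headline 74.2% figure is the “pure AI” and mixed shares combined. A minimal sketch of that arithmetic (the variable names are ours, and the page count is approximate because the published percentages are rounded):

```python
from decimal import Decimal

# Shares reported in the study, in percent of sampled pages.
shares = {
    "pure AI": Decimal("2.5"),
    "pure human": Decimal("25.8"),
    "mixed": Decimal("71.7"),
}

total = sum(shares.values())                        # 100.0
contains_ai = shares["pure AI"] + shares["mixed"]   # 74.2

# Approximate number of sampled pages containing any AI-generated content.
sample_size = 900_000
approx_pages_with_ai = contains_ai / 100 * sample_size  # about 667,800
```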
Of those that contained a mix of AI and human content:
25.86% showed moderate AI
... (truncated, 7 KB total)
Resource ID: 96a3c0270bd2e5c0 | Stable ID: sid_JvSis39LOR