Common Crawl accused of feeding paywalled content to AI companies

Nov 5, 2025

Is this how AI companies are getting access to paywalled journalism? A new report accuses Common Crawl of doing AI's “dirty work,” which the organization denies.

If you’ve ever wondered how AI companies like Google, Anthropic, OpenAI, and Meta get their training data from paywalled publishers such as the New York Times, Wired, or the Washington Post, we may finally have an answer.

In a detailed investigation for The <a href="https://www.theatlantic.com/technology/2025/11/common-crawl-ai-training-data/684567/" rel="noopener" …