Is this how AI companies are getting access to paywalled journalism? A new report accuses Common Crawl of doing AI’s “dirty work,” which the organization denies.
If you’ve ever wondered how AI companies like Google, Anthropic, OpenAI, and Meta get their training data from paywalled publishers such as the New York Times, Wired, or the Washington Post, we may finally have an answer.
In a detailed investigation for The <a href="https://www.theatlantic.com/technology/2025/11/common-crawl-ai-training-data/684567/" rel="noopener" …
