A profile of the nonprofit Common Crawl, which has scraped billions of web pages since 2013, including paywalled articles, to build an archive used by OpenAI and others

theatlantic.com/technology/2025/11/common-crawl-ai-training-data/684567/?gift=iWa_iB9lkw4UuiWbIbrWGQv84IP0_-K67yuVC013Fx4

This story appeared on theatlantic.com, 2025-11-04 12:55:52.474000.