A profile of the nonprofit Common Crawl, which has scraped billions of web pages since 2013, including paywalled articles, to build an archive used by OpenAI and others

theatlantic.com/technology/2025/11/common-crawl-ai-training-data/684567/?gift=iWa_iB9lkw4UuiWbIbrWGQv84IP0_-K67yuVC013Fx4

This story appeared on theatlantic.com, 2025-11-04 12:55:52.474000.
