EleutherAI releases massive AI training dataset of licensed and open domain text

techcrunch.com/2025/06/06/eleutherai-releases-massive-ai-training-dataset-of-licensed-and-open-domain-text

EleutherAI, an AI research organization, has released what it claims is one of the largest collections of licensed and open-domain text for training AI models.
The dataset, called the Common Pile v0.1, took around two years to complete in collaboration with AI startups Poolside, Hugging…

This story appeared on techcrunch.com on 2025-06-06 at 20:34:05.