AI Agents Do Well in Simulations, Falter in Real-World Shopkeeping Test

pymnts.com/news/artificial-intelligence/2025/ai-agents-do-well-in-simulations-falter-in-real-world-shopkeeping-test

In a bid to test whether artificial intelligence (AI) agents can operate autonomously in the real economy, Andon Labs and Anthropic deployed Claude Sonnet 3.7, nicknamed “Claudius,” to run a small, automated vending store at Anthropic’s San Francisco office for a month.

…

This story appeared on pymnts.com, 2025-07-02 23:15:57.