The Limits of AI Collaboration: Why Teams of Bots Often Fail

Artificial intelligence chatbots like OpenAI’s ChatGPT and Anthropic’s Claude are becoming increasingly capable. Their more autonomous cousins, known as AI agents, now automate tasks from scheduling appointments to writing code. As these agents integrate into fields like science and finance, the expectation is that they’ll collaborate in organized teams. Early experiments, however, reveal a surprising truth: AI agent teamwork is often chaotic and inefficient, frequently underperforming a single agent working alone. This is not merely a theoretical concern; it is already evident in real-world deployments, from experimental social networks to scientific labs.

The Chaos of Unstructured Interaction

The core problem lies in how AI agents interact without clear guidance. Journalist Evan Ratliff’s 2025 experiment, documented in his podcast Shell Game, illustrates this vividly. He assembled a team of AI agents to run a tech company, and the result was “a recipe for chaos.” Similarly, the launch of Moltbook, a social network populated entirely by AI agents, devolved into philosophical nonsense and manipulative scams, often orchestrated by hidden human operators.

Stanford computer scientist James Zou confirms this trend: “In many settings, the current AI agents do not actually work very well as a team.” Research from Google DeepMind supports this claim, suggesting that group performance can actually worsen compared to individual agent work. The lesson is counterintuitive but clear: simply throwing bots into a virtual room doesn’t guarantee synergy.

Moltbook: A Case Study in Bot Dysfunction

Moltbook provides a stark example. The platform hosted roughly 200,000 verified AI agents and millions more lurking in the background. Initial appearances suggested a bot-driven religion emerging, but the reality was far more mundane: humans were manipulating the agents for malicious purposes, including scams and hacking attempts. The bots themselves lacked meaningful social behavior, failing to influence one another or adapt to changing dynamics.

Computer scientist Ming Li notes, “An agent is a good executor, not a good thinker.” Even when an agent seemed to generate an original idea, it was likely prompted by a human operator behind the scenes. The bots simply executed instructions, lacking genuine agency.

The Pitfalls of Endless Conversation

Ratliff’s Hurumo AI experiment further demonstrates the issues. Despite clear instructions, his agents wasted time on irrelevant chitchat, such as discussing weekend hiking trips (which they could not experience). The bots engaged in endless, unproductive conversation, draining pre-paid credits without accomplishing tasks. Ratliff ultimately had to limit each agent’s turns to force efficiency.
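Ratliff’s fix, a hard cap on how often each agent may speak, can be sketched in a few lines. This is an illustrative mock-up, not his actual setup: `call_agent` stands in for whatever LLM API the agents use, and the names and budget are hypothetical.

```python
# Hypothetical sketch of a turn-capped agent meeting.
# `call_agent` is a placeholder for a real LLM-agent call.

MAX_TURNS_PER_AGENT = 5  # hard budget per agent, as in Ratliff's fix

def call_agent(name: str, history: list[str]) -> str:
    """Placeholder: a real implementation would query an LLM here."""
    return f"{name}: reply after {len(history)} prior messages"

def run_meeting(agents: list[str], task: str) -> list[str]:
    history = [f"TASK: {task}"]
    turns = {a: 0 for a in agents}
    for a in agents * MAX_TURNS_PER_AGENT:  # simple round-robin schedule
        if turns[a] >= MAX_TURNS_PER_AGENT:
            continue  # this agent's speaking budget is spent
        history.append(call_agent(a, history))
        turns[a] += 1
    return history

log = run_meeting(["planner", "coder"], "ship the landing page")
```

With the cap in place, the conversation terminates after a bounded number of messages no matter how chatty the agents are, which is exactly the property the unbounded version lacked.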

This highlights a critical flaw: AI agents lack the innate ability to prioritize tasks or recognize when conversation becomes unproductive. They don’t exhibit “meeting fatigue,” but they also don’t intuitively understand the value of brevity or focus.

The Promise of Hierarchy and Decomposition

Despite the challenges, structured AI teams can succeed. The key is “decomposability,” breaking down tasks into independent parts. Google DeepMind’s research shows that agents excel when working in parallel on separate components of a larger problem, such as financial analysis where bots can efficiently scan news, filings, and records simultaneously.
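The decomposability idea can be sketched as a fan-out/merge pattern: independent subtasks go to separate agents concurrently, and a merge step combines their partial results. This is a minimal illustration, with `analyze` standing in for a real agent call and the subtask names invented for the financial-analysis example above.

```python
# Illustrative sketch of "decomposability": independent subtasks fanned
# out to parallel workers, then merged. `analyze` is a stand-in for an
# agent call; subtask names are hypothetical.
from concurrent.futures import ThreadPoolExecutor

def analyze(source: str) -> str:
    """Placeholder for one agent scanning one document source."""
    return f"summary of {source}"

# Independent parts of the larger problem -- no subtask depends on another.
subtasks = ["news", "SEC filings", "court records"]

with ThreadPoolExecutor() as pool:
    results = list(pool.map(analyze, subtasks))  # agents work in parallel

report = "; ".join(results)  # merge step combines the partial findings
```

The pattern only works because the subtasks are independent; if one agent’s output fed into another’s input, the parallel fan-out would have to give way to coordination, which is where the DeepMind results show teams start to struggle.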

Furthermore, a clear hierarchy with delegated authority improves performance. Zou designed a virtual lab with an AI agent professor coordinating a team of AI agent students, plus a critic agent providing feedback. This system successfully designed new proteins to target COVID-19 mutations, validating the potential for AI-driven scientific discovery.
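The hierarchy Zou describes can be caricatured as a coordinator that splits the task, delegates one piece to each worker, and routes every draft through a critic. The sketch below is a hedged mock-up of that control flow only; all three functions are placeholders for real LLM calls, and the names are invented.

```python
# Hypothetical sketch of a hierarchical agent team: a "professor"
# delegates, "student" agents draft, and a critic reviews each draft.
# Every function here is a placeholder for a real LLM-agent call.

def student(name: str, subtask: str) -> str:
    return f"{name}'s draft for {subtask}"

def critic(draft: str) -> str:
    return f"feedback on ({draft})"

def professor(task: str, students: list[str]) -> list[tuple[str, str]]:
    # Split the task into one part per student (the delegation step).
    subtasks = [f"{task} / part {i}" for i in range(1, len(students) + 1)]
    reviewed = []
    for name, sub in zip(students, subtasks):
        draft = student(name, sub)
        reviewed.append((draft, critic(draft)))  # every draft is critiqued
    return reviewed

out = professor("design protein binder", ["s1", "s2"])
```

The key design choice is that authority flows one way: students never talk to each other directly, so the open-ended chatter that sank Ratliff’s flat team has no channel to occur in.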

Zou has scaled this model into The Virtual Biotech, a company with a hierarchical structure and thousands of agents working in parallel to analyze clinical trial data. The team cleaned and organized a massive dataset of 55,984 clinical trials, demonstrating the power of a well-orchestrated bot team.

Conclusion

While the current state of AI agent teamwork is often chaotic and inefficient, structured environments with clear hierarchies and decomposable tasks hold significant promise. The challenges are real, but not insurmountable. As AI agents evolve, so too will their collaborative abilities. The future of AI isn’t just about smarter bots; it’s about smarter teams of bots—and understanding how to make them work effectively.