LLMs could provide a platform for modeling collective control.
In a project presented at the same workshop at a major AI conference called ICLR, a group from MIT and Google got OpenAI’s ChatGPT and Google’s Bard to work together by discussing and debating problems. They found that the pair was more likely to converge on correct solutions when debating than when either bot worked alone. Another recent paper, from researchers at UC Berkeley and the University of Michigan, showed that having one AI agent review and critique the work of another allowed the supervising bot to improve the other agent’s code, making it better at using a computer’s web browser.
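Roughly, the debate setup works like this. The sketch below is not the MIT/Google implementation; `ask_model` is a hypothetical placeholder for whatever chat-model API is available, and convergence is checked crudely by exact agreement between the agents' replies.

```python
# Minimal sketch of a two-agent debate loop (illustrative only).
# `ask_model` is a hypothetical stand-in for a real chat-model API call.

def ask_model(model: str, prompt: str) -> str:
    """Placeholder: send `prompt` to `model` and return its text reply."""
    raise NotImplementedError("wire this up to an actual LLM API")

def debate(question: str, models=("model_a", "model_b"), rounds: int = 3) -> str:
    # Each agent answers independently first.
    answers = {m: ask_model(m, question) for m in models}
    for _ in range(rounds):
        new_answers = {}
        for m in models:
            # Each agent sees the other's answer and may revise its own.
            others = "\n".join(a for other, a in answers.items() if other != m)
            prompt = (
                f"Question: {question}\n"
                f"Another agent answered:\n{others}\n"
                f"Your previous answer:\n{answers[m]}\n"
                "Critique the other answer and give your best final answer."
            )
            new_answers[m] = ask_model(m, prompt)
        answers = new_answers
        # Stop early if the agents have converged (crudely: identical replies).
        if len(set(answers.values())) == 1:
            break
    return answers[models[0]]
```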
Teams of LLMs can also be prompted to behave in surprisingly humanlike ways. A group from Google, Zhejiang University in China, and the National University of Singapore found that assigning AI agents distinct personality traits, such as “easy-going” or “overconfident,” can shift their collaborative performance either positively or negatively.
And a recent article in The Economist rounds up several multi-agent projects, including one commissioned by the Pentagon’s Defense Advanced Research Projects Agency. In that experiment, a team of AI agents was tasked with searching for bombs hidden within a labyrinth of virtual rooms. While the multi-AI team was better at finding the imaginary bombs than a lone agent, the researchers also found that the group spontaneously developed an internal hierarchy. One agent ended up bossing the others around as they went about their mission.
AI collaboration.pdf (1.1 MB)
Existing PCT methodology can determine whether the agents are controlling. It may also be possible to identify control-loop structures within the LLMs, something we can currently do with living control systems only in very limited ways and at low levels.
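The core of that methodology is the test for the controlled variable: apply known disturbances to a candidate variable and see whether the agent’s actions cancel them, leaving the variable far more stable than the disturbances alone would allow. A minimal sketch, assuming the agent is exposed through a hypothetical `agent_act` function and the candidate variable is a single scalar:

```python
import random

# Rough sketch of the test for the controlled variable (TCV) applied to an
# agent acting on a simple scalar environment variable. `agent_act` is a
# hypothetical stand-in for whatever interface the agent under test exposes.

def agent_act(observation: float) -> float:
    """Placeholder: return the agent's action given its current observation."""
    raise NotImplementedError("connect this to the agent under test")

def variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

def stability_factor(steps: int = 200, action_gain: float = 1.0) -> float:
    """Compare the variance the disturbances alone would produce with the
    variance actually observed while the agent is acting."""
    value = 0.0
    cumulative_disturbance = 0.0
    observed, expected = [], []
    for _ in range(steps):
        d = random.gauss(0.0, 1.0)               # known disturbance this step
        cumulative_disturbance += d
        action = agent_act(value)                 # agent's output from what it perceives
        value += d + action_gain * action         # environment sums disturbance and action
        observed.append(value)
        expected.append(cumulative_disturbance)   # what the variable would do with no control
    # A factor much greater than 1 means the variable is being protected
    # against the disturbances, i.e. it is (close to) a controlled variable.
    return variance(expected) / max(variance(observed), 1e-9)
```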
OpenAI recently released a research paper showing how to reverse-engineer the internal structure of an LLM:
Scaling and evaluating sparse autoencoders
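The paper’s approach is to train a sparse autoencoder on activations captured from inside the model, so that each latent unit tends to correspond to an interpretable feature. Below is a rough sketch of that kind of k-sparse (“TopK”) autoencoder; the layer sizes and the value of k are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

# Minimal sketch of a top-k sparse autoencoder of the kind used to find
# interpretable features in LLM activations. Dimensions here are illustrative.

class TopKSparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 768, n_latents: int = 16384, k: int = 32):
        super().__init__()
        self.k = k
        self.encoder = nn.Linear(d_model, n_latents)
        self.decoder = nn.Linear(n_latents, d_model)

    def forward(self, activations: torch.Tensor):
        # Encode, then keep only the k largest latent values per example,
        # zeroing the rest so each input is explained by a few features.
        pre = self.encoder(activations)
        topk = torch.topk(pre, self.k, dim=-1)
        latents = torch.zeros_like(pre).scatter_(-1, topk.indices, topk.values)
        reconstruction = self.decoder(latents)
        return reconstruction, latents

# Training would minimise reconstruction error on activations captured from
# the LLM, e.g. loss = ((reconstruction - activations) ** 2).mean()
```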