Chain of Agents: Large Language Models Collaborating on Long-Context Tasks

1Penn State University, 2Google Cloud AI Research
*Last Authors

Chain-of-Agents is a training-free, task-agnostic, and highly interpretable framework for long-context tasks.

Abstract

Addressing the challenge of effectively processing long contexts has become a critical issue for Large Language Models (LLMs). Two common strategies have emerged: 1) reducing the input length, e.g., retrieving relevant chunks with Retrieval-Augmented Generation (RAG), and 2) expanding the context window limit of LLMs. However, both strategies have drawbacks: input reduction offers no guarantee of covering the portion that contains the needed information, while window extension struggles to focus on the information pertinent to solving the task. To mitigate these limitations, we propose Chain-of-Agents (CoA), a novel framework that harnesses multi-agent collaboration through natural language to enable information aggregation and context reasoning across various LLMs on long-context tasks. CoA consists of multiple worker agents that communicate sequentially, each handling a different segment of the text, followed by a manager agent that synthesizes their contributions into a coherent final output. CoA processes the entire input by interleaving reading and reasoning, and it mitigates long-context focus issues by assigning each agent a short context. We perform a comprehensive evaluation of CoA on a wide range of long-context tasks in question answering, summarization, and code completion, demonstrating significant improvements of up to 10% over strong baselines: RAG, Full-Context, and multi-agent LLMs.

Overall Structure of Chain-of-Agents


CoA consists of multiple worker agents that communicate sequentially, each handling a different segment of the text, followed by a manager agent that synthesizes their contributions into a coherent final output.
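For concreteness, this worker-manager pipeline can be sketched in a few lines of Python. This is a minimal sketch, assuming a hypothetical `llm(prompt)` completion function and character-based chunking; the prompt wording is illustrative, not the paper's exact templates.

```python
def chunk_text(text: str, chunk_size: int = 8000) -> list[str]:
    # Character-based for simplicity; a real implementation would chunk
    # by tokens so each segment fits a worker's context window.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def chain_of_agents(source: str, query: str, llm) -> str:
    # Worker agents read the segments in order, each refining a
    # "communication unit" (CU) that carries evidence along the chain.
    cu = ""
    for chunk in chunk_text(source):
        cu = llm(
            f"Previous findings: {cu}\n"
            f"Text segment: {chunk}\n"
            f"Question: {query}\n"
            "Update the findings with any evidence relevant to the question."
        )
    # The manager agent never sees the raw source, only the final CU.
    return llm(
        f"Findings from reading the full document: {cu}\n"
        f"Question: {query}\n"
        "Answer the question using the findings."
    )
```

Because each agent sees only one short segment plus the accumulated findings, no single call has to attend over the full input.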

Comparison with RAG

Overall results of CoA. CoA significantly outperforms the Vanilla and RAG baselines with various backbone LLMs on all datasets. CoA can also be applied to non-query tasks such as summarization.



Main results.

CoA Improvement is More Obvious When RAG Fails to Retrieve the Gold Answer

Comparison on NarrativeQA. The x-axis and y-axis indicate RAG and CoA performance, respectively, and each point represents a bin of samples corresponding to a different retrieval quality. The number next to each point is the index at which the retriever ranks the chunk containing the gold answer (the fraction of samples in the bin is in brackets); a lower index means better retrieval quality. The size of each point indicates the improvement of CoA over RAG.

Comparison with RAG.

Multi-agent Collaboration in CoA Enables Complex Reasoning over Long Context

The figure displays a sample prediction on HotpotQA. To find the correct answer, RAG retrieves text chunks with high semantic similarity to the query. However, multi-hop reasoning is challenging for it because the critical first-hop answer often lacks semantic relevance to the query. In contrast, CoA operates differently: the first agent explores related topics without knowing the query's answer, aiding subsequent inference; the second agent, also unaware of the answer, broadens the topic scope by incorporating new information; and the third agent finally discovers the answer, synthesizing information from earlier agents with new evidence to complete the reasoning chain. This collaboration highlights CoA's ability to perform complex reasoning over long-context tasks.

A case study.
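The RAG baseline contrasted in this case study can be sketched as follows. `embed` and `llm` are hypothetical stand-ins for an embedding model and a completion function; this illustrates similarity-based retrieval in general, not the paper's implementation.

```python
import numpy as np


def rag_answer(chunks: list[str], query: str, embed, llm, k: int = 8) -> str:
    # Rank chunks by embedding similarity to the query and keep the top k.
    q = embed(query)
    scores = [float(np.dot(embed(c), q)) for c in chunks]
    top_k = sorted(range(len(chunks)), key=lambda i: -scores[i])[:k]
    # Restore document order before prompting the model.
    context = "\n\n".join(chunks[i] for i in sorted(top_k))
    return llm(f"Context: {context}\nQuestion: {query}\nAnswer:")
```

Because chunks are ranked purely by similarity to the query, first-hop evidence that shares little wording with the query tends to be dropped, which is exactly the failure mode described above; CoA sidesteps it by reading every chunk in sequence.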

Comparison with Long LLMs

Comparison with long-context LLMs on NarrativeQA and BookSum. CoA significantly outperforms Claude 3 with a 200k-token context limit. No Trun./Trun. indicates whether the source text in a sample is shorter/longer than 200k tokens, i.e., whether it fits in the vanilla (200k) baseline without truncation or must be truncated. Average is the mean over all samples.



Long LLM results.

CoA Improvement is More Obvious When Long-Context Models Meet Longer Inputs

As shown in the figure, CoA outperforms the vanilla baseline by a large margin across a wide range of source lengths.

Long LLM results.

CoA Mitigates “Lost-in-the-Middle” Phenomenon

To assess the “lost-in-the-middle” effect on the Vanilla and CoA models, we replicated the original study by randomly selecting 500 samples from its Natural Questions dataset to build a QA set. The figure shows the performance of CoA and the full-context baseline on Natural Questions; CoA mitigates the lost-in-the-middle issue. The x-axis is the index of the document containing the gold answer, where a smaller index means the gold answer appears closer to the start of the input.

CoA mitigates lost-in-the-middle.
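The position sweep behind this plot can be sketched as follows, assuming a hypothetical `answer_fn` that wraps either the full-context baseline or CoA; the substring-match scoring is a simplification of the paper's evaluation.

```python
def position_sweep(gold_doc, distractors, query, gold_answer, answer_fn):
    # Place the gold document at each index among the distractors and
    # record whether the model recovers the answer from that position.
    acc_by_index = {}
    for i in range(len(distractors) + 1):
        docs = distractors[:i] + [gold_doc] + distractors[i:]
        prediction = answer_fn("\n\n".join(docs), query)
        acc_by_index[i] = float(gold_answer.lower() in prediction.lower())
    return acc_by_index
```

A flat accuracy curve across indices indicates the model is robust to the gold document's position; a dip in the middle is the lost-in-the-middle pattern.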

Other Results and Analysis

BibTeX

@article{zhang2024chain,
  author    = {Zhang, Yusen and Sun, Ruoxi and Chen, Yanfei and Pfister, Tomas and Zhang, Rui and Ar{\i}k, Sercan {\"O}.},
  title     = {Chain of Agents: Large Language Models Collaborating on Long-Context Tasks},
  journal   = {arXiv preprint arXiv:2406.02818},
  year      = {2024},
}

Acknowledgment

We thank Jinsung Yoon and other colleagues on the Cloud AI Research team for providing helpful feedback on this paper.