4/30/2024

LLMs, Agents, Multi-agents… and now, Self-improving ones

It feels like we blinked and suddenly moved from high-capacity Large Language Models (LLMs), driven by prompting techniques such as Chain of Thought and by external tools, to agent and multi-agent systems. Just as we’re beginning to understand how to harness that potential, tutorials and demonstrations of “self-improving” agents arrive and push the boundaries even further.

Take innovations such as CrewAI, AutoGen Studio v2, Devin by Cognition, or LangChain’s agents, to name just a few. They are reshaping how we approach software development. Let’s take a closer look.

Major Agent Frameworks

Devin: The Autonomous Engineer

Cognition Labs presents Devin, which it bills as the first fully autonomous AI software engineer. According to Cognition, Devin can plan and execute complex engineering tasks, learn from experience, and fix its own errors. It works with standard developer tools (shell, code editor, and browser) inside a sandboxed environment and can collaborate in real time, adjusting to feedback and taking part in design decisions.

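Its showcased skills include learning unfamiliar technologies, building and deploying applications end to end, finding and fixing bugs, and training and fine-tuning AI models. On SWE-bench, a benchmark built from real GitHub issues, Cognition reported that Devin resolved 13.86% of issues end to end on a random 25% subset, well above the previous state of the art at the time.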

SWE-agent: From Research to Reality

SWE-agent, an open-source project from Princeton, connects language models such as GPT-4 to real GitHub repositories so they can work on software engineering problems. Here is a brief summary of how it applies to software development:

Agent-Computer Interface (ACI): SWE-agent uses an ACI to simplify interactions between the language model and the repository, allowing efficient navigation, editing, and execution of code files.
Problem Solving: On the full SWE-bench test set, SWE-agent resolves 12.29% of the problems, state-of-the-art performance at the time of its release.
Impact of ACI Design: The design of the ACI significantly influences the agent’s effectiveness; a well-designed ACI outperforms a baseline agent that works through a plain shell.
Setup and Usage: The project’s README explains how to set up SWE-agent with Docker and Miniconda and how to use it to generate pull requests that attempt to fix GitHub issues.

Here is the GitHub repository to start testing it.
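SWE-agent’s real interface is richer than this, but the core ACI idea from the list above can be sketched in a few lines: give the model a small, line-numbered window onto a file and a compact edit command that reports back immediately. The `MiniACI` class and its `view`, `goto`, `scroll_down`, and `edit` methods are hypothetical names invented for this sketch, and it works on an in-memory list of lines rather than a real repository.

```python
from dataclasses import dataclass


@dataclass
class MiniACI:
    """Hypothetical agent-computer interface over an in-memory file.

    The agent sees a small, line-numbered window instead of the whole file,
    and edits by replacing a line range.
    """

    lines: list[str]
    window: int = 5
    top: int = 0  # 0-based index of the first visible line

    def view(self) -> str:
        # Render the visible window with 1-based line numbers for the model.
        end = min(self.top + self.window, len(self.lines))
        body = [f"{i + 1}: {self.lines[i]}" for i in range(self.top, end)]
        header = f"[lines {self.top + 1}-{end} of {len(self.lines)}]"
        return "\n".join([header, *body])

    def goto(self, line: int) -> str:
        # Make `line` (1-based) the first visible line, clamped to the file.
        self.top = max(0, min(line - 1, len(self.lines) - 1))
        return self.view()

    def scroll_down(self) -> str:
        return self.goto(self.top + self.window + 1)

    def edit(self, start: int, end: int, new_text: str) -> str:
        # Replace lines start..end (1-based, inclusive) and echo the result so
        # the agent gets immediate feedback; bad ranges return an error string.
        if not 1 <= start <= end <= len(self.lines):
            return f"Error: invalid line range {start}-{end}"
        self.lines[start - 1:end] = new_text.splitlines()
        return self.goto(start)


aci = MiniACI(["def add(a, b):", "    return a - b", "", "print(add(2, 3))"])
print(aci.edit(2, 2, "    return a + b"))  # window starting at the fixed line 2
```

Returning an explicit error string instead of raising keeps the loop going: the model reads the message and can retry with a valid range.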

CrewAI: The Dream Team

Imagine this: assembling a dream team of AI experts, each bringing a unique superpower. That is the essence of CrewAI. It’s like having a group of brilliant people, each an expert in their field, working together seamlessly to streamline the software development process.

CrewAI stands out for its framework for coordinating autonomous AI agents, each playing a defined role. With a focus on simplicity and modular design, it breaks the complex world of AI into manageable components: agents, tools, tasks, and processes. This approach makes multi-agent systems easier to understand and more accessible to build.
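CrewAI has its own `Agent`, `Task`, and `Crew` classes; rather than reproduce that API, here is a framework-free sketch of the decomposition described above. Agents carry a role and a goal, tasks are assigned to agents, and a sequential process hands each task’s output to the next. All names here are hypothetical, and the model is any function from prompt to text, stubbed so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable

# Any function from prompt to text can act as the model (hypothetical stand-in).
LLM = Callable[[str], str]


@dataclass
class Agent:
    role: str
    goal: str
    llm: LLM


@dataclass
class Task:
    description: str
    agent: Agent


def run_sequential(tasks: list[Task]) -> list[str]:
    """Sequential process: each task sees the previous task's output."""
    outputs: list[str] = []
    context = "none"
    for task in tasks:
        prompt = (
            f"You are a {task.agent.role}. Goal: {task.agent.goal}\n"
            f"Task: {task.description}\n"
            f"Context from previous step: {context}"
        )
        context = task.agent.llm(prompt)
        outputs.append(context)
    return outputs


# Stubbed models so the example runs offline.
planner = Agent("planner", "break features into steps",
                lambda p: "steps: 1) API 2) tests")
developer = Agent("developer", "implement the plan",
                  lambda p: "code for " + p.split("Context from previous step: ")[1])
results = run_sequential([
    Task("Plan a login feature", planner),
    Task("Write the code", developer),
])
print(results[-1])  # code for steps: 1) API 2) tests
```

Keeping the process separate from the agents is what makes the design modular: the same agents could be reused under a different process, such as one where a manager agent assigns tasks.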

It gives engineers an easy-to-use framework, tools, and a UI for building multi-agent automations locally, using either their own models or models from external providers. Around it, CrewAI fosters a community where developers exchange resources, models, and support.

The strength of CrewAI lies in organizing multiple intelligent agents into a cohesive team. It excels at tasks that require collaborative effort, where combining specialized roles can improve decision-making, creativity, and problem-solving beyond what a single prompt or tool typically achieves.

AutoGen Studio

AutoGen, Microsoft’s open-source framework, is at the forefront of using LLMs for complex, automated workflows. It facilitates the orchestration and optimization of LLM workflows, enabling efficient, high-impact applications.

With AutoGen, you get customizable, conversable agents that communicate with one another and can be powered by LLMs, human input, tools, or a mix of these. This adaptability opens up a wide range of applications, from solving complex tasks to dynamic, conversation-based interactions.
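AutoGen ships its own agent classes (for example `AssistantAgent` and `UserProxyAgent`); the sketch below is not that API. It only illustrates the conversable-agent idea described above: each agent is a reply function over the shared transcript, which could wrap an LLM, a human, or a tool, and two agents alternate until one emits a termination keyword or a turn limit is reached. `ChatAgent`, `converse`, and the `TERMINATE` convention used here are hypothetical choices for this illustration.

```python
from dataclasses import dataclass
from typing import Callable

# A reply function reads the transcript and returns the next message. It could
# wrap an LLM, prompt a human with input(), or run a tool.
Transcript = list[tuple[str, str]]
ReplyFn = Callable[[Transcript], str]


@dataclass
class ChatAgent:
    name: str
    reply: ReplyFn


def converse(first: ChatAgent, second: ChatAgent, opening: str,
             max_turns: int = 10, stop_word: str = "TERMINATE") -> Transcript:
    """Alternate between two agents until one says stop_word or turns run out."""
    transcript: Transcript = [(first.name, opening)]
    speaker, listener = second, first
    for _ in range(max_turns):
        message = speaker.reply(transcript)
        transcript.append((speaker.name, message))
        if stop_word in message:
            break
        speaker, listener = listener, speaker
    return transcript


coder = ChatAgent("coder", lambda t: "def add(a, b): return a + b")
tester = ChatAgent("tester", lambda t: "Tests pass. TERMINATE"
                   if "a + b" in t[-1][1] else "Please fix add().")
log = converse(tester, coder, "Write add(a, b).")
print(log[-1])  # ('tester', 'Tests pass. TERMINATE')
```

The turn limit matters as much as the termination keyword: without it, two agents that never agree would keep talking indefinitely.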

AutoGen Studio introduces a user-friendly interface for this powerful framework, simplifying rapid prototyping and the management of multi-agent systems. Whether configuring an LLM provider or creating agents and skills, AutoGen Studio streamlines the process and opens new possibilities in AI development.

For those who like to experiment, AutoGen Studio v2 (available on GitHub) represents a significant step forward and invites collaboration and continued growth of AI applications.

LangChain’s Agents

LangChain changes how LLMs are integrated into applications through “agents”. Rather than following a hard-coded sequence of steps like a script, an agent uses the LLM to decide its next action based on the current context. This gives developers more flexibility than traditional methods when building complex AI-based applications.
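LangChain’s agent classes handle prompting and parsing for you; the framework-free sketch below only shows the control flow that separates an agent from a script. A `policy` function, standing in for the LLM, looks at the history and chooses either a tool call or a final answer, and each tool result is fed back before the next decision. `run_agent`, `Policy`, and the `word_count` tool are hypothetical names for this illustration.

```python
from typing import Callable

Tools = dict[str, Callable[[str], str]]
# The policy stands in for the LLM: given the history, it returns either
# ("tool", tool_name, tool_input) or ("final", answer, "").
Policy = Callable[[list[str]], tuple[str, str, str]]


def run_agent(question: str, policy: Policy, tools: Tools, max_steps: int = 5) -> str:
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        kind, name_or_answer, arg = policy(history)
        if kind == "final":
            return name_or_answer
        tool = tools.get(name_or_answer)
        observation = tool(arg) if tool else f"unknown tool {name_or_answer!r}"
        # Feed the observation back so the next decision can depend on it.
        history.append(f"Action: {name_or_answer}({arg}) -> {observation}")
    return "Stopped: step limit reached"


tools: Tools = {"word_count": lambda text: str(len(text.split()))}


def toy_policy(history: list[str]) -> tuple[str, str, str]:
    # Call the tool first, then answer from its observation.
    if len(history) == 1:
        return ("tool", "word_count", "agents decide their next move")
    return ("final", f"The text has {history[-1].rsplit('-> ', 1)[1]} words.", "")


print(run_agent("How many words?", toy_policy, tools))  # The text has 5 words.
```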

LangChain also offers frameworks such as LangGraph, which improve control over the agent loop, state tracking, and human-in-the-loop interaction. This lets developers build agents that act autonomously, for example drafting content, or that pause for human approval, so developers stay in control.
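LangGraph has its own graph API built around `StateGraph`; the framework-free sketch below, with hypothetical names throughout, only illustrates the ideas named above: explicit nodes, a router per node that picks the next step, a shared state that records progress, and a human review step that can send work back for revision, with a cap on loops. The `approve` callback stands in for the person in the loop.

```python
from typing import Callable, TypedDict


class State(TypedDict):
    topic: str
    draft: str
    revisions: int
    approved: bool


def draft(state: State) -> State:
    # Stand-in for an LLM call; each pass produces a new revision.
    n = state["revisions"] + 1
    return {**state, "draft": f"{state['topic']} (v{n})", "revisions": n}


def run_graph(state: State, approve: Callable[[str], bool],
              max_revisions: int = 3) -> State:
    """Explicit state machine: draft -> human_review -> (END or draft)."""

    def human_review(s: State) -> State:
        # `approve` stands in for a person checking the draft before it ships.
        return {**s, "approved": approve(s["draft"])}

    nodes: dict[str, Callable[[State], State]] = {
        "draft": draft,
        "human_review": human_review,
    }
    # Each node has a router that picks the next node from the current state.
    routes: dict[str, Callable[[State], str]] = {
        "draft": lambda s: "human_review",
        "human_review": lambda s: (
            "END" if s["approved"] or s["revisions"] >= max_revisions else "draft"
        ),
    }
    current = "draft"
    while current != "END":
        state = nodes[current](state)
        current = routes[current](state)
    return state


start: State = {"topic": "Release notes", "draft": "", "revisions": 0, "approved": False}
final = run_graph(start, approve=lambda d: d.endswith("(v2)"))
print(final["draft"], final["approved"])  # Release notes (v2) True
```

Because each node returns a new state dict instead of mutating the old one, every intermediate state can be logged or inspected, which is the kind of traceability the text attributes to LangChain’s tooling.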

LangChain supports various types of agents and offers a comprehensive library of tools, so developers can choose the cognitive architecture that best fits their application and improve the accuracy of its results.

For developers looking to move quickly from prototype to production with reliable GenAI applications, LangChain and its tools, like LangSmith, provide a solid foundation, offering traceability and explainability throughout the entire development process.

The Era of Self-Improving Agents

Combining CrewAI, AutoGen Studio v2, and LangChain’s agents does not just simplify software development; it transforms it. This synergy promises new levels of efficiency, creativity, and flexibility, expanding the role of AI in software creation.

And we have only just begun. The progression towards agents that improve themselves, as shown in AutoGen Studio, opens up a range of new possibilities. These agents learn on the fly, share knowledge, and evolve, transforming intimidating tasks into achievable and more efficient processes.

Unlike traditional agents, these marvels of self-improvement learn and adapt dynamically without the need for direct human coding, increasing their effectiveness and scalability.
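“Self-improving” covers several mechanisms. One simple, concrete form, similar in spirit to the skills AutoGen Studio lets you attach to agents as Python functions, is to keep solutions that passed verification and reuse them on later tasks instead of regenerating them. The sketch below illustrates only that idea: the `SkillLibrary` class and its `solve` method are hypothetical, and plain lambdas stand in for LLM-generated code. Nothing here changes a model’s weights.

```python
from typing import Callable, Optional

Skill = Callable[[int], int]


class SkillLibrary:
    """Hypothetical store of verified solutions an agent can reuse later."""

    def __init__(self) -> None:
        self.skills: dict[str, Skill] = {}

    def solve(self, name: str, candidates: list[Skill],
              tests: list[tuple[int, int]]) -> Optional[Skill]:
        # Reuse a skill that already passed verification on an earlier task.
        if name in self.skills:
            return self.skills[name]
        # Otherwise try candidates (stand-ins for LLM-written code) and keep
        # the first one that passes every test case.
        for candidate in candidates:
            if all(candidate(x) == expected for x, expected in tests):
                self.skills[name] = candidate
                return candidate
        return None


library = SkillLibrary()
attempts = [lambda n: n * 2, lambda n: n * n]  # the first guess is wrong
square = library.solve("square", attempts, tests=[(2, 4), (3, 9)])
print(square(5))  # 25
print(library.solve("square", [], tests=[]) is square)  # True: reused, not regenerated
```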

The next generation of agents fosters collaborative learning, echoing the dynamics of human teams and giving rise to more inventive problem-solving strategies.

Here is a preview from David Ondrej on his channel.

However, this powerful technology brings with it the need for responsible use. As these agents are integrated into various sectors, ethical considerations and efficacy controls are crucial.

These agents could surpass their human counterparts in learning and development speed, raising questions about job displacement, security, and control. Their efficiency in completing tasks and solving problems could also bring unprecedented productivity gains, but it would demand new frameworks for quality control and accountability.

A Few Words from Andrew Ng…

At this point, it is worth watching Andrew Ng’s keynote for Sequoia Capital (also check out Sequoia’s report on Generative AI) on “What’s next for agentic reasoning”; and if you haven’t taken his courses at deeplearning.ai, you should. In short, the talk focused on the transformative potential of agent-based workflows in AI.

These workflows are a significant shift from the traditional linear approach, in which the model produces its answer in a single pass. They offer an iterative, dynamic process that mirrors human cognitive strategies: the model plans, drafts, reviews, and reflects, which yields notably better results.
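Ng calls the draft, review, and revise cycle the reflection pattern. A minimal sketch of that loop follows, assuming hypothetical stub functions `generate` and `critique` in place of LLM calls. The critic here actually runs the draft against one test case, which is one common way to ground a critique in something more reliable than the model’s own opinion.

```python
from typing import Callable


def reflect_loop(task: str,
                 generate: Callable[[str, str], str],
                 critique: Callable[[str, str], str],
                 max_rounds: int = 3) -> tuple[str, int]:
    """Draft, critique, revise; stop when the critic returns "" (no issues)."""
    feedback = ""
    draft = ""
    for round_no in range(1, max_rounds + 1):
        draft = generate(task, feedback)   # first draft, or a revision using feedback
        feedback = critique(task, draft)   # review the draft
        if not feedback:                   # the critic is satisfied
            return draft, round_no
    return draft, max_rounds


def generate(task: str, feedback: str) -> str:
    # Stub "model": its first draft has a bug, and it fixes it once told.
    op = "+" if feedback else "-"
    return f"def add(a, b):\n    return a {op} b\n"


def critique(task: str, code: str) -> str:
    # Stub "critic": runs the draft and checks one case. exec() is used only
    # because this toy code is trusted; never exec untrusted model output.
    namespace: dict = {}
    exec(code, namespace)
    return "" if namespace["add"](2, 3) == 5 else "add(2, 3) should return 5"


code, rounds = reflect_loop("Write add(a, b)", generate, critique)
print(rounds)  # 2
```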

Ng illustrated this with compelling examples, including a case study on the HumanEval coding benchmark showing that wrapping a model like GPT-3.5 in an agentic workflow can outperform a more advanced model like GPT-4 used without one. This iterative improvement, driven by the agent’s ability to critique and refine its own output, underscores the potential of agentic workflows to raise AI’s problem-solving capabilities.

Ng’s talk also covered the emerging design patterns for AI agents: reflection, tool use, planning, and multi-agent collaboration. These patterns are key to building robust, efficient, and versatile AI systems that can carry out complex tasks with greater autonomy.
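Of these patterns, planning is the easiest to picture as a data structure: the model breaks a goal into steps with dependencies, and an executor runs them in a valid order. A minimal sketch follows, assuming the plan has already been produced (written by hand here) and using a stubbed `run_step` in place of model or tool calls. `execute_plan` is a hypothetical name; the ordering comes from Python’s standard `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Callable


def execute_plan(plan: dict[str, set[str]],
                 run_step: Callable[[str, dict[str, str]], str]) -> dict[str, str]:
    """Run plan steps in dependency order, passing earlier results along.

    `plan` maps each step to the steps it depends on. In an agent, an LLM
    planner would produce this structure and `run_step` would call a model
    or tool; both are stubbed here.
    """
    results: dict[str, str] = {}
    for step in TopologicalSorter(plan).static_order():
        deps = {d: results[d] for d in plan.get(step, set())}
        results[step] = run_step(step, deps)
    return results


plan = {
    "spec": set(),
    "code": {"spec"},
    "tests": {"spec"},
    "review": {"code", "tests"},
}
results = execute_plan(plan, lambda step, deps: f"{step}<-{sorted(deps)}")
print(results["review"])  # review<-['code', 'tests']
```

If a planner ever produced circular dependencies, `static_order()` would raise `graphlib.CycleError`, which gives the agent a clear signal to re-plan.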

Ng also stressed the importance of fast token generation in agentic workflows. Because these workflows iterate many times, generating tokens quickly can matter more than using a slightly more capable but slower model, since speed allows faster iterative cycles. This perspective sheds light on the current state of AI and points the way to future innovations. As we stand on the brink of these technological leaps, Ng’s ideas offer a compelling vision of AI expanding its horizons and bringing us closer to artificial general intelligence (AGI).

Industry Implementation: TuringBots (Forrester)

What is clear is that integrating AI agents into the Software Development Life Cycle (SDLC) is changing how we approach project management and execution. Diego Lo Giudice of Forrester, for example, published a series of articles some time ago on TuringBots, which use agents extensively across the stages of development, from generating code to other tasks such as capturing requirements.

Companies like G-Research are leading the way, showing notable productivity increases by incorporating generative AI assistants (or TuringBots) into their development processes. This innovative approach not only streamlines workflows but also fosters a culture of continuous improvement and learning.

To demystify how these agents work for a broader audience, interactive demonstrations of their capabilities in real time can be very effective. For example, a visual simulation of CrewAI coordinating a multi-agent project, or an interactive walkthrough of Devin solving a coding challenge, can make their operation and benefits tangible. Hands-on experience with these technologies not only improves understanding but also sparks curiosity about the future of AI.

Next Steps

Looking ahead, self-improving agents promise not only to redefine software development but also to drive transformative changes across all sectors. For companies eager to tap this potential, the path forward is a strategic approach to adoption: start with pilot projects to evaluate fit and effectiveness, then integrate in phases that allow continuous learning and adjustment. This roadmap helps organizations reap the benefits of self-improving agents while staying agile as technological and ethical considerations evolve.

In any case, we are getting closer to a maxim glimpsed some time ago about our new role as managers of human-in-the-loop agents:

We will only be as good as the network of agents we can manage.