Deep Dive Part 2: How does BabyAGI actually work?

The second deep dive into autonomous AI agents

Hi Hitchhikers,

Last week, we explored the fascinating world of autonomous AI agents and how they could be the next step toward Artificial General Intelligence (AGI). One of the standout examples of such agents is BabyAGI—an AI system that autonomously manages tasks by creating, prioritizing, and executing them based on a predefined objective and the outcomes of previous tasks. In case you missed it, here’s how I described BabyAGI last week1:

"My favorite example of this to date is BabyAGI, an agent created by Yohei Nakajima, a Venture Capitalist, that uses GPT-4 and Pinecone for memory to create and perform tasks autonomously! ... In summary, BabyAGI is an AI system that autonomously manages tasks by creating, prioritizing, and executing them based on a predefined objective and the outcomes of previous tasks. It interacts with GPT-4 for natural language processing and uses Pinecone for data storage and retrieval."​

Today, we will take a closer look at BabyAGI and go deeper into how it works. We'll explore the inner workings of its code and understand the techniques used by BabyAGI to achieve autonomy in task management. Let's dive in!

Ready to jump in?

Before we get started, please help me on my quest to reach 5,000 subscribers by sharing this newsletter with a friend!


How BabyAGI Actually Works

Baby AGI as generated by AI

Before we dive into the code, let's provide a quick refresher on what BabyAGI is and how it works. BabyAGI is an autonomous agent system that uses large language models (LLMs)2 to carry out tasks based on a defined objective. The core idea is to have a series of agents that can work together to execute tasks, create new tasks based on the results of the execution, and prioritize tasks in a list. Each of these agents is powered by an LLM, making them capable of understanding natural language and generating meaningful responses to prompts.

The high-level idea of how BabyAGI works can be summarized as follows:

  1. The User provides an Objective and an Initial Task. The Objective is the overall goal that the agents will work towards, and the Initial Task is the first task to be executed in order to achieve the Objective.
  2. The system starts a loop where it continuously checks for tasks in the task list. If the task list has items, it will proceed with the following steps:
     a. Execution Agent: The first task in the list is sent to the Execution Agent. The agent is prompted with the task and generates a response as the result of the task execution. The response is saved for future context.
     b. Task Creation Agent: The Task Creation Agent is prompted with the last result and remaining tasks. It generates new tasks that should be added to the task list based on the result of the previous execution.
     c. Prioritization Agent: The Prioritization Agent is given the task list along with the knowledge of the Objective. It is responsible for cleaning the formatting and reprioritizing the tasks in the list.
  3. The loop continues until all tasks are completed or the Objective is achieved.

Now that we have a high-level idea of how BabyAGI works, let's jump into the details…

Understanding BabyAGI's Core Components

The code for BabyAGI is written in Python, and the crux of it is all in one script with 313 lines of code3. The script consists of several key components that work together to achieve autonomy (the task list structure is sketched in code just after this list):

  1. Task List: task_list is a deque (double-ended queue) that holds the tasks that need to be completed. Each task is represented as a dictionary with a "task_id" and a "task_name." BabyAGI starts with an initial task and continuously adds new tasks to this list.

  2. Task Creation Agent: task_creation_agent is a function that uses GPT-4 to generate new tasks based on the result of the previously completed task and a predefined objective. The function sends a prompt to OpenAI's API and receives a list of new tasks as strings.

  3. Execution Agent: execution_agent is responsible for performing the tasks in the task list. It leverages GPT-4 and relevant context retrieved from Pinecone4 to complete each task based on the objective. The context is retrieved using context_agent to find relevant previous tasks.

  4. Prioritization Agent: prioritization_agent reprioritizes the task list based on the objective and other factors. It sends a prompt to OpenAI's API and receives the reprioritized task list as a numbered list.

  5. Pinecone: Pinecone is a vector database used to store and retrieve task results for context. BabyAGI uses Pinecone to create an index based on a specified table name and then stores the results of tasks, along with task names and metadata.
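
To make the first component concrete, here is a simplified sketch of the task list structure in Python. Treat it as an illustration rather than the repository's exact code; the objective and initial task values are placeholders.

from collections import deque

# Simplified sketch of BabyAGI's shared state (illustrative, not the exact repo code).
OBJECTIVE = "Solve world hunger"        # placeholder objective supplied by the user
INITIAL_TASK = "Develop a task list"    # placeholder first task

# The task list is a deque of dictionaries, each holding a task_id and a task_name.
task_list = deque([{"task_id": 1, "task_name": INITIAL_TASK}])

def add_task(task: dict) -> None:
    """Append a newly created task to the end of the queue."""
    task_list.append(task)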

Infinite Loop and Workflow

The script operates in an infinite loop, and in each iteration, it performs the following steps:

  1. Pull the First Task: BabyAGI dequeues the first task from the task list.
  2. Execute the Task: BabyAGI sends the task to the execution agent, which uses OpenAI's API to complete the task based on the context retrieved from Pinecone.
  3. Store the Result: The result is enriched and stored in Pinecone for future context retrieval.
  4. Create New Tasks: The task creation agent function generates new tasks based on the objective and the result of the previous task. The new tasks are added to the task list.
  5. Reprioritize the Task List: The prioritization agent function reprioritizes the task list based on the objective and other factors.

This process repeats continuously, allowing BabyAGI to autonomously manage tasks by creating, prioritizing, and executing them based on a predefined objective and the outcomes of previous tasks.
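
In code, the heart of the script looks roughly like the loop below. Treat this as a hedged reconstruction for illustration: it assumes the agent functions described above (execution_agent, task_creation_agent, prioritization_agent) and the task_list/OBJECTIVE from the earlier sketch, and store_result_in_pinecone is my own placeholder name for the enrich-and-store step.

import time

task_id_counter = 1
while True:
    if task_list:
        # 1. Pull the first task from the queue.
        task = task_list.popleft()

        # 2. Execute it with GPT-4 plus context retrieved from Pinecone.
        result = execution_agent(OBJECTIVE, task["task_name"])

        # 3. Enrich the result and store it in Pinecone for future context retrieval
        #    (store_result_in_pinecone is a placeholder name for that step).
        store_result_in_pinecone(task, result)

        # 4. Turn the result into new tasks and append them to the queue.
        new_tasks = task_creation_agent(
            OBJECTIVE, result, task["task_name"],
            [t["task_name"] for t in task_list],
        )
        for task_name in new_tasks:
            task_id_counter += 1
            add_task({"task_id": task_id_counter, "task_name": task_name})

        # 5. Reprioritize the remaining tasks against the objective.
        prioritization_agent(task["task_id"])

    time.sleep(1)  # small pause before checking the queue again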

The Magic of Prompt Engineering

Prompt engineering5 plays a crucial role in BabyAGI's ability to generate meaningful tasks and responses. By carefully crafting the prompts sent to GPT-4, BabyAGI is able to communicate its objective, the context of previous tasks, and the specific task it needs to perform.

For instance, in the task_creation_agent function, the prompt is designed to provide GPT-4 with the context it needs to generate new tasks:

prompt = f""" You are a task creation AI that uses the result of an execution agent to create new tasks with the following objective: {objective}, The last completed task has the result: {result}. This result was based on this task description: {task_description}. These are incomplete tasks: {', '.join(task_list)}. Based on the result, create new tasks to be completed by the AI system that do not overlap with incomplete tasks. Return the tasks as an array."""

The prompt clearly defines the AI's role and objective, provides the result of the previous task, and specifies the task description and the list of incomplete tasks. By carefully framing this information, BabyAGI enables GPT-4 to generate meaningful and relevant new tasks that align with the overall objective.
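
For illustration, here is roughly what happens around that prompt: it is sent to the chat completions endpoint and the reply is split into one task per line. This sketch assumes the pre-1.0 openai Python client (openai.ChatCompletion.create), and the helper name generate_new_tasks is mine; in the repository, task_creation_agent builds the prompt and does all of this in one function.

import openai

def generate_new_tasks(prompt: str) -> list:
    """Send the task-creation prompt to the model and split the reply into task names."""
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.5,
        max_tokens=200,
    )
    text = response["choices"][0]["message"]["content"]
    # One task per line; drop blank lines and surrounding whitespace.
    return [line.strip() for line in text.strip().split("\n") if line.strip()]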

Similarly, the execution_agent function uses a prompt that includes both the objective and the task that needs to be performed:

prompt = f""" You are an AI who performs one task based on the following objective: {objective}\n. Take into account these previously completed tasks: {context}\n. Your task: {task}\nResponse:"""

Again, by providing the objective, the context of previous tasks, and the specific task, BabyAGI can guide GPT-4 toward generating a response that meets the desired criteria.
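
The {context} slot in that prompt is filled by querying Pinecone for the stored results most similar to the objective. Here is a rough sketch of that retrieval step, assuming the 2023-era pinecone-client and openai libraries (pinecone.init, Index.query, openai.Embedding.create); the keys, environment, and index name are placeholders, and the function names are mine rather than the repository's exact ones.

import openai
import pinecone

# Placeholder credentials and index name, for illustration only.
pinecone.init(api_key="YOUR_PINECONE_API_KEY", environment="us-east1-gcp")
index = pinecone.Index("babyagi-demo")

def get_embedding(text: str) -> list:
    """Embed text with OpenAI's ada embedding model."""
    return openai.Embedding.create(
        input=[text.replace("\n", " ")], model="text-embedding-ada-002"
    )["data"][0]["embedding"]

def retrieve_context(objective: str, top_k: int = 5) -> list:
    """Return the task names attached to the stored results most similar to the objective."""
    results = index.query(get_embedding(objective), top_k=top_k, include_metadata=True)
    matches = sorted(results.matches, key=lambda m: m.score, reverse=True)
    return [m.metadata["task"] for m in matches]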

The prioritization_agent function uses a prompt that requests GPT-4 to reprioritize the task list based on the objective:

prompt = f""" You are a task prioritization AI tasked with cleaning the formatting of and reprioritizing the following tasks: {task_names}. Consider the ultimate objective of your team:{OBJECTIVE}. Do not remove any tasks. Return the result as a numbered list, like: #. First task #. Second task Start the task list with number {next_task_id}."""

This prompt ensures that GPT-4 reprioritizes tasks in a way that aligns with the objective and does not remove any tasks from the list.
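
Once the model replies, the numbered list needs to be parsed back into task dictionaries and pushed into the queue. Below is a minimal sketch of that parsing step; the helper name rebuild_task_list is mine, and the repository does this inline inside prioritization_agent.

from collections import deque

def rebuild_task_list(reply: str) -> deque:
    """Parse a numbered-list reply such as '2. Research competitors' back into task dicts."""
    tasks = deque()
    for line in reply.strip().split("\n"):
        parts = line.strip().split(".", 1)
        if len(parts) == 2 and parts[0].strip().isdigit():
            tasks.append({"task_id": int(parts[0]), "task_name": parts[1].strip()})
    return tasks

# Example:
# rebuild_task_list("3. Draft an outline\n4. Research competitors")
# -> deque([{'task_id': 3, 'task_name': 'Draft an outline'},
#           {'task_id': 4, 'task_name': 'Research competitors'}])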

The Art of Autonomy

BabyAGI's ability to manage tasks autonomously in a continuous loop is what makes it so intriguing. It's not just about completing individual tasks; it's about how BabyAGI dynamically generates new tasks based on the outcomes of previous ones, reprioritizes them, and iterates on this process indefinitely.

This level of autonomy is particularly exciting when we consider its potential applications in various domains, such as project management, customer support, and research. Imagine an AI system that can autonomously manage complex projects, create new action items based on progress, and prioritize tasks to meet specific goals.

As I mentioned last week, the implications of such autonomous AI agents extend beyond productivity and efficiency—they could represent a step toward AGI, where AI systems can understand and interact with the world similarly to human intelligence.

BabyAGI is growing up quickly

Since BabyAGI was released by Yohei Nakajima just a few weeks ago, the AI community has been buzzing with excitement, and developers have been inspired to create their own projects based on the BabyAGI framework. Nakajima recently highlighted some of the most exciting projects that have emerged in just the last few weeks, showcasing the versatility and potential of autonomous agents:

  1. Cognosys: A web-based agent that serves as a version of AutoGPT/BabyAGI, accessible through a browser.
  2. GOD mode: An agent that can order coffee at Starbucks, perform market analysis, find and negotiate leases, and more. It was inspired by AutoGPT and BabyAGI and is available at godmode.space.
  3. LLMitlessAPI: A single endpoint that triggers an agent, allowing users to define the service they want the agent to perform and the data to act upon. It can be used for chat and other applications.
  4. Baby AGI as a service: A tool that allows integration of BabyAGI with user applications using langchain-serve.
  5. Baby AGI on Streamlit: A Streamlit web app that makes BabyAGI more accessible to users.
  6. Otto: A system that uses LLMs and vector databases for self-governing digital organizations, enabling equal agent participation in management.
  7. Toddler AGI: An architecture based on BabyAGI that allows multiple instances to work on the same database and use human reinforcement to prioritize important problems.
  8. Do Anything Machine: A to-do list that autonomously completes tasks using a GPT-4 agent. The agent has context about the user and their company and can access relevant apps.
  9. GPT-Legion: An autonomous agent framework similar to BabyAGI but written in TypeScript.
  10. BabyAGI-asi: A modified BabyAGI that runs arbitrary Python code to perform tasks such as routines for pyautogui, web scraping, and computer actions.
  11. AgentGPT: An agent that performs tasks directly in the browser. Users can give the agent a goal, and it will think, plan, and execute actions accordingly.
  12. Baby AGI as a ChatGPT plug-in: A version of BabyAGI that runs as a plug-in within ChatGPT, allowing the agent to create content like a 250-page sci-fi novel.
  13. Coding agent: An agent that follows the Test-Driven Development (TDD) methodology, allowing users to write tests while the agent iteratively develops the feature.
  14. BabyAGI + Langchain Tools: An agent that uses Langchain tools to look up real information and take actions based on the context.
  15. Baby AGI in Slack: An agent that uses Pinecone and Slack to create threads with new objectives and perform tasks autonomously.
  16. Teenage-AGI: BabyAGI but with more memory capabilities.

Unfortunately, Twitter no longer allows tweets to be embedded in Substack posts, so I encourage readers to check out the original Twitter thread and support these projects here: Yohei on Twitter: "It’s been 18 days since sharing the “paper” on BabyAGI… The most exciting to me is what people have started building since then, and the discussions around it! Check them out👇" / Twitter6

Throwing the baby out with the bathwater

As we continue to witness rapid advances in AI research and development, systems like BabyAGI remind us of the limitless possibilities of AI and its potential to shape the future in profound ways. At the same time, with the AI space moving so quickly, there’s a legitimate fear from members of the tech community that tools like BabyAGI could be used for more harm than good. The prominent VC and All In podcast host Chamath Palihapitiya is even calling for the government to step in immediately with regulations:

Source: Twitter

If you’re interested in hearing more of the arguments for and against AI regulation, I recommend listening to the latest episode of All In. In the episode, the Besties talked about tools like BabyAGI and how they could be used in both positive and dangerous ways:

As I’ve said in previous posts, I really don’t think we should act too quickly to restrict innovation in this space at such an early stage. Furthermore, I think it’s entirely unrealistic to regulate language models because there will be open-source versions within the next year that perform just as well as OpenAI’s LLMs. Regulating open-source code is exceptionally challenging because it can easily be shared, copied, and changed, and there’s no central actor that regulators can hold accountable.

I think a better path forward for regulation is to make the law clearer about the responsibility of individuals who use these tools to do damage. For example, if an individual instructs an AI agent to cause harm to others, they should be held responsible for the harm caused. The counterargument you could make here is: “What if the AI autonomously decides to cause harm itself?” We’re still far away from that being possible without any human instruction or objective-setting; I have yet to see an example of an LLM deciding, unprovoked, to wish harm on humans.

AI agents like BabyAGI are good at carrying out human instructions and objectives. What makes them novel compared to traditional computer programs is that they do this through a natural language interface and can figure out how to solve problems without step-by-step instructions. This, to me, is what makes them such an exciting technology, and I can’t wait to see how they evolve in the coming weeks!

That’s all for this week. If you enjoyed my writing, then please support The Hitchhiker’s Guide to AI by subscribing - it’s completely free!



  1. Deep Dive: Could autonomous AI Agents be the next step towards AGI?
    Hi Hitchhikers, A huge welcome to the 350 new subscribers that hitched a ride with us this week. We’re almost at 3,000 Hitchhikers! Every so often I like to depart from my usual highlights of the week to go deeper into a development in AI that deserves more attention. This week I’m doing a deep dive into autonomous AI Agents and how they could be the nex…
  2. A large language model (LLM) is a type of deep learning model that is trained on a large dataset of text (e.g. much of the internet). LLMs predict the next sequence of text as output based on the text they are given as input. They are used for a wide variety of tasks, such as language translation, text summarization, and generating conversational text. OpenAI’s GPT-3 (Generative Pre-trained Transformer 3), the language model that powers ChatGPT, is an example of a generative LLM that uses the Transformer architecture, enabling it to be trained on a massive text dataset of hundreds of gigabytes using 175 billion parameters (weight assignments).

    Learn more about LLMs in my deep dive into deep learning part 3.

  3. The original BabyAGI script was only 180 lines of code, but in just the last week, many improvements have been made, including adding memory via a vector database. BabyAGI is open source, and you can view all of the code on GitHub here: https://github.com/yoheinakajima/babyagi

  4. Pinecone is an example of a vector database, and it provides a cloud-based service that allows users to store, search, and manage vectors efficiently. It's built specifically to handle the challenges of working with large-scale vector data.

    A vector database is a type of database that is designed to store and manage high-dimensional vectors. In this context, a vector is a mathematical representation of data, often in the form of a list of numbers. Vectors can represent a wide variety of data, such as images, audio, text, and more. In many cases, these vectors are generated by machine learning models that convert raw data into a fixed-size numerical representation.

    Now, let's see why vector databases are useful for knowledge retrieval for large language models (LLMs); a tiny code illustration of the core idea follows at the end of this note:

    1. Efficient Similarity Search: Vector databases like Pinecone can quickly find vectors that are similar to a given query vector. This is called similarity search, and it's important for knowledge retrieval because it allows the LLM to find relevant information that is similar in meaning to the input query.

    2. Scalability: As the amount of data grows, it becomes more challenging to manage and search through it. Vector databases are designed to handle large amounts of vector data, making them well-suited for LLMs that deal with extensive knowledge bases.

    3. Semantic Understanding: Vectors can capture the semantic meaning of text or other data. By using vectors to represent knowledge, LLMs can go beyond simple keyword matching and understand the underlying meaning of queries and information.

    4. Flexibility: Vector databases can handle a wide range of data types, not just text. This means that LLMs can use vector databases to integrate and retrieve knowledge from diverse sources, including images, audio, and more.

    In summary, vector databases like Pinecone provide a way to store, search, and manage high-dimensional vectors efficiently. This capability is essential for LLMs because it allows them to retrieve semantically relevant knowledge to a given query, even when dealing with vast and diverse datasets.
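
    To make the "similarity search" idea concrete, here is a tiny, self-contained illustration of the underlying math (cosine similarity between embedding vectors) using numpy. Real vector databases like Pinecone add indexing and scale on top of this basic operation; the vectors below are made-up toy values.

    import numpy as np

    def cosine_similarity(a, b):
        """Similarity of two embedding vectors: close to 1.0 = very similar, near 0.0 = unrelated."""
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Toy 4-dimensional "embeddings" (real ones have hundreds or thousands of dimensions).
    query = np.array([0.9, 0.1, 0.0, 0.2])
    stored = {
        "note about cats":    np.array([0.8, 0.2, 0.1, 0.1]),
        "note about finance": np.array([0.1, 0.9, 0.7, 0.0]),
    }

    # A similarity search returns the stored item whose vector is closest to the query.
    best_match = max(stored, key=lambda name: cosine_similarity(query, stored[name]))
    print(best_match)  # -> "note about cats"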

  5. Prompt engineering is a technique used in natural language processing (NLP) to guide artificial intelligence (AI) models, such as language models, to produce specific and desired responses. Think of it as a way to communicate with the AI by providing it with carefully crafted instructions or questions. These instructions, known as "prompts," are designed to help the AI understand what information you are seeking or what task you want it to perform.

    To explain it in simpler terms, let's use an analogy:

    Imagine you have a very knowledgeable friend who is an expert in many subjects. You can ask your friend questions, and they will provide you with answers. However, your friend is also very literal, so how you ask the question matters a lot. If you ask a vague question, you might not get the answer you were looking for. On the other hand, if you ask a more specific and well-formulated question, you are more likely to get the information you need.

    Prompt engineering is similar to asking the right questions to your knowledgeable friend, except that the friend is an AI model. By carefully crafting the prompts or questions, you can guide the AI to generate responses that are useful, accurate, and relevant to your needs.

    For example, instead of simply asking the AI, "Tell me about cats," which could lead to a broad range of responses, you could use prompt engineering to ask a more specific question like, "What are the common characteristics of domestic cats?" By doing so, you help the AI understand what you are looking for and increase the chances of getting the information you need.

  6. Here’s Yohei Nakajima’s original Twitter thread on BabyAGI projects:

    You can also now see an updated list of all the BabyAGI-inspired projects on Github too: https://t.co/ALo5iSF7Be