Understanding Agents Through First-Principles Thinking
Part 1 of the "real world implementation of agents" series
Where are you on the ‘what agents can do’ spectrum?
Please take a moment to think about where you are on this spectrum.
A. If you are at level 1 or 2, think about how you came to this conclusion (this applies to both extremes):
1. Based on what you see in the news or on social media about agents
2. Someone from your team/org has done a POC and is talking highly of, or against, agents
3. You have vibe-tested a few examples and reached this conclusion
4. You (or your team) attempted to solve a real business problem using agents. After rigorous testing, real users started using it. Now they are praising or blaming agents (and you)!
B. If you are at level 3:
1. I am not yet aware of what AI and agents are
2. I know about AI, but I know that it won’t know what I know
3. I know about AI, but I don’t know how it can help me. I'm curious to explore it.
If you are at A-4 or B-3, that is a good starting point.
Now you may ask: where am I on this spectrum?
I will try to answer that in detail in part 3 of this blog series. (A little shameless plug: subscribe to this Substack :)
What are Agents?
In my opinion, it is not correct to equate agents with real people and teams.
Yes, it may sound cool, and it gives us a glimpse of what the technology might do in the future. I am not completely against it, as it helps in marketing and makes the idea of agents more popular and easier to explain.
But at the same time, this comparison creates confusion and unrealistic expectations. It adds a kind of mystery around what agents really are. The biggest issue is that it often stops people from having logical and practical conversations about how to actually build with agents.
When I think about agents, I try to draw a parallel with how the cloud delivers its benefits through abstraction.
We all know that the cloud is just a computer running somewhere else. Once we understand that, it becomes easier to know its limits and how to work with them.
In the same way, if we think of an agent as a piece of running code, many things become clear. For example:
It needs access to data in digital format
To use it in a real system, we need to deploy the code and set up proper engineering pipelines around it.
It can fail
At the same time, calling an agent just a “piece of code” would be an understatement. It is more than that.
The instructions we give to agents are called prompts, written in natural language. Prompts, when combined with an LLM, can do amazing things. But prompts alone are not enough to get the desired results.
In the next section, let’s explore how prompts evolve into agents.
From Prompt to Agents
LLM call only (based on the LLM’s world knowledge)
In the above example, the LLM provides an answer based on its existing training data.
LLM call with access to external systems
In the above example, the user wants to know the current temperature. The LLM cannot give this information from its own knowledge, because it does not have access to real-time data. For that, it needs to connect to an external system: in this case, a weather API.
Now the main question is, how does the LLM know which system to call? This is possible because each system (an external API, existing code, or a code generation function) is described to the LLM in text. Based on that description, the LLM decides which tool to use and with what input. The same idea is extended in MCP (Model Context Protocol).
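To make this concrete, here is a minimal sketch of the idea in Python. It is not a real LLM integration: stub_llm is a hypothetical stand-in for a model call, and get_current_temperature with its made-up readings is a hypothetical stand-in for a weather API. The only point is that the tool reaches the LLM as a text description, and the LLM replies with the name of the tool it wants to use.

```python
import json

def get_current_temperature(city: str) -> str:
    """Hypothetical tool: stands in for a call to a real weather API."""
    made_up_readings = {"Pune": "31 C", "London": "14 C"}
    return made_up_readings.get(city, "unknown")

# What the agent knows about each tool. Only the text parts reach the LLM.
TOOLS = {
    "get_current_temperature": {
        "description": "Returns the current temperature for a city.",
        "inputs": {"city": "name of the city"},
        "function": get_current_temperature,
    },
}

def describe_tools(tools: dict) -> str:
    """Render the tool descriptions as plain text for the prompt."""
    return "\n".join(
        f"- {name}: {spec['description']} Inputs: {json.dumps(spec['inputs'])}"
        for name, spec in tools.items()
    )

def stub_llm(tool_text: str, question: str) -> str:
    """Stand-in for a real LLM call (an assumption, not a real API).
    A real model would read tool_text and make this decision itself."""
    if "get_current_temperature" in tool_text and "temperature" in question.lower():
        city = question.rstrip("?").split()[-1]  # naive: take the last word
        return json.dumps({"tool": "get_current_temperature", "args": {"city": city}})
    return json.dumps({"answer": "Answered from world knowledge."})

def answer(question: str) -> str:
    """Ask the (stub) LLM, and run the tool if it requests one."""
    reply = json.loads(stub_llm(describe_tools(TOOLS), question))
    if "tool" in reply:
        tool = TOOLS[reply["tool"]]["function"]
        return tool(**reply["args"])
    return reply["answer"]

print(answer("What is the current temperature in Pune?"))
```

Notice that the agent code, not the LLM, actually runs the tool. The LLM only picks the tool name and the arguments.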
LLM call with instruction generation during runtime
When the user asks a complex question, the LLM first creates a set of detailed steps. It does this using the given instructions and the tools available. Then it follows these steps one by one to find the answer.
This process is not always linear or fixed. The LLM can also make extra calls to check the plan or review the results. If something goes wrong during execution, the LLM can update the plan and try again.
This ability to plan, check and adjust is a very strong feature of agents. It gives more flexibility compared to writing fixed code for every case.
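A rough sketch of this plan, check and adjust loop is shown below. Everything in it is an assumption made for illustration: make_plan stands in for an LLM planning call, execute_step stands in for tool execution, and the step names primary_api and backup_api are invented. The primary source is made to fail on purpose so that the re-planning path is visible.

```python
def make_plan(question: str, failed_steps: set) -> list:
    """Stand-in for an LLM planning call: returns an ordered list of steps.
    It avoids any step that failed in an earlier attempt."""
    source = "backup_api" if "primary_api" in failed_steps else "primary_api"
    return [source, "summarize"]

def execute_step(step: str, state: dict) -> None:
    """Stand-in for tool execution. primary_api always fails here
    to simulate a real-world error."""
    if step == "primary_api":
        raise RuntimeError("primary_api timed out")
    if step == "backup_api":
        state["data"] = [3, 4, 5]
    if step == "summarize":
        state["answer"] = f"total = {sum(state['data'])}"

def run_agent(question: str, max_attempts: int = 3) -> dict:
    failed_steps = set()
    log = []
    for attempt in range(1, max_attempts + 1):
        plan = make_plan(question, failed_steps)  # plan (or re-plan)
        state = {}
        try:
            for step in plan:
                execute_step(step, state)
                log.append(f"attempt {attempt}: {step} ok")
            return {"answer": state["answer"], "log": log}
        except RuntimeError as err:
            # Record the failure so the next plan can work around it.
            log.append(f"attempt {attempt}: {step} failed ({err})")
            failed_steps.add(step)
    return {"answer": None, "log": log}

result = run_agent("What is the total?")
print(result["answer"])
for line in result["log"]:
    print(line)
```

With fixed code, the fallback to backup_api would have to be written by hand for every case. Here the planner decides it at runtime, based on what failed.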
LLM call with access to previous conversations
During plan execution, many LLM calls may be needed to generate answers based on the plan. At the end, the LLM combines these answers to produce the final output. While making multiple calls, the agent stores the intermediate results in local storage. This is called memory in agents.
You can think of it like how RAM and Disk work in a computer. Here, RAM is like the prompt and the small part of data that goes to the LLM, while Disk is where all the data is stored for later use.
Good memory management plays a very important role in improving the output of agents.
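The RAM and disk comparison can be sketched in a few lines. This is a simplified illustration, not how any particular framework implements memory: the AgentMemory class and its window parameter are hypothetical names, and a plain Python list plays the role of the disk.

```python
class AgentMemory:
    """The full history is kept on the 'disk' (here just a list).
    Only a small recent window is sent to the LLM (the 'RAM')."""

    def __init__(self, window: int = 3):
        self.window = window
        self.history = []  # every message ever stored

    def store(self, role: str, text: str) -> None:
        self.history.append(f"{role}: {text}")

    def context_for_prompt(self) -> str:
        """Return only the most recent messages, ready to go into a prompt."""
        return "\n".join(self.history[-self.window:])

memory = AgentMemory(window=3)
for i in range(1, 6):
    memory.store("step", f"result {i}")

print(len(memory.history))          # everything is still stored
print(memory.context_for_prompt())  # but only the last 3 go to the LLM
```

Taking the last few messages is the simplest possible strategy. Deciding what to keep in the window is where most of the memory management effort goes.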
In the end, what goes to the LLM is a ‘Prompt’
Let’s quickly look again at the main parts of an agent:
Planning engine / agent prompt: These are instructions written in text form.
Tools: A short text that explains what each tool can do, along with the input it needs.
Memory: A stack of messages, all stored in text format.
In every LLM call, all or some part of the text from these elements is sent to the LLM. This full text is what we call the prompt, and it guides the LLM on what to do next.
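As a small illustration of the point above, the function below joins the three elements into one string. The section labels (INSTRUCTIONS, TOOLS, MEMORY, USER), the function name build_prompt and the weather_api tool are my own assumptions for this sketch; real frameworks use their own layouts.

```python
def build_prompt(agent_prompt: str, tools: dict, memory: list,
                 user_message: str, max_messages: int = 4) -> str:
    """Combine agent instructions, tool descriptions and memory into one prompt."""
    tool_lines = "\n".join(f"- {name}: {text}" for name, text in tools.items())
    recent = "\n".join(memory[-max_messages:])  # only part of memory is sent
    return (
        f"INSTRUCTIONS:\n{agent_prompt}\n\n"
        f"TOOLS:\n{tool_lines}\n\n"
        f"MEMORY:\n{recent}\n\n"
        f"USER:\n{user_message}"
    )

prompt = build_prompt(
    agent_prompt="Plan step by step, then answer briefly.",
    tools={"weather_api": "Returns current temperature. Input: city name."},
    memory=["user: Hi", "assistant: Hello! How can I help?"],
    user_message="What is the temperature in Pune?",
)
print(prompt)
```

Whatever the framework, the LLM only ever receives a string like this one.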
So if a prompt can do everything, why do we need frameworks?
If you look at real examples of the above flows, you will see that they need a good amount of engineering work. You have to manage things like how code blocks run in a flow, async processing, memory use, errors, logs, and LLM calls.
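To give a feel for this plumbing, here is a minimal sketch of just one piece: an async LLM call wrapped with retries and logging. FlakyLLM is a hypothetical stand-in that fails on its first call, and call_with_retry is an illustrative helper, not part of any framework.

```python
import asyncio
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("agent")

class FlakyLLM:
    """Hypothetical stand-in for an LLM client. Fails on the first call."""

    def __init__(self):
        self.calls = 0

    async def complete(self, prompt: str) -> str:
        self.calls += 1
        if self.calls == 1:
            raise TimeoutError("simulated timeout")
        return f"echo: {prompt}"

async def call_with_retry(llm, prompt: str, retries: int = 3, delay: float = 0.01) -> str:
    """Call the LLM, logging failures and retrying with a growing wait."""
    for attempt in range(1, retries + 1):
        try:
            return await llm.complete(prompt)
        except TimeoutError as err:
            logger.warning("attempt %d failed: %s", attempt, err)
            if attempt == retries:
                raise  # give up after the last attempt
            await asyncio.sleep(delay * attempt)

llm = FlakyLLM()
print(asyncio.run(call_with_retry(llm, "hello")))
```

This covers only retries and logs for a single call. Flow control, memory and parallel execution each need similar care.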
If you try to build all this from scratch, it becomes extra work on top of your main use case. This is where frameworks like CrewAI, AutoGen and many others help. They take care of most of the common technical parts, so you can focus more on your agent and less on the setup.
But this does not mean that frameworks alone can solve everything. To make the solution useful for real users, you still need to do more engineering. For example, connecting with data sources, fitting into existing or new workflows, and making sure users can use and check the output properly.
Single Agent vs Multi Agent
One common debate we often hear is single agent vs multi-agent. In my view, there is a big misconception that multi-agent systems are far more advanced or magical than a single agent.
When I think of multi-agents, I see it more as a way to logically separate the instructions we give to the LLM.
Imagine we have a very powerful LLM that can handle long context and reason perfectly. In that case, we would not really need multiple agents. One strong agent would be enough.
But in today’s situation, LLMs are not perfect. If we give them very complex instructions, they may make mistakes. So to manage this, we break the big task into smaller parts and create separate prompts or agents.
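Seen this way, a multi-agent setup can be sketched as nothing more than two separate prompts called one after the other. The agent names, their prompts and stub_llm below are all hypothetical. The stub just returns a marker string so the flow of data is easy to follow.

```python
# Each "agent" is just a separate, smaller set of instructions.
AGENT_PROMPTS = {
    "researcher": "Collect facts relevant to the question. Do not write prose.",
    "writer": "Turn the collected facts into a short answer for the user.",
}

def stub_llm(system_prompt: str, user_input: str) -> str:
    """Stand-in for a real LLM call (an assumption, not a real API)."""
    if system_prompt == AGENT_PROMPTS["researcher"]:
        return f"facts({user_input})"
    return f"answer from {user_input}"

def run_pipeline(question: str) -> str:
    """Sequential hand-off: the researcher's output becomes the writer's input."""
    facts = stub_llm(AGENT_PROMPTS["researcher"], question)
    return stub_llm(AGENT_PROMPTS["writer"], facts)

print(run_pipeline("Why is the sky blue?"))
```

The same LLM serves both agents here. What changes between them is only the instructions and the input each one sees.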
If we continue our prompt + engineering (framework) analogy, single-agent and multi-agent setups can be visualized as below.
Now, is this the only reason to use multi-agent architecture? No. Multi-agent setups are also useful when we want more control over data and actions, or when we want to run things in parallel, or reduce cost and latency.
Another important point is that people often get confused by complex diagrams showing how agents talk to each other, such as hierarchical, sequential, and other patterns. I will try to simplify these in the next part of this blog series.
If you are just starting to build with agents, my suggestion is to start simple. Understand the logic and how things work. Once that is clear, you can move towards more complex setups.
Is prompt writing (communication with the LLM) even a skill?
Yes!
When you work with agents, you realize that the prompt is one of the most important parts of making the solution work. It is just as important as the complex engineering behind it.
But it is not as easy as it looks. Sometimes prompt engineering quickly turns into prompt ‘praying’ :) when we need the agent to behave in specific ways.
There are many guidelines and tricks available for writing better prompts, but in my observation the following things matter most:
Writing is thinking: write down your thoughts about how the LLM might deconstruct and solve the problem. This helps in better plan generation and eventually results in a better outcome. Then see how the LLM works with your details.
In-depth business understanding: If you don’t understand the business problem well, the agent won’t either. There is a gap between what the LLM knows (world knowledge) and how it should behave for your specific business case. That gap needs to be filled by you. (This gap might shrink in the future with more advanced LLMs.)
Going back to our earlier definition of a prompt, it is not restricted to instructions for agents. It also covers:
logical separation of tasks
providing business context to LLMs for better plan generation
external tool descriptions
effective memory management
I will talk more about each topic in upcoming blogs.
Closing Thoughts
I have always been fascinated by stories about Richard Feynman’s teaching style. He explained complex ideas by starting from something as simple as a number line.
At the same time, I respect the useful layers of abstraction we use in technology, which help us build complex systems on top of those ideas.
Keeping both these thoughts in mind, this blog is my humble attempt to explain (and understand!) agents from the basics. I am not saying everything written here is perfect. If you find any mistakes or gaps, please do let me know. I always welcome discussions, especially when they are based on first principles, not just definitions.
In the next blog, I will try to cover how agents are used to solve real world problems.
Stay tuned!

Very nice explanation of agents, waiting for the next part
Nice! The concept of agents and multi-agent systems has its origin in defence & aerospace. Every pursuit problem, for example missile tracking or autopilot mode, is an abstraction of agent AI, often a multi-agent system. Have a look at the things that were taking place before 2018, before LLMs started dominating the software world: https://www.mdpi.com/journal/aerospace/special_issues/MAS_AI_Aviation_II
What has evolved is the ease of providing instructions via natural language, along with a new challenge: how to make these instructions singular in meaning.