AI Agents
What Is an Agent?
An agent is an AI system that can reason, plan, use tools, and take actions in a loop until a task is complete. Unlike a chatbot that answers one question at a time, an agent works autonomously across multiple steps.
Chatbot: User asks → Model answers → Done
Agent: User asks → Model thinks → Uses tool → Observes result
→ Thinks again → Uses another tool → Observes
→ ... (loop continues) → Delivers final result
Chatbot vs Agent
| Aspect | Chatbot | Agent |
|---|---|---|
| Interaction | Single turn Q&A | Multi-step autonomous execution |
| Tools | None (or minimal) | Multiple tools used in sequence |
| Planning | None | Breaks tasks into steps |
| Memory | Current conversation | Conversation + tool results + state |
| Autonomy | Responds to each message | Works independently toward a goal |
| Example | "What is recursion?" | "Refactor this module and fix the tests" |
The Agent Loop
Every agent follows the same fundamental loop:
┌──────────────┐
│ OBSERVE │ ← Read environment (tool results, errors, files)
└──────┬───────┘
│
┌──────▼───────┐
│ THINK │ ← Reason about what to do next
└──────┬───────┘
│
┌──────▼───────┐
│ ACT │ ← Call a tool, write code, execute command
└──────┬───────┘
│
│ Loop until task is complete
└──────────────→ back to OBSERVE
This loop continues until the agent decides the task is complete, hits a maximum iteration limit, or encounters an unrecoverable error.
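The loop above can be sketched in a few lines of Python. This is a minimal illustrative skeleton, not a real SDK: `call_model` and the `tools` dict are stand-ins for an actual LLM call and real tool implementations.

```python
# Minimal observe-think-act loop. `call_model` is a stand-in for a real
# LLM call; here it is stubbed so the sketch runs end to end.

def call_model(history):
    # A real agent would send `history` to an LLM and parse its reply.
    # This stub calls one tool, then finishes once it sees the result.
    if any(step[0] == "observation" for step in history):
        return {"type": "finish", "answer": history[-1][1]}
    return {"type": "tool", "name": "add", "args": (2, 3)}

tools = {"add": lambda a, b: a + b}

def run_agent(task, max_steps=10):
    history = [("task", task)]
    for _ in range(max_steps):            # hard limit prevents infinite loops
        decision = call_model(history)    # THINK
        if decision["type"] == "finish":
            return decision["answer"]
        result = tools[decision["name"]](*decision["args"])  # ACT
        history.append(("observation", result))              # OBSERVE
    raise RuntimeError("Agent hit the step limit without finishing")

print(run_agent("What is 2 + 3?"))  # → 5
```

The key structural point is that the loop, the step limit, and the tool dispatch live in ordinary code; only the decision of what to do next is delegated to the model.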
The ReAct Pattern
ReAct (Reasoning + Acting) is the most common agent pattern. The model explicitly writes out its reasoning:
User: "What's the population of the capital of France?"
Thought: I need to find the capital of France first, then look up its population.
Action: search(query="capital of France")
Observation: The capital of France is Paris.
Thought: Now I know it's Paris. I need to find Paris's population.
Action: search(query="population of Paris 2024")
Observation: The population of Paris is approximately 2.1 million (city proper).
Thought: I have the answer now.
Answer: The population of Paris, the capital of France, is approximately 2.1 million.
The explicit Thought steps make the agent's reasoning transparent and debuggable.
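One way to drive this pattern in code is to parse `Action:` and `Answer:` lines out of each model turn. The sketch below uses a scripted model and a toy `search` tool (both are stand-ins, not a real API) to replay the two-hop question from the trace above.

```python
import re

# Scripted "model" output for the two-hop question in the example above.
# A real ReAct loop would generate each turn with an LLM call.
SCRIPTED_TURNS = [
    'Thought: Find the capital first.\nAction: search(query="capital of France")',
    'Thought: Now find its population.\nAction: search(query="population of Paris")',
    "Thought: I have the answer.\nAnswer: About 2.1 million people live in Paris.",
]

def search(query):
    # Toy knowledge base standing in for a real search tool.
    kb = {"capital of France": "Paris", "population of Paris": "about 2.1 million"}
    return kb.get(query, "no result")

def react_loop():
    observations = []
    for turn in SCRIPTED_TURNS:                  # real code: call the LLM here
        answer = re.search(r"Answer: (.+)", turn)
        if answer:                               # model decided it is done
            return answer.group(1)
        action = re.search(r'Action: search\(query="(.+?)"\)', turn)
        if action:
            # Feed the observation back into the next model turn.
            observations.append(search(action.group(1)))
    return None

print(react_loop())
```

Production frameworks replace the regex parsing with structured tool calls, but the control flow is the same: act, observe, append, repeat until an answer appears.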
Agent Components
| Component | Role | Analogy |
|---|---|---|
| LLM | Reasoning, planning, decision-making | Brain |
| Tools | Actions the agent can take (APIs, file ops, search) | Hands |
| Memory | Conversation history, tool results, notes | Notepad |
| Planning | Breaking tasks into steps, prioritizing | Strategy |
| Orchestrator | Controls the loop, manages state, enforces limits | Manager |
Example: Claude Code as an Agent
Claude Code is a real-world example of an agent. Here is how it processes a request like "Fix the failing test in auth.test.ts":
1. OBSERVE → Read the test file
2. THINK → "The test expects a 200 status but the route returns 401"
3. ACT → Read the auth middleware code
4. OBSERVE → See that the token validation has a bug
5. THINK → "The token expiry check is using < instead of >"
6. ACT → Edit the middleware file
7. ACT → Run the tests
8. OBSERVE → All tests pass
9. RESPOND → "Fixed the token expiry comparison in auth middleware"
Types of Agents
1. Simple Agent (Single Loop)
One LLM running the observe-think-act loop with tools:
User → [LLM + Tools] → Result
Best for: straightforward tasks with clear steps.
2. Multi-Agent System
Multiple specialized agents collaborating:
User → Orchestrator Agent
├── Research Agent (searches, gathers info)
├── Coding Agent (writes implementation)
└── Review Agent (checks quality)
Each agent has its own system prompt, tools, and expertise. They pass results to each other.
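The hand-off between specialists can be sketched as a pipeline. Each "agent" below is a plain function standing in for an LLM with its own system prompt and tools; the names and return shapes are illustrative assumptions, not a framework API.

```python
# Sketch of an orchestrator passing work between specialist "agents".

def research_agent(task):
    # Stand-in for an agent that searches and gathers information.
    return f"notes on: {task}"

def coding_agent(notes):
    # Stand-in for an agent that writes the implementation.
    return f"implementation based on ({notes})"

def review_agent(code):
    # Stand-in for an agent that checks quality before anything ships.
    return {"code": code, "approved": "implementation" in code}

def orchestrator(task):
    notes = research_agent(task)      # gather information
    code = coding_agent(notes)        # produce an implementation
    return review_agent(code)         # quality gate before returning

result = orchestrator("rate limiting middleware")
print(result["approved"])
```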
3. Hierarchical Agent
An orchestrator delegates to worker agents:
Orchestrator: "Build a user auth system"
→ Worker 1: "Design the database schema"
→ Worker 2: "Implement the API endpoints"
→ Worker 3: "Write the tests"
→ Orchestrator: Combines and verifies results
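The delegate-then-combine shape above can be sketched as follows. The `worker` function is a stub for a worker agent; in a real system each call would be a separate agent run with its own context.

```python
# Sketch: an orchestrator decomposes a goal, delegates each subtask to a
# worker, then combines and verifies the results. Workers are stubs.

def worker(subtask):
    return f"[done: {subtask}]"

def orchestrator(goal, subtasks):
    results = [worker(t) for t in subtasks]              # delegate
    assert all(r.startswith("[done:") for r in results)  # verify
    return f"{goal}: " + ", ".join(results)              # combine

print(orchestrator(
    "Build a user auth system",
    ["design schema", "implement endpoints", "write tests"],
))
```

Because subtasks are independent here, a real orchestrator could also run the workers in parallel before the combine step.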
When to Use Which
| Approach | When to Use |
|---|---|
| Simple agent | Single-domain tasks, one skill set needed |
| Multi-agent | Complex tasks spanning multiple domains |
| Hierarchical | Large projects with clear task decomposition |
Agent Frameworks
| Framework | Description | Best For |
|---|---|---|
| Claude Agent SDK | Anthropic's official SDK for building agents with Claude | Production agents with Claude |
| LangChain | Popular framework with many integrations | Prototyping, diverse tool ecosystem |
| LangGraph | Graph-based agent workflows (from LangChain team) | Complex multi-step flows with branching |
| CrewAI | Multi-agent role-based collaboration | Teams of specialized agents |
| AutoGen | Microsoft's multi-agent conversation framework | Research, conversational agents |
Claude Agent SDK — Quick Example
```python
from claude_agent_sdk import Agent, tool

@tool
def read_file(path: str) -> str:
    """Read contents of a file."""
    with open(path) as f:
        return f.read()

@tool
def write_file(path: str, content: str) -> str:
    """Write content to a file."""
    with open(path, "w") as f:
        f.write(content)
    return f"Wrote {len(content)} chars to {path}"

agent = Agent(
    model="claude-sonnet-4-6-20250514",
    tools=[read_file, write_file],
    system="You are a coding assistant. Read files, make changes, verify results."
)

result = agent.run("Add error handling to the parse_config function in config.py")
```
Challenges & Risks
| Challenge | Description | Mitigation |
|---|---|---|
| Hallucinated actions | Agent "uses" a tool incorrectly or invents data | Validate all tool inputs, verify outputs |
| Infinite loops | Agent keeps trying the same failing approach | Set max iterations (e.g., 25 steps) |
| Cost explosion | Long agent runs burn through tokens | Set token/cost budgets, use cheaper models for sub-tasks |
| Safety | Agent takes destructive actions (deletes files, sends emails) | Require human approval for dangerous operations |
| Context overflow | Long runs fill the context window | Summarize intermediate results, prune history |
| Error cascading | Early mistake compounds across steps | Build in self-verification checkpoints |
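Two of the mitigations above, a step limit and a token budget, are simple enough to enforce in the loop itself rather than trusting the model. A sketch (the `step_fn` interface and numbers are illustrative assumptions):

```python
# Guardrails enforced by the orchestrating loop, not the model:
# a hard iteration cap and a token budget.

class BudgetExceeded(Exception):
    pass

def run_with_limits(step_fn, max_steps=25, token_budget=100_000):
    tokens_used = 0
    for _ in range(max_steps):
        done, tokens = step_fn()          # one observe-think-act iteration
        tokens_used += tokens
        if tokens_used > token_budget:
            raise BudgetExceeded(f"spent {tokens_used} tokens")
        if done:
            return tokens_used
    raise RuntimeError("hit max_steps without finishing")

# Toy step function: finishes on the third call, 1000 tokens per step.
calls = {"n": 0}
def fake_step():
    calls["n"] += 1
    return calls["n"] >= 3, 1000

print(run_with_limits(fake_step))  # → 3000
```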
When to Use Agents vs Simple Prompting
Simple prompt: "Translate this text to French" → No agent needed
Single tool call: "What's the weather?" → Tool use, not an agent
Agent: "Research competitors and write a report" → Multi-step, needs planning
Agent: "Fix all TypeScript errors in this repo" → Iterative, needs tools
Decision Framework
| Factor | Simple Prompting | Tool Use | Full Agent |
|---|---|---|---|
| Steps involved | 1 | 1-2 | 3+ |
| Tools needed | 0 | 1 | Multiple |
| Planning required | No | No | Yes |
| Iteration needed | No | No | Yes |
| Autonomy level | None | Low | High |
Building Your First Agent — Checklist
- Define the task scope clearly (what can the agent do, and what can't it do?)
- Choose tools carefully (start with 3-5, not 30)
- Write excellent tool descriptions (the model reads them)
- Set iteration limits (max 10-25 steps)
- Set a cost/token budget
- Add human-in-the-loop for destructive actions
- Log every step for debugging
- Test with adversarial inputs
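The human-in-the-loop item on this checklist can be implemented as an approval gate in front of tool dispatch. A sketch, where `approve` is a stand-in for a real confirmation prompt or review queue and the tool names are hypothetical:

```python
# Sketch of a human-in-the-loop gate: destructive tools must be approved
# before they run. `approve` stands in for a real confirmation prompt.

DESTRUCTIVE = {"delete_file", "send_email"}

def approve(tool_name, args):
    # Real code would ask a human; this stub denies everything destructive.
    return False

def dispatch(tool_name, args, tools):
    if tool_name in DESTRUCTIVE and not approve(tool_name, args):
        return "blocked: needs human approval"
    return tools[tool_name](*args)

tools = {"read_file": lambda p: f"contents of {p}",
         "delete_file": lambda p: f"deleted {p}"}

print(dispatch("read_file", ("notes.txt",), tools))    # runs normally
print(dispatch("delete_file", ("notes.txt",), tools))  # blocked by the gate
```

Putting the gate in the dispatcher, rather than in the prompt, means the model cannot talk its way past it.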
Previous: 15 - Tool Use & Function Calling | Next: 17 - RAG (Retrieval-Augmented Generation)