An agent is an LLM that, given a goal, decides its own steps — choosing actions (tools), observing results, and iterating until done — rather than producing a single fixed response. RAG ([[06_rag]]) is a one-shot tool use; agents generalize this into autonomous, multi-step loops built on the orchestration of [[07_langchain]]/[[08_langgraph]].
9.1 LLM vs Agent
| Plain LLM call | Agent |
|---|---|
| input → output, one shot | goal → plan → act → observe → repeat → result |
| stateless | maintains state/memory across steps |
| no external actions | calls tools, APIs, code, other agents |
| knowledge frozen | fetches fresh info, acts on the world |
The agent = LLM (reasoning) + tools (acting) + loop (control) + memory (state).
9.2 The ReAct pattern (Reason + Act)
The foundational agent loop. The model interleaves reasoning traces with actions:
codeThought: I need the population of France and Germany, then their ratio. Action: search("population of France") Observation: 68 million Thought: Now Germany. Action: search("population of Germany") Observation: 84 million Thought: Ratio = 68/84 ≈ 0.81. Action: finish("≈ 0.81")
Why it works: writing the reasoning ("Thought") conditions the next action on an explicit plan, and "Observation" grounds each step in real tool output — reducing hallucination and enabling multi-hop tasks. This is exactly the cycle implemented as a graph in [[08_langgraph]] §8.4.
Prompt structure that elicits it:
codeYou can use tools: {tool_descriptions}. Use this format: Thought: <reasoning> Action: <tool>(<args>) Observation: <result> ... (repeat) ... Thought: I now know the answer. Final Answer: <answer>
9.3 The anatomy of an agent
code┌──────────────────────────────────────────┐ GOAL ──► │ PLANNER / REASONER (LLM) │ │ ↑ memory ↓ chosen action │ │ MEMORY ◄── observations TOOL EXECUTOR │──► world (APIs, code, search) └──────────────────────────────────────────┘ loop until goal met / budget hit
1. Reasoning / Planning
- ReAct: plan one step at a time (reactive). Robust, simple.
- Plan-and-Execute: make a full multi-step plan upfront, then execute (optionally re-plan). Fewer LLM calls, better for long tasks; brittle if the world changes.
- Tree of Thoughts / self-consistency: explore multiple reasoning branches and vote/search — more compute for harder problems.
- Reflection / self-critique (Reflexion): after acting, the agent critiques its own output and retries — improves quality on coding/reasoning.
2. Tools (the agent's hands)
Anything callable with a described schema: web search, code interpreter, calculator, SQL/DB, file I/O, HTTP APIs, RAG retriever, even other agents. Tool descriptions (name, purpose, args) are how the LLM decides what to use — write them precisely (see [[07_langchain]] §7.6). Modern models do native tool/function calling: they output a structured {name, args} the runtime executes.
3. Memory
- Short-term (working): the running message history / scratchpad within a task (the context window).
- Long-term: persisted facts across sessions, usually in a vector store (semantic memory) — the agent retrieves relevant past info with RAG. Also episodic (past interactions) and procedural (learned skills/instructions).
- Memory management = summarize/trim to fit context, and store/retrieve from external stores. ([[08_langgraph]] checkpointers give thread-scoped memory.)
4. Control loop / orchestration
Decides when to call tools, when to stop, how to handle errors, step/cost budgets, retries. This is what LangGraph formalizes.
9.4 Tool calling under the hood
python# 1. Define tool with schema (name, description, typed args) # 2. Pass tool schemas to the LLM # 3. LLM returns: {"tool": "search", "args": {"query": "..."}} (structured) # 4. Runtime executes the actual function # 5. Feed the result back as an observation # 6. LLM decides: call another tool, or give the final answer
The LLM never executes anything itself — it only emits a request; your code runs it and returns the result. This separation is the safety boundary.
python# pseudo-loop (mirrors 07_langchain §7.6) messages = [system_prompt, user_goal] for step in range(MAX_STEPS): # budget guard ai = llm_with_tools.invoke(messages) messages.append(ai) if not ai.tool_calls: return ai.content # final answer for call in ai.tool_calls: result = run_tool(call) # your sandboxed executor messages.append(ToolMessage(result, tool_call_id=call["id"]))
9.5 Multi-agent systems
Split a hard problem across specialized agents (each with focused tools, prompts, and possibly different models). Patterns (also in [[08_langgraph]] §8.7):
- Supervisor / orchestrator-worker: a manager agent routes subtasks to specialists (researcher, coder, reviewer) and synthesizes results.
- Pipeline: agents in sequence (e.g. outline → draft → edit).
- Debate / collaboration: agents critique each other to improve answers.
- Hierarchical teams: supervisors of supervisors for big workflows.
Trade-offs: more agents → better specialization and modularity, but more cost, latency, and coordination failure modes. Start single-agent; split only when a clear division of labor helps.
9.6 Prominent frameworks
| Framework | Flavor |
|---|---|
| LangGraph | Graph/state-machine control; production-grade, explicit ([[08_langgraph]]) |
| CrewAI | Role-based crews of agents with tasks; high-level, quick to prototype |
| AutoGen (Microsoft) | Conversational multi-agent; agents talk to solve tasks |
| OpenAI Agents SDK / Assistants | Provider-native tools, memory, handoffs |
| LlamaIndex Agents | Data/RAG-centric agents |
| Smolagents (HF) | Minimal, code-writing agents |
All implement the same core loop; they differ in abstraction level and control.
9.7 Worked design: a research assistant agent
Goal: "Write a sourced brief on the impact of X."
codeSupervisor ├─► Researcher agent │ tools: web_search, fetch_url, rag_retriever │ loop (ReAct): search → read → extract facts → store to memory ├─► Writer agent │ reads memory (facts + citations) → drafts brief └─► Critic agent checks claims against sources (faithfulness) → requests fixes → loop Stop when critic approves OR step budget reached.
Implementation: a LangGraph with nodes {supervisor, researcher, writer, critic}, a shared State holding messages, facts, draft, and conditional edges routing back to the supervisor until approval. Memory = vector store of gathered facts; checkpointer for resumability; interrupt_before the "publish" tool for human sign-off.
9.8 Evaluation of agents
Harder than evaluating a single output because trajectories vary. Measure:
- Task success rate (did it achieve the goal? — often LLM-judged or rule-checked).
- Trajectory quality: were tool calls correct, efficient, non-redundant?
- Tool-call accuracy: right tool, right args.
- Faithfulness/groundedness (for RAG-style claims, [[06_rag]] §6.10).
- Cost & latency: tokens, number of steps, wall-clock.
- Robustness: behavior on ambiguous/adversarial inputs and tool failures.
Tools: LangSmith (tracing + eval datasets), RAGAS, custom LLM-as-judge rubrics. Always trace every step in production to debug failures.
9.9 Safety, reliability & production concerns
- Guardrails: validate/sanitize tool inputs and outputs; constrain which tools an agent may call.
- Sandboxing: run code/tools in isolated environments; never
evaluntrusted input (the democalculatorin [[08_langgraph]] is unsafe for real use). - Human-in-the-loop for irreversible/high-stakes actions (payments, emails, deletes) — gate them ([[08_langgraph]] §8.6).
- Budgets & circuit breakers: cap steps, tokens, $$ and time to prevent runaway loops.
- Prompt injection: retrieved/web content may contain malicious instructions ("ignore previous instructions…"). Treat tool/RAG output as untrusted data, not commands; separate system instructions from data; restrict tool permissions.
- Determinism & idempotency: design tools so retries don't double-charge or duplicate actions.
- Observability: log every thought, action, observation; trace and replay.
- Failure handling: retries with backoff, fallbacks, graceful "I couldn't complete this."
9.10 The big picture — how it all connects
code[01 Foundations] backprop, gradient descent — how anything learns │ [02 CNN] [03 RNN/LSTM] inductive biases for space & sequence │ [04 Transformers] attention → parallel, long-range modeling │ [05 Architectures] encoder (BERT) / decoder (GPT) / enc-dec (T5) │ ├──► embeddings (encoder) ──► [06 RAG] external knowledge │ └──► generation (decoder LLM) │ [07 LangChain] compose prompts/models/tools/retrievers │ [08 LangGraph] stateful loops, branches, multi-agent control │ [09 Agentic AI] goal-driven, tool-using, autonomous systems
Every layer rests on the one above: an autonomous agent is, at bottom, a stack of Transformer blocks ([[04_transformers]]) trained by backprop ([[01_deep_learning_foundations]]), wrapped in a reasoning loop with tools and memory.
9.11 Where to go next
- Build: implement the ReAct agent in [[08_langgraph]] §8.10, add a real search tool and a vector-store memory.
- Study: read the original papers — Attention Is All You Need (Transformer), BERT, GPT-3, ReAct, RAG, Toolformer, Reflexion.
- Practice: take one project (e.g. a docs Q&A bot) from RAG → tool-using agent → multi-agent, evaluating at each stage.
See [[10_math_appendix]] for the linear algebra / calculus / probability used throughout.