back to knowledge base
module 097 min read

Agentic AI

ReAct, planning, tool use, multi-agent systems, memory, and evaluation.

An agent is an LLM that, given a goal, decides its own steps — choosing actions (tools), observing results, and iterating until done — rather than producing a single fixed response. RAG ([[06_rag]]) is a one-shot tool use; agents generalize this into autonomous, multi-step loops built on the orchestration of [[07_langchain]]/[[08_langgraph]].


9.1 LLM vs Agent

Plain LLM callAgent
input → output, one shotgoal → plan → act → observe → repeat → result
statelessmaintains state/memory across steps
no external actionscalls tools, APIs, code, other agents
knowledge frozenfetches fresh info, acts on the world

The agent = LLM (reasoning) + tools (acting) + loop (control) + memory (state).


9.2 The ReAct pattern (Reason + Act)

The foundational agent loop. The model interleaves reasoning traces with actions:

code
Thought:   I need the population of France and Germany, then their ratio.
Action:    search("population of France")
Observation: 68 million
Thought:   Now Germany.
Action:    search("population of Germany")
Observation: 84 million
Thought:   Ratio = 68/84 ≈ 0.81.
Action:    finish("≈ 0.81")

Why it works: writing the reasoning ("Thought") conditions the next action on an explicit plan, and "Observation" grounds each step in real tool output — reducing hallucination and enabling multi-hop tasks. This is exactly the cycle implemented as a graph in [[08_langgraph]] §8.4.

Prompt structure that elicits it:

code
You can use tools: {tool_descriptions}.
Use this format:
Thought: <reasoning>
Action: <tool>(<args>)
Observation: <result>
... (repeat) ...
Thought: I now know the answer.
Final Answer: <answer>

9.3 The anatomy of an agent

code
            ┌──────────────────────────────────────────┐
   GOAL ──► │  PLANNER / REASONER (LLM)                 │
            │   ↑ memory          ↓ chosen action       │
            │  MEMORY ◄── observations   TOOL EXECUTOR   │──► world (APIs, code, search)
            └──────────────────────────────────────────┘
                         loop until goal met / budget hit

1. Reasoning / Planning

  • ReAct: plan one step at a time (reactive). Robust, simple.
  • Plan-and-Execute: make a full multi-step plan upfront, then execute (optionally re-plan). Fewer LLM calls, better for long tasks; brittle if the world changes.
  • Tree of Thoughts / self-consistency: explore multiple reasoning branches and vote/search — more compute for harder problems.
  • Reflection / self-critique (Reflexion): after acting, the agent critiques its own output and retries — improves quality on coding/reasoning.

2. Tools (the agent's hands)

Anything callable with a described schema: web search, code interpreter, calculator, SQL/DB, file I/O, HTTP APIs, RAG retriever, even other agents. Tool descriptions (name, purpose, args) are how the LLM decides what to use — write them precisely (see [[07_langchain]] §7.6). Modern models do native tool/function calling: they output a structured {name, args} the runtime executes.

3. Memory

  • Short-term (working): the running message history / scratchpad within a task (the context window).
  • Long-term: persisted facts across sessions, usually in a vector store (semantic memory) — the agent retrieves relevant past info with RAG. Also episodic (past interactions) and procedural (learned skills/instructions).
  • Memory management = summarize/trim to fit context, and store/retrieve from external stores. ([[08_langgraph]] checkpointers give thread-scoped memory.)

4. Control loop / orchestration

Decides when to call tools, when to stop, how to handle errors, step/cost budgets, retries. This is what LangGraph formalizes.


9.4 Tool calling under the hood

python
# 1. Define tool with schema (name, description, typed args)
# 2. Pass tool schemas to the LLM
# 3. LLM returns: {"tool": "search", "args": {"query": "..."}}   (structured)
# 4. Runtime executes the actual function
# 5. Feed the result back as an observation
# 6. LLM decides: call another tool, or give the final answer

The LLM never executes anything itself — it only emits a request; your code runs it and returns the result. This separation is the safety boundary.

python
# pseudo-loop (mirrors 07_langchain §7.6)
messages = [system_prompt, user_goal]
for step in range(MAX_STEPS):                  # budget guard
    ai = llm_with_tools.invoke(messages)
    messages.append(ai)
    if not ai.tool_calls:
        return ai.content                      # final answer
    for call in ai.tool_calls:
        result = run_tool(call)                # your sandboxed executor
        messages.append(ToolMessage(result, tool_call_id=call["id"]))

9.5 Multi-agent systems

Split a hard problem across specialized agents (each with focused tools, prompts, and possibly different models). Patterns (also in [[08_langgraph]] §8.7):

  • Supervisor / orchestrator-worker: a manager agent routes subtasks to specialists (researcher, coder, reviewer) and synthesizes results.
  • Pipeline: agents in sequence (e.g. outline → draft → edit).
  • Debate / collaboration: agents critique each other to improve answers.
  • Hierarchical teams: supervisors of supervisors for big workflows.

Trade-offs: more agents → better specialization and modularity, but more cost, latency, and coordination failure modes. Start single-agent; split only when a clear division of labor helps.


9.6 Prominent frameworks

FrameworkFlavor
LangGraphGraph/state-machine control; production-grade, explicit ([[08_langgraph]])
CrewAIRole-based crews of agents with tasks; high-level, quick to prototype
AutoGen (Microsoft)Conversational multi-agent; agents talk to solve tasks
OpenAI Agents SDK / AssistantsProvider-native tools, memory, handoffs
LlamaIndex AgentsData/RAG-centric agents
Smolagents (HF)Minimal, code-writing agents

All implement the same core loop; they differ in abstraction level and control.


9.7 Worked design: a research assistant agent

Goal: "Write a sourced brief on the impact of X."

code
Supervisor
  ├─► Researcher agent
  │      tools: web_search, fetch_url, rag_retriever
  │      loop (ReAct): search → read → extract facts → store to memory
  ├─► Writer agent
  │      reads memory (facts + citations) → drafts brief
  └─► Critic agent
         checks claims against sources (faithfulness) → requests fixes → loop
Stop when critic approves OR step budget reached.

Implementation: a LangGraph with nodes {supervisor, researcher, writer, critic}, a shared State holding messages, facts, draft, and conditional edges routing back to the supervisor until approval. Memory = vector store of gathered facts; checkpointer for resumability; interrupt_before the "publish" tool for human sign-off.


9.8 Evaluation of agents

Harder than evaluating a single output because trajectories vary. Measure:

  • Task success rate (did it achieve the goal? — often LLM-judged or rule-checked).
  • Trajectory quality: were tool calls correct, efficient, non-redundant?
  • Tool-call accuracy: right tool, right args.
  • Faithfulness/groundedness (for RAG-style claims, [[06_rag]] §6.10).
  • Cost & latency: tokens, number of steps, wall-clock.
  • Robustness: behavior on ambiguous/adversarial inputs and tool failures.

Tools: LangSmith (tracing + eval datasets), RAGAS, custom LLM-as-judge rubrics. Always trace every step in production to debug failures.


9.9 Safety, reliability & production concerns

  • Guardrails: validate/sanitize tool inputs and outputs; constrain which tools an agent may call.
  • Sandboxing: run code/tools in isolated environments; never eval untrusted input (the demo calculator in [[08_langgraph]] is unsafe for real use).
  • Human-in-the-loop for irreversible/high-stakes actions (payments, emails, deletes) — gate them ([[08_langgraph]] §8.6).
  • Budgets & circuit breakers: cap steps, tokens, $$ and time to prevent runaway loops.
  • Prompt injection: retrieved/web content may contain malicious instructions ("ignore previous instructions…"). Treat tool/RAG output as untrusted data, not commands; separate system instructions from data; restrict tool permissions.
  • Determinism & idempotency: design tools so retries don't double-charge or duplicate actions.
  • Observability: log every thought, action, observation; trace and replay.
  • Failure handling: retries with backoff, fallbacks, graceful "I couldn't complete this."

9.10 The big picture — how it all connects

code
[01 Foundations]  backprop, gradient descent — how anything learns
[02 CNN] [03 RNN/LSTM]  inductive biases for space & sequence
[04 Transformers]  attention → parallel, long-range modeling
[05 Architectures]  encoder (BERT) / decoder (GPT) / enc-dec (T5)
        ├──► embeddings (encoder) ──► [06 RAG]  external knowledge
        └──► generation (decoder LLM)
            [07 LangChain]  compose prompts/models/tools/retrievers
            [08 LangGraph]  stateful loops, branches, multi-agent control
            [09 Agentic AI]  goal-driven, tool-using, autonomous systems

Every layer rests on the one above: an autonomous agent is, at bottom, a stack of Transformer blocks ([[04_transformers]]) trained by backprop ([[01_deep_learning_foundations]]), wrapped in a reasoning loop with tools and memory.


9.11 Where to go next

  • Build: implement the ReAct agent in [[08_langgraph]] §8.10, add a real search tool and a vector-store memory.
  • Study: read the original papers — Attention Is All You Need (Transformer), BERT, GPT-3, ReAct, RAG, Toolformer, Reflexion.
  • Practice: take one project (e.g. a docs Q&A bot) from RAG → tool-using agent → multi-agent, evaluating at each stage.

See [[10_math_appendix]] for the linear algebra / calculus / probability used throughout.