LangChain — Knowledge — Devaraj Kudumula

LangChain is a framework for building LLM applications by composing reusable components — prompts, models, retrievers, tools, parsers, memory — into pipelines. Its modern core is LCEL (LangChain Expression Language), which lets you wire components with the | operator.

Versions move fast. These notes target the modern langchain-core / LCEL style (0.1+/0.2+), which is what current docs use. Install: pip install langchain langchain-openai langchain-community.

7.1 Why a framework at all?

Raw LLM calls are simple; applications need glue: templating prompts, parsing structured output, retrying, streaming, swapping model providers, chaining steps, adding memory, calling tools, tracing. LangChain standardizes these so you write less plumbing and can swap pieces (e.g. OpenAI → Anthropic) without rewrites.

7.2 The Runnable — the universal interface

Every LangChain component implements the Runnable protocol with the same methods:

.invoke(input) — run once.
.batch([inputs]) — run many (parallelized).
.stream(input) — yield tokens/chunks as they arrive.
.ainvoke / .abatch / .astream — async versions.

Because they share this interface, any Runnable can be piped into any other. a | b builds a RunnableSequence: output of a feeds b.

python
chain = prompt | model | parser    # all three are Runnables
chain.invoke({"topic": "oceans"})  # one consistent call
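
To make the pipe mechanics concrete, here is a minimal sketch of how an `a | b` operator can be built in plain Python. The `Step` class and the `fake_*` stand-ins are hypothetical and only illustrate the idea; this is not how LangChain implements Runnable internally.

```python
# Toy model of the pipe mechanics; NOT LangChain's actual implementation.
class Step:
    """Wraps a plain function and supports `a | b` composition."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def batch(self, values):
        # The real .batch runs inputs concurrently; this toy version is sequential.
        return [self.invoke(v) for v in values]

    def __or__(self, other):
        # `self | other` returns a new Step that feeds self's output into other.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Hypothetical stand-ins for prompt, model, and parser.
fake_prompt = Step(lambda d: f"Tell me about {d['topic']}.")
fake_model = Step(lambda text: {"content": text.upper()})
fake_parser = Step(lambda msg: msg["content"])

toy_chain = fake_prompt | fake_model | fake_parser
print(toy_chain.invoke({"topic": "oceans"}))  # TELL ME ABOUT OCEANS.
```

Because the composed object exposes the same `invoke` and `batch` methods as its parts, it can itself be piped into further steps, which is the property the shared interface buys you.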

7.3 The building blocks

1. Models

Two kinds:

LLMs: string in → string out.
Chat models (standard today): list of messages in → message out.

python
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
llm.invoke("Hello")          # → AIMessage(content="Hi! ...")

Messages have roles: SystemMessage, HumanMessage, AIMessage, ToolMessage.

2. Prompt templates

Parameterized prompts:

python
from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a {style} assistant."),
    ("human", "Explain {topic} in one sentence."),
])
prompt.invoke({"style": "witty", "topic": "gravity"})   # → formatted messages

3. Output parsers

Turn raw model output into usable structures:

python
from langchain_core.output_parsers import StrOutputParser, JsonOutputParser
from pydantic import BaseModel, Field

class Movie(BaseModel):
    title: str = Field(description="film title")
    year: int

parser = JsonOutputParser(pydantic_object=Movie)
# inject format instructions into the prompt:
prompt = ChatPromptTemplate.from_template(
    "Extract movie info.\n{format_instructions}\nText: {text}"
).partial(format_instructions=parser.get_format_instructions())
chain = prompt | llm | parser
chain.invoke({"text": "Inception came out in 2010"})    # → {"title": "Inception", "year": 2010}

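Even cleaner: llm.with_structured_output(Movie) uses the model's native tool calling (or a JSON mode, depending on the provider) and returns a validated Movie instance instead of a dict. Treat schema adherence as very likely rather than guaranteed: if the model returns something that does not fit the schema, validation raises an error you should be ready to handle.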
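
To see roughly what the parser step has to do, the sketch below parses a model reply by hand: strip an optional Markdown code fence, decode the JSON, and check for required keys. The function name `parse_json_reply` and the sample reply are hypothetical; the real JsonOutputParser handles more cases (for example partial output while streaming).

```python
import json

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the example readable


def parse_json_reply(text: str, required_keys: set) -> dict:
    """Parse a model reply that may wrap its JSON in a Markdown code fence."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line (it may carry a "json" tag) and the closing fence.
        cleaned = cleaned.split("\n", 1)[1]
        cleaned = cleaned.rsplit(FENCE, 1)[0]
    data = json.loads(cleaned)
    missing = required_keys - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


# Simulated model reply (hypothetical), wrapped in a fence as chat models often do.
reply = FENCE + 'json\n{"title": "Inception", "year": 2010}\n' + FENCE
movie = parse_json_reply(reply, {"title", "year"})
print(movie)
```

Note that this only checks that keys are present. Type validation (is `year` really an int?) is what a Pydantic-based parser or structured output adds on top.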

7.4 LCEL composition primitives

Sequence a | b | c: chain steps.
RunnablePassthrough: pass input through unchanged, or add keys to a dict input with RunnablePassthrough.assign(...).
RunnableParallel (a dict): run branches concurrently, collect into a dict.
RunnableLambda: wrap any Python function as a Runnable.
.bind(...): preset arguments (e.g. stop sequences, tools).
.with_fallbacks([...]), .with_retry(): resilience.
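
The resilience helpers in the last bullet are easiest to understand as wrappers around a call. The following is a plain-Python sketch of the two ideas, assuming simple function-to-function wrappers; the names `with_retry` and `with_fallbacks` mirror the LangChain methods but the code is a simplified illustration (no backoff, no exception filtering), not the library's implementation.

```python
import time


def with_retry(func, max_attempts=3, delay_seconds=0.0):
    """Return a wrapper that re-calls func until it succeeds or attempts run out."""
    def wrapper(value):
        for attempt in range(1, max_attempts + 1):
            try:
                return func(value)
            except Exception:
                if attempt == max_attempts:
                    raise  # out of attempts: surface the last error
                time.sleep(delay_seconds)
    return wrapper


def with_fallbacks(primary, fallbacks):
    """Return a wrapper that tries primary, then each fallback in order."""
    def wrapper(value):
        last_error = None
        for candidate in [primary, *fallbacks]:
            try:
                return candidate(value)
            except Exception as err:
                last_error = err
        raise last_error
    return wrapper


# Hypothetical flaky "model": fails twice, then succeeds.
calls = {"count": 0}

def flaky_model(text):
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated timeout")
    return f"primary: {text}"

def always_down(text):
    raise ConnectionError("simulated outage")

def backup_model(text):
    return f"backup: {text}"

print(with_retry(flaky_model)("hi"))                      # primary: hi
print(with_fallbacks(always_down, [backup_model])("hi"))  # backup: hi
```

Retry suits transient errors from the same provider; fallbacks suit an outage or a hard limit, where a second model is the better bet.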

python
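from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableParallel, RunnablePassthrough, RunnableLambda

# Illustrative single-variable prompts; a bare string input fills {review}
sentiment_prompt = ChatPromptTemplate.from_template("Classify the sentiment of this review: {review}")
summary_prompt = ChatPromptTemplate.from_template("Summarize this review in one sentence: {review}")

# Run two analyses in parallel on the same input, then combine
parallel = RunnableParallel(
    sentiment = sentiment_prompt | llm | StrOutputParser(),
    summary   = summary_prompt   | llm | StrOutputParser(),
)
result = parallel.invoke("Long customer review text...")
# result = {"sentiment": "...", "summary": "..."}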

This is exactly how the RAG chain in [[06_rag]] works:

python
chain = ({"context": retriever | format_docs, "question": RunnablePassthrough()}
         | prompt | llm | StrOutputParser())

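Inside a chain, the dict is coerced into a RunnableParallel: both branches receive the same input (the question string). The context branch sends it through the retriever, the question branch passes it along unchanged, and the resulting dict supplies the prompt's variables.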
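
The fan-out behaviour can be sketched without LangChain. Below, a hypothetical `run_parallel` helper calls every branch with the same input on a thread pool and collects the results by key; `fake_retrieve_and_format` and `passthrough` are stand-ins for `retriever | format_docs` and RunnablePassthrough(). This is an assumption-laden illustration of the semantics, not the library's code.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel(branches: dict, value):
    """Call every branch with the same input and collect results by key."""
    with ThreadPoolExecutor() as pool:
        futures = {key: pool.submit(func, value) for key, func in branches.items()}
        return {key: future.result() for key, future in futures.items()}


# Hypothetical stand-ins for `retriever | format_docs` and RunnablePassthrough().
def fake_retrieve_and_format(question: str) -> str:
    return f"[docs matching '{question}']"

def passthrough(value):
    return value

prompt_vars = run_parallel(
    {"context": fake_retrieve_and_format, "question": passthrough},
    "What is LCEL?",
)
print(prompt_vars)
```

The output dict has exactly the keys the prompt template expects, which is why the dict can sit directly in front of `prompt` in the chain.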

7.5 Memory (conversation history)

LLMs are stateless: each call is independent. To make a chatbot remember, you must feed prior turns back in. In LCEL-style LangChain this is done with RunnableWithMessageHistory (newer releases also recommend LangGraph persistence for new applications):

python
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder("history"),     # past messages get injected here
    ("human", "{input}"),
])
chain = prompt | llm

store = {}
def get_history(session_id):
    return store.setdefault(session_id, InMemoryChatMessageHistory())

chat = RunnableWithMessageHistory(chain, get_history,
        input_messages_key="input", history_messages_key="history")

cfg = {"configurable": {"session_id": "user-1"}}
chat.invoke({"input": "My name is Dev."}, cfg)
chat.invoke({"input": "What's my name?"}, cfg)   # → AIMessage; its content should mention "Dev"

Memory strategies for long chats: keep last-N messages (buffer window), or summarize older turns into a running summary to stay within the context window.
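
Both strategies can be sketched on plain `(role, text)` tuples. The helper names `trim_history` and `summarize_older` are hypothetical, and the default summarizer just counts messages where a real one would call an LLM.

```python
def trim_history(messages, max_messages=4):
    """Keep any leading system message plus the most recent max_messages others."""
    system = [m for m in messages[:1] if m[0] == "system"]
    rest = messages[len(system):]
    recent = rest[-max_messages:] if max_messages > 0 else []
    return system + recent


def summarize_older(messages, keep_last=2, summarize=None):
    """Fold everything except the last keep_last messages into one summary message."""
    # `summarize` would normally be an LLM call; the default just counts messages.
    summarize = summarize or (lambda old: f"Summary of {len(old)} earlier messages.")
    split = max(len(messages) - keep_last, 0)
    old, recent = messages[:split], messages[split:]
    if not old:
        return list(messages)
    return [("system", summarize(old))] + recent


history = [
    ("system", "You are a helpful assistant."),
    ("human", "My name is Dev."),
    ("ai", "Nice to meet you, Dev."),
    ("human", "I like hiking."),
    ("ai", "Great hobby."),
    ("human", "What's my name?"),
]

windowed = trim_history(history, max_messages=2)
summarized = summarize_older(history[1:], keep_last=2)
```

In this example the window of 2 drops the turn that contained the name, so the model could no longer answer the final question. That trade-off is the reason to summarize older turns instead of discarding them when early facts matter.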

7.6 Tools & agents

A tool is a function the LLM can call (search, calculator, database, API). LangChain exposes tools to the model via the model's function/tool-calling ability.

python
from langchain_core.tools import tool

@tool
def multiply(a: int, b: int) -> int:
    """Multiply two integers."""        # docstring + type hints → tool schema
    return a * b

@tool
def web_search(query: str) -> str:
    """Search the web for a query and return top results."""
    return do_search(query)   # do_search: placeholder for your own search client (not a LangChain function)

llm_with_tools = llm.bind_tools([multiply, web_search])
msg = llm_with_tools.invoke("What is 23 * 17?")
# msg.tool_calls → [{"name": "multiply", "args": {"a": 23, "b": 17}, "id": ...}]
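
The "docstring + type hints → tool schema" step can be approximated with the standard library. This sketch builds a simplified JSON-Schema-like description from a function; `describe_tool` and `JSON_TYPES` are hypothetical names, and the exact schema that @tool produces differs in detail.

```python
import inspect
from typing import get_type_hints

# Mapping from Python types to JSON Schema type names (only a few, for illustration).
JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def describe_tool(func) -> dict:
    """Build a simplified tool description from a function's signature and docstring."""
    hints = get_type_hints(func)
    hints.pop("return", None)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            "type": "object",
            "properties": {
                name: {"type": JSON_TYPES.get(tp, "string")} for name, tp in hints.items()
            },
            # Simplification: every parameter is treated as required.
            "required": list(hints),
        },
    }


def multiply(a: int, b: int) -> int:
    """Multiply two integers."""
    return a * b


schema = describe_tool(multiply)
print(schema)
```

A description like this is what the model actually sees when choosing a tool, so a vague docstring or missing type hints translate directly into worse tool selection.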

The agent loop (the model decides which tool, you execute it, feed the result back, repeat until it answers):

python
from langchain_core.messages import HumanMessage, ToolMessage

tools = {"multiply": multiply, "web_search": web_search}
messages = [HumanMessage("What is 23 * 17, then search that number's meaning?")]
while True:
    ai = llm_with_tools.invoke(messages)
    messages.append(ai)
    if not ai.tool_calls:
        break                                   # model gave a final answer
    for call in ai.tool_calls:
        result = tools[call["name"]].invoke(call["args"])
        messages.append(ToolMessage(str(result), tool_call_id=call["id"]))
print(ai.content)

This loop is an agent. Prebuilt agents exist, but writing the loop reveals the mechanics. For complex control flow (branches, cycles, multiple agents), this loop graduates to [[08_langgraph]], which is purpose-built for it. The reasoning patterns (ReAct, planning) are in [[09_agentic_ai]].
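
To trace the loop without an API key, the sketch below replaces the model with a scripted stand-in that requests one tool call and then answers. Everything here (`scripted_model`, `run_agent`, the dict-based messages, the `max_steps` cap) is a hypothetical simplification; the cap is an addition worth considering in real loops so a confused model cannot cycle forever.

```python
def multiply(a: int, b: int) -> int:
    return a * b


TOOLS = {"multiply": multiply}


def scripted_model(messages):
    """Hypothetical stand-in for llm_with_tools.invoke: one tool call, then an answer."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        return {"role": "ai", "content": "",
                "tool_calls": [{"name": "multiply", "args": {"a": 23, "b": 17}, "id": "call-1"}]}
    return {"role": "ai", "content": f"23 * 17 = {tool_results[-1]['content']}", "tool_calls": []}


def run_agent(question, max_steps=5):
    messages = [{"role": "human", "content": question}]
    for _ in range(max_steps):  # cap iterations instead of `while True`
        ai = scripted_model(messages)
        messages.append(ai)
        if not ai["tool_calls"]:
            return ai["content"], messages  # model gave a final answer
        for call in ai["tool_calls"]:
            result = TOOLS[call["name"]](**call["args"])
            messages.append({"role": "tool", "content": str(result), "tool_call_id": call["id"]})
    raise RuntimeError("agent did not finish within max_steps")


answer, transcript = run_agent("What is 23 * 17?")
print(answer)                          # 23 * 17 = 391
print([m["role"] for m in transcript])  # ['human', 'ai', 'tool', 'ai']
```

The transcript shows the shape every tool-calling conversation takes: a human message, an AI message carrying tool calls, one tool message per call, and a final AI message with the answer.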

7.7 Streaming & async

python
for chunk in chain.stream({"topic": "the sun"}):
    print(chunk, end="", flush=True)     # tokens appear live

import asyncio
async def main():
    result = await chain.ainvoke({"topic": "the moon"})
    print(result)

asyncio.run(main())   # start the event loop and run the coroutine

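Streaming cuts time to first token, which is what users perceive as latency, and with LCEL it needs no extra code as long as every step can pass chunks through. A step that needs its whole input first (for example an ordinary function wrapped in RunnableLambda) buffers the stream at that point.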
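
A toy generator pipeline illustrates the difference between a step that streams and one that has to buffer. The three functions are hypothetical stand-ins written with plain Python generators; they model the behaviour, not LangChain's streaming machinery.

```python
def fake_model_stream(prompt: str):
    """Hypothetical model that yields its reply one token at a time."""
    for token in ["The ", "sun ", "is ", "a ", "star."]:
        yield token


def uppercase_stream(chunks):
    """A streaming-friendly step: transforms each chunk as it passes through."""
    for chunk in chunks:
        yield chunk.upper()


def word_count(chunks):
    """A step that needs the whole input, so it buffers the stream before answering."""
    text = "".join(chunks)
    yield str(len(text.split()))


streamed = list(uppercase_stream(fake_model_stream("Describe the sun")))
buffered = list(word_count(fake_model_stream("Describe the sun")))
print(streamed)  # five chunks, available one by one
print(buffered)  # a single chunk, available only after the model finished
```

The first pipeline can show output as soon as the first token exists; the second cannot emit anything until the model is done, which is the situation to look for when a chain "stops streaming".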

7.8 The LangChain ecosystem

langchain-core: Runnable, prompts, messages, base abstractions (lightweight, stable).
langchain: higher-level chains/agents.
langchain-community: hundreds of integrations (vector stores, loaders, tools).
Provider packages: langchain-openai, langchain-anthropic, langchain-google-....
langchain-text-splitters: chunking (used in [[06_rag]]).
LangSmith: observability/tracing/eval platform — see every step, token, latency, and cost; debug chains; run evaluations.
LangServe: deploy a Runnable as a REST API.
LangGraph: stateful, cyclic, multi-actor orchestration → [[08_langgraph]].

7.9 Document loaders & the full RAG in LangChain

python
from langchain_community.document_loaders import PyPDFLoader, WebBaseLoader
docs = PyPDFLoader("manual.pdf").load()          # → list[Document(page_content, metadata)]
# then: split → embed → vectorstore → retriever → chain (see 06_rag.md §6.8)

LangChain's value in RAG is the standardized Document object, dozens of loaders, pluggable splitters/embeddings/vector stores, and LCEL to wire it all with retries, streaming, and tracing.
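
The value of a standardized document object shows up in the split step, where metadata has to travel with every chunk. Here is a minimal sketch with a hypothetical `Doc` dataclass and a naive character splitter; the real Document class and the splitters in langchain-text-splitters are more capable (separator-aware splitting, for example).

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Toy counterpart of a Document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def split_doc(doc: Doc, chunk_size: int = 20, overlap: int = 5) -> list:
    """Cut one Doc into overlapping character chunks, copying its metadata onto each."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(doc.page_content), step):
        text = doc.page_content[start:start + chunk_size]
        chunks.append(Doc(text, {**doc.metadata, "start": start}))
        if start + chunk_size >= len(doc.page_content):
            break  # this chunk already reached the end of the text
    return chunks


page = Doc("LangChain wires loaders, splitters and retrievers.",
           {"source": "manual.pdf", "page": 0})
pieces = split_doc(page)
for piece in pieces:
    print(repr(piece.page_content), piece.metadata)
```

Because each chunk keeps `source` and `page`, a retriever can later cite where an answer came from, whichever loader produced the original document.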

7.10 Pitfalls

Version churn: APIs changed a lot across 0.0 → 0.1 → 0.2/0.3. Pin versions; prefer LCEL + langchain-core (most stable). Old LLMChain/initialize_agent patterns are deprecated.
Over-abstraction: for a single LLM call, raw SDK is simpler. Reach for LangChain when composition, swapping providers, RAG, tools, or tracing pay off.
Token/cost blindness: use LangSmith or callbacks to watch token usage — chains can balloon.
Unbounded memory growth: cap or summarize conversation history so it cannot outgrow the context window.
Tool descriptions matter: the model picks tools from their docstrings/schemas — write them clearly.

Next: [[08_langgraph]] — when your agent needs loops, branches, state, and multiple actors, a linear chain isn't enough.