LangChain is a framework for building LLM applications by composing reusable components — prompts, models, retrievers, tools, parsers, memory — into pipelines. Its modern core is LCEL (LangChain Expression Language), which lets you wire components with the | operator.
Versions move fast. These notes target the modern
langchain-core/ LCEL style (0.1+/0.2+), which is what current docs use. Install:pip install langchain langchain-openai langchain-community.
7.1 Why a framework at all?
Raw LLM calls are simple; applications need glue: templating prompts, parsing structured output, retrying, streaming, swapping model providers, chaining steps, adding memory, calling tools, tracing. LangChain standardizes these so you write less plumbing and can swap pieces (e.g. OpenAI → Anthropic) without rewrites.
7.2 The Runnable — the universal interface
Every LangChain component implements the Runnable protocol with the same methods:
.invoke(input)— run once..batch([inputs])— run many (parallelized)..stream(input)— yield tokens/chunks as they arrive..ainvoke / .abatch / .astream— async versions.
Because they share this interface, any Runnable can be piped into any other. a | b builds a RunnableSequence: output of a feeds b.
pythonchain = prompt | model | parser # all three are Runnables chain.invoke({"topic": "oceans"}) # one consistent call
7.3 The building blocks
1. Models
Two kinds:
- LLMs: string in → string out.
- Chat models (standard today): list of messages in → message out.
pythonfrom langchain_openai import ChatOpenAI llm = ChatOpenAI(model="gpt-4o-mini", temperature=0) llm.invoke("Hello") # → AIMessage(content="Hi! ...")
Messages have roles: SystemMessage, HumanMessage, AIMessage, ToolMessage.
2. Prompt templates
Parameterized prompts:
pythonfrom langchain_core.prompts import ChatPromptTemplate prompt = ChatPromptTemplate.from_messages([ ("system", "You are a {style} assistant."), ("human", "Explain {topic} in one sentence."), ]) prompt.invoke({"style": "witty", "topic": "gravity"}) # → formatted messages
3. Output parsers
Turn raw model output into usable structures:
pythonfrom langchain_core.output_parsers import StrOutputParser, JsonOutputParser from pydantic import BaseModel, Field class Movie(BaseModel): title: str = Field(description="film title") year: int parser = JsonOutputParser(pydantic_object=Movie) # inject format instructions into the prompt: prompt = ChatPromptTemplate.from_template( "Extract movie info.\n{format_instructions}\nText: {text}" ).partial(format_instructions=parser.get_format_instructions()) chain = prompt | llm | parser chain.invoke({"text": "Inception came out in 2010"}) # → {"title": "Inception", "year": 2010}
Even cleaner: llm.with_structured_output(Movie) uses the model's native function-calling to guarantee schema-valid output.
7.4 LCEL composition primitives
- Sequence
a | b | c: chain steps. RunnablePassthrough: pass input through unchanged (or add keys).RunnableParallel(a dict): run branches concurrently, collect into a dict.RunnableLambda: wrap any Python function as a Runnable..bind(...): preset arguments (e.g. stop sequences, tools)..with_fallbacks([...]),.with_retry(): resilience.
pythonfrom langchain_core.runnables import RunnableParallel, RunnablePassthrough, RunnableLambda # Run two analyses in parallel on the same input, then combine parallel = RunnableParallel( sentiment = sentiment_prompt | llm | StrOutputParser(), summary = summary_prompt | llm | StrOutputParser(), ) result = parallel.invoke("Long customer review text...") # result = {"sentiment": "...", "summary": "..."}
This is exactly how the RAG chain in [[06_rag]] worked:
pythonchain = ({"context": retriever | format_docs, "question": RunnablePassthrough()} | prompt | llm | StrOutputParser())
The dict is a RunnableParallel: context and question are computed in parallel, then merged into the prompt's variables.
7.5 Memory (conversation history)
LLMs are stateless — each call is independent. To make a chatbot remember, you must feed prior turns back in. Modern LangChain uses RunnableWithMessageHistory:
pythonfrom langchain_core.prompts import MessagesPlaceholder from langchain_core.chat_history import InMemoryChatMessageHistory from langchain_core.runnables.history import RunnableWithMessageHistory prompt = ChatPromptTemplate.from_messages([ ("system", "You are a helpful assistant."), MessagesPlaceholder("history"), # past messages get injected here ("human", "{input}"), ]) chain = prompt | llm store = {} def get_history(session_id): return store.setdefault(session_id, InMemoryChatMessageHistory()) chat = RunnableWithMessageHistory(chain, get_history, input_messages_key="input", history_messages_key="history") cfg = {"configurable": {"session_id": "user-1"}} chat.invoke({"input": "My name is Dev."}, cfg) chat.invoke({"input": "What's my name?"}, cfg) # → "Your name is Dev."
Memory strategies for long chats: keep last-N messages (buffer window), or summarize older turns into a running summary to stay within the context window.
7.6 Tools & agents
A tool is a function the LLM can call (search, calculator, database, API). LangChain exposes tools to the model via the model's function/tool-calling ability.
pythonfrom langchain_core.tools import tool @tool def multiply(a: int, b: int) -> int: """Multiply two integers.""" # docstring + type hints → tool schema return a * b @tool def web_search(query: str) -> str: """Search the web for a query and return top results.""" return do_search(query) llm_with_tools = llm.bind_tools([multiply, web_search]) msg = llm_with_tools.invoke("What is 23 * 17?") # msg.tool_calls → [{"name": "multiply", "args": {"a": 23, "b": 17}, "id": ...}]
The agent loop (the model decides which tool, you execute it, feed the result back, repeat until it answers):
pythontools = {"multiply": multiply, "web_search": web_search} messages = [HumanMessage("What is 23 * 17, then search that number's meaning?")] while True: ai = llm_with_tools.invoke(messages) messages.append(ai) if not ai.tool_calls: break # model gave a final answer for call in ai.tool_calls: result = tools[call["name"]].invoke(call["args"]) messages.append(ToolMessage(str(result), tool_call_id=call["id"])) print(ai.content)
This loop is an agent. Prebuilt agents exist, but writing the loop reveals the mechanics. For complex control flow (branches, cycles, multiple agents), this loop graduates to [[08_langgraph]], which is purpose-built for it. The reasoning patterns (ReAct, planning) are in [[09_agentic_ai]].
7.7 Streaming & async
pythonfor chunk in chain.stream({"topic": "the sun"}): print(chunk, end="", flush=True) # tokens appear live import asyncio async def main(): await chain.ainvoke({"topic": "the moon"})
Streaming improves perceived latency dramatically and is essentially free with LCEL.
7.8 The LangChain ecosystem
langchain-core: Runnable, prompts, messages, base abstractions (lightweight, stable).langchain: higher-level chains/agents.langchain-community: 100s of integrations (vector stores, loaders, tools).- Provider packages:
langchain-openai,langchain-anthropic,langchain-google-.... langchain-text-splitters: chunking (used in [[06_rag]]).- LangSmith: observability/tracing/eval platform — see every step, token, latency, and cost; debug chains; run evaluations.
- LangServe: deploy a Runnable as a REST API.
- LangGraph: stateful, cyclic, multi-actor orchestration → [[08_langgraph]].
7.9 Document loaders & the full RAG in LangChain
pythonfrom langchain_community.document_loaders import PyPDFLoader, WebBaseLoader docs = PyPDFLoader("manual.pdf").load() # → list[Document(page_content, metadata)] # then: split → embed → vectorstore → retriever → chain (see 06_rag.md §6.8)
LangChain's value in RAG is the standardized Document object, dozens of loaders, pluggable splitters/embeddings/vector stores, and LCEL to wire it all with retries, streaming, and tracing.
7.10 Pitfalls
- Version churn: APIs changed a lot across 0.0 → 0.1 → 0.2/0.3. Pin versions; prefer LCEL +
langchain-core(most stable). OldLLMChain/initialize_agentpatterns are deprecated. - Over-abstraction: for a single LLM call, raw SDK is simpler. Reach for LangChain when composition, swapping providers, RAG, tools, or tracing pay off.
- Token/cost blindness: use LangSmith or callbacks to watch token usage — chains can balloon.
- Memory unbounded growth: cap or summarize history.
- Tool descriptions matter: the model picks tools from their docstrings/schemas — write them clearly.
Next: [[08_langgraph]] — when your agent needs loops, branches, state, and multiple actors, a linear chain isn't enough.