What is LangChain?
LangChain is an open-source Python (and TypeScript) framework designed to simplify building applications powered by Large Language Models (LLMs). Instead of writing low-level API calls and managing prompt templates manually, LangChain provides composable abstractions for common patterns: retrieving documents, maintaining conversation memory, calling external tools, and chaining multiple LLM calls together.
Released in October 2022 by Harrison Chase, LangChain quickly became the most popular LLM framework, with 85,000+ GitHub stars and a thriving ecosystem of integrations (50+ vector databases, 100+ document loaders, 20+ LLM providers).
Core Concepts
LangChain is built around five key abstractions:
- Models: Wrappers for LLMs (OpenAI, Claude, Llama) and embeddings (text-embedding-3-small, nomic-embed-text)
- Prompts: Templates for structuring inputs to LLMs with variable substitution
- Chains: Sequences of calls (e.g., retrieve documents → format prompt → generate answer)
- Memory: State management for multi-turn conversations
- Agents: LLMs that decide which tools to call and when
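To make the Prompts abstraction concrete: a prompt template is just a string with named slots plus a substitution step. The sketch below illustrates the idea in plain Python (it is not the LangChain API; LangChain's `PromptTemplate` adds validation and chat-message structure on top):

```python
SYSTEM = "You are a helpful assistant."
TEMPLATE = "Answer concisely: {question}"

def format_prompt(question: str) -> list:
    """Build the (role, content) message list an LLM wrapper would receive."""
    return [("system", SYSTEM), ("human", TEMPLATE.format(question=question))]

messages = format_prompt("What is the capital of France?")
print(messages[1])  # ('human', 'Answer concisely: What is the capital of France?')
```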
Why Use LangChain?
| Benefit | Without LangChain | With LangChain |
|---|---|---|
| RAG system | 200+ lines of boilerplate (vector DB, embeddings, retrieval, prompt formatting) | 30 lines using RetrievalQA or LCEL |
| Conversation memory | Manual session storage, context window management, summarization logic | ConversationBufferMemory or ConversationSummaryMemory (5 lines) |
| Agent with tools | Custom ReAct loop, function calling parsing, error handling, retry logic | create_react_agent() + tool decorators (20 lines) |
| Switching LLM providers | Rewrite API calls, adapt to different response formats | Change one line: ChatOpenAI() → ChatAnthropic() |
| Observability | Custom logging, metrics, tracing infrastructure | LangSmith integration (2 env vars) |
Common Use Cases and Code Examples
Use Case 1: Simple Q&A Chatbot
The simplest LangChain application: send a question, get an answer. This is useful for stateless applications like one-off queries or API endpoints.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
# 1. Initialize LLM
llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0.7,
    api_key="your-api-key"
)
# 2. Create prompt template
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant that answers questions concisely."),
    ("human", "{question}")
])
# 3. Create chain using LCEL (LangChain Expression Language)
chain = prompt | llm | StrOutputParser()
# 4. Invoke
response = chain.invoke({"question": "What is the capital of France?"})
print(response) # "Paris is the capital of France."
# For streaming responses:
for chunk in chain.stream({"question": "Explain quantum computing in 3 sentences"}):
    print(chunk, end="", flush=True)
Key concepts:
- The | operator composes components into a chain (LCEL syntax)
- StrOutputParser() extracts the text content from the LLM's message object
- chain.stream() enables token-by-token streaming (crucial for UX)
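Under the hood, the pipe operator is left-to-right function composition. A plain-Python sketch of the idea (illustrative only; the real Runnable classes also handle streaming, batching, and async):

```python
from functools import reduce

def pipe(*steps):
    """Compose steps left to right: pipe(f, g, h)(x) == h(g(f(x)))."""
    return lambda x: reduce(lambda acc, step: step(acc), steps, x)

# Stand-ins for prompt | llm | parser (the real middle step calls a model)
format_prompt = lambda d: f"Q: {d['question']}"
fake_llm = lambda p: {"content": p.upper()}
parse_output = lambda msg: msg["content"]

chain = pipe(format_prompt, fake_llm, parse_output)
print(chain({"question": "hi"}))  # Q: HI
```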
Use Case 2: Chatbot with Conversation Memory
Most chatbots need to remember previous messages. LangChain handles this with Memory components.
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
# Initialize LLM
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)
# Create memory (stores full conversation history)
memory = ConversationBufferMemory()
# Create conversation chain
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True  # Shows the full prompt sent to the LLM
)
# Multi-turn conversation
response1 = conversation.predict(input="My name is Alice and I love Python.")
print(response1) # "Nice to meet you, Alice! Python is a great language..."
response2 = conversation.predict(input="What's my name?")
print(response2) # "Your name is Alice."
response3 = conversation.predict(input="What programming language do I like?")
print(response3) # "You love Python."
# Inspect memory
print(conversation.memory.buffer)
# Shows full conversation history
Production tip: For long conversations, use ConversationBufferWindowMemory(k=10) to keep only the last 10 messages, or ConversationSummaryMemory to have the LLM summarize old messages (reduces token costs).
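The windowed variant is simple to picture: keep only the last k exchanges and let older ones fall off. A plain-Python sketch of that idea (the real ConversationBufferWindowMemory also handles prompt formatting and message objects):

```python
from collections import deque

class WindowMemory:
    """Keep only the last k (human, ai) exchanges, like ConversationBufferWindowMemory(k=k)."""
    def __init__(self, k: int):
        self.turns = deque(maxlen=k)  # deque drops the oldest turn automatically

    def save(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))

    def buffer(self) -> str:
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)

memory = WindowMemory(k=2)
for i in range(4):
    memory.save(f"msg {i}", f"reply {i}")
print(memory.buffer())  # only the last 2 exchanges ("msg 2", "msg 3") survive
```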
Use Case 3: RAG (Retrieval-Augmented Generation)
RAG lets you query private documents without fine-tuning. Here's a complete example: loading PDFs, creating embeddings, storing in a vector database, and answering questions.
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA
# Step 1: Load documents
loader = PyPDFLoader("company_documentation.pdf")
documents = loader.load()
print(f"Loaded {len(documents)} pages")
# Step 2: Split into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # Characters per chunk
    chunk_overlap=200,  # Overlap to preserve context across chunk boundaries
    separators=["\n\n", "\n", " ", ""]
)
chunks = text_splitter.split_documents(documents)
print(f"Split into {len(chunks)} chunks")
# Step 3: Create embeddings and store in vector DB
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db"  # Save to disk
)
# Step 4: Create retriever
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 4}  # Retrieve the top 4 most relevant chunks
)
# Step 5: Create RAG chain
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",  # "stuff" = insert all retrieved docs into one prompt
    retriever=retriever,
    return_source_documents=True
)
# Step 6: Query
query = "What is the company's return policy?"
result = qa_chain.invoke({"query": query})
print(f"Answer: {result['result']}")
print(f"\nSources ({len(result['source_documents'])} documents):")
for i, doc in enumerate(result['source_documents']):
    print(f"  [{i+1}] Page {doc.metadata.get('page', 'N/A')}")
    print(f"      {doc.page_content[:150]}...")
# Expected output:
# Answer: The company offers a 30-day return policy for unused products...
# Sources (4 documents):
# [1] Page 12
# Returns must be initiated within 30 days of purchase...
Alternative: Modern LCEL syntax
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
# Custom prompt for better control
template = """Answer the question based on the following context.
If you don't know, say "I don't know" - don't make up information.
Context:
{context}
Question: {question}
Answer:"""
prompt = ChatPromptTemplate.from_template(template)
# Create RAG chain using LCEL
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
# Query
answer = rag_chain.invoke("What is the company's return policy?")
print(answer)
Use Case 4: Agent with Tools (Web Search + Calculator)
Agents let the LLM decide which tools to use and when. This example creates an agent that can search the web and perform calculations.
from langchain_openai import ChatOpenAI
from langchain.agents import Tool, create_react_agent, AgentExecutor
from langchain_community.tools import DuckDuckGoSearchRun
from langchain_core.prompts import PromptTemplate
import math
# Initialize LLM
llm = ChatOpenAI(model="gpt-4o", temperature=0) # gpt-4o for better reasoning
# Define tools
search_tool = DuckDuckGoSearchRun()
def calculator(expression: str) -> str:
    """Evaluates a mathematical expression. Example: '2 + 2' or 'sqrt(16)'"""
    try:
        # Evaluate with builtins disabled; only math-module names are available.
        # Note: this restricts eval but is not a full sandbox.
        result = eval(expression, {"__builtins__": {}}, vars(math))
        return str(result)
    except Exception as e:
        return f"Error: {str(e)}"
tools = [
    Tool(
        name="Search",
        func=search_tool.run,
        description="Useful for finding current information on the internet. Input should be a search query."
    ),
    Tool(
        name="Calculator",
        func=calculator,
        description="Useful for mathematical calculations. Input should be a valid Python expression like '2 + 2' or 'sqrt(16) * 3'."
    )
]
# Create ReAct agent
prompt = PromptTemplate.from_template("""Answer the following question as best you can. You have access to the following tools:
{tools}
Use the following format:
Question: the input question you must answer
Thought: think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
Begin!
Question: {input}
Thought: {agent_scratchpad}""")
agent = create_react_agent(llm, tools, prompt)
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,  # Show reasoning steps
    max_iterations=10,
    handle_parsing_errors=True
)
# Test queries
query1 = "What is the current price of Bitcoin in USD?"
result1 = agent_executor.invoke({"input": query1})
print(result1["output"])
query2 = "If Bitcoin is $45,000 and I buy 0.5 BTC, how much do I pay?"
result2 = agent_executor.invoke({"input": query2})
print(result2["output"])
# Expected reasoning:
# Thought: I need to search for the current Bitcoin price
# Action: Search
# Action Input: current bitcoin price USD
# Observation: Bitcoin is trading at $45,123...
# Thought: Now I need to calculate 45000 * 0.5
# Action: Calculator
# Action Input: 45000 * 0.5
# Observation: 22500.0
# Thought: I now know the final answer
# Final Answer: You would pay $22,500 for 0.5 BTC at $45,000 per coin.
What is LangGraph?
LangGraph is a library built on top of LangChain for creating stateful, multi-step agent workflows with cyclic graphs. Released in 2024, it solves a key limitation of basic LangChain chains: they are acyclic (one-way flows). LangGraph lets you build workflows where agents can loop, retry, branch conditionally, and maintain persistent state.
Why LangGraph?
Basic LangChain chains are linear: input → step1 → step2 → output. But many real-world use cases need:
- Cyclic workflows: Agent tries, evaluates result, retries if needed
- Conditional branching: Route to different sub-agents based on input type
- Human-in-the-loop: Pause for approval before executing actions
- Persistent state: Save conversation state to resume later
- Multi-agent coordination: Multiple specialized agents collaborate
LangGraph represents workflows as directed graphs where nodes are functions (agents, tools, prompts) and edges define transitions. It supports checkpointing (save/resume state) and time travel debugging.
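Stripped of the library, the nodes-and-edges model reduces to a dict of functions plus a transition rule. The toy runner below illustrates the core idea, including a cycle (this is a conceptual sketch, not the langgraph API):

```python
END = "__end__"

def run_graph(nodes, edges, entry, state):
    """nodes: name -> fn(state) -> state; edges: name -> fn(state) -> next node name."""
    current = entry
    while current != END:
        state = nodes[current](state)
        state["path"].append(current)     # record visited nodes for inspection
        current = edges[current](state)   # conditional edge: pick the next node
    return state

nodes = {
    "work":  lambda s: {**s, "tries": s["tries"] + 1},
    "check": lambda s: s,
}
edges = {
    "work":  lambda s: "check",
    "check": lambda s: END if s["tries"] >= 2 else "work",  # loop back: a cycle
}
result = run_graph(nodes, edges, "work", {"tries": 0, "path": []})
print(result["tries"])  # 2 — the graph looped once before finishing
```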
LangChain vs LangGraph: When to Use Each
| Use Case | LangChain (Chains/Agents) | LangGraph |
|---|---|---|
| Simple RAG Q&A | ✅ Perfect fit | Overkill |
| Linear workflow (retrieve → generate) | ✅ Use LCEL chains | Unnecessary complexity |
| Agent with retry logic | ❌ Hard to implement | ✅ Native support |
| Multi-step research (search → analyze → summarize → refine) | ⚠️ Possible but messy | ✅ Clean graph structure |
| Human approval before action | ❌ No native support | ✅ Interrupt nodes |
| Conversation that needs to pause/resume | ⚠️ Manual state management | ✅ Checkpointing |
| Routing between specialized agents | ⚠️ Custom logic required | ✅ Conditional edges |
LangGraph Code Examples
Example 1: Research Agent with Self-Reflection
This agent searches the web, analyzes results, and reflects on whether the answer is good enough. If not, it searches again with a refined query.
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from langchain_community.tools import DuckDuckGoSearchRun
from typing import TypedDict, List
# Define state
class AgentState(TypedDict):
    question: str
    search_query: str
    search_results: str
    answer: str
    confidence: str  # "high" or "low"
    iteration: int
# Initialize tools and LLM
search_tool = DuckDuckGoSearchRun()
llm = ChatOpenAI(model="gpt-4o", temperature=0)
# Node 1: Generate search query
def generate_query(state: AgentState) -> AgentState:
    question = state["question"]
    iteration = state.get("iteration", 0)
    if iteration == 0:
        query = question
    else:
        # Refine the query based on the previous attempt
        prompt = f"""Previous search for "{state['search_query']}" didn't give a confident answer.
Generate a more specific search query for: {question}"""
        query = llm.invoke(prompt).content
    return {**state, "search_query": query, "iteration": iteration + 1}
# Node 2: Execute search
def search(state: AgentState) -> AgentState:
    results = search_tool.run(state["search_query"])
    return {**state, "search_results": results}
# Node 3: Generate answer and self-evaluate
def generate_answer(state: AgentState) -> AgentState:
    prompt = f"""Based on these search results, answer the question.
Then evaluate your confidence (high/low).
Question: {state['question']}
Search Results: {state['search_results']}
Format:
Answer: [your answer]
Confidence: [high or low]"""
    response = llm.invoke(prompt).content
    # Parse the response
    lines = response.split("\n")
    answer = next((l.replace("Answer:", "").strip() for l in lines if "Answer:" in l), "")
    confidence = next((l.replace("Confidence:", "").strip().lower() for l in lines if "Confidence:" in l), "low")
    return {**state, "answer": answer, "confidence": confidence}
# Decision function: retry or finish?
def should_retry(state: AgentState) -> str:
    if state["confidence"] == "high" or state["iteration"] >= 3:
        return "finish"
    return "retry"
# Build graph
workflow = StateGraph(AgentState)
# Add nodes
workflow.add_node("generate_query", generate_query)
workflow.add_node("search", search)
workflow.add_node("generate_answer", generate_answer)
# Add edges
workflow.set_entry_point("generate_query")
workflow.add_edge("generate_query", "search")
workflow.add_edge("search", "generate_answer")
workflow.add_conditional_edges(
    "generate_answer",
    should_retry,
    {
        "retry": "generate_query",  # Loop back
        "finish": END
    }
)
# Compile graph
app = workflow.compile()
# Run
result = app.invoke({
    "question": "What is the latest version of Python as of 2026?",
    "iteration": 0
})
print(f"Final Answer: {result['answer']}")
print(f"Confidence: {result['confidence']}")
print(f"Iterations: {result['iteration']}")
# Expected flow:
# 1. Generate query: "latest Python version 2026"
# 2. Search → results
# 3. Generate answer → confidence: low (vague results)
# 4. Retry: Generate refined query: "Python 3.13 release date 2026"
# 5. Search → better results
# 6. Generate answer → confidence: high
# 7. Finish
Example 2: Human-in-the-Loop Approval Workflow
This agent drafts an email, asks for human approval, and only sends if approved.
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver # For interrupts
from langchain_openai import ChatOpenAI
from typing import TypedDict
class EmailState(TypedDict):
    recipient: str
    topic: str
    draft: str
    approved: bool
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)
# Node 1: Draft email
def draft_email(state: EmailState) -> EmailState:
    prompt = f"Write a professional email to {state['recipient']} about {state['topic']}."
    draft = llm.invoke(prompt).content
    return {**state, "draft": draft}
# Node 2: Send email (only if approved)
def send_email(state: EmailState) -> EmailState:
    print(f"📧 Sending email to {state['recipient']}:")
    print(state['draft'])
    return state
# Build graph: a direct draft → send edge; the interrupt before "send" is the approval gate
workflow = StateGraph(EmailState)
workflow.add_node("draft", draft_email)
workflow.add_node("send", send_email)
workflow.set_entry_point("draft")
workflow.add_edge("draft", "send")
workflow.add_edge("send", END)
# Compile with memory (required for interrupts)
memory = MemorySaver()
app = workflow.compile(checkpointer=memory, interrupt_before=["send"])
# Run with a thread_id (required for stateful execution)
config = {"configurable": {"thread_id": "email-123"}}
# Step 1: Draft email (execution pauses before the "send" node)
result1 = app.invoke({
    "recipient": "john@example.com",
    "topic": "Q1 2026 product roadmap"
}, config)
print("Draft created:")
print(result1["draft"])
print("\n⏸️ Workflow paused. Review and approve.")
# --- Human reviews draft here ---
# Step 2: Record approval, then resume from the checkpoint
# (to reject, simply never resume this thread)
app.update_state(config, {"approved": True})  # Human approval recorded in state
result2 = app.invoke(None, config)  # Passing None resumes the interrupted run
# Output:
# Draft created:
# Subject: Q1 2026 Product Roadmap
#
# Dear John,
#
# I wanted to share our product roadmap for Q1 2026...
#
# ⏸️ Workflow paused. Review and approve.
# 📧 Sending email to john@example.com:
# [email content]
Example 3: Multi-Agent Collaboration (Researcher + Writer)
Two specialized agents collaborate: one researches, the other writes a report.
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from langchain_community.tools import DuckDuckGoSearchRun
from typing import TypedDict, List
class ResearchState(TypedDict):
    topic: str
    research_notes: str
    article: str
search_tool = DuckDuckGoSearchRun()
llm = ChatOpenAI(model="gpt-4o", temperature=0.7)
# Agent 1: Researcher
def research(state: ResearchState) -> ResearchState:
    topic = state["topic"]
    # Search for information
    search_results = search_tool.run(f"{topic} latest developments 2026")
    # Analyze and summarize
    prompt = f"""Analyze these search results and extract key facts about {topic}.
Focus on: definitions, recent developments, expert opinions, statistics.
Search Results:
{search_results}
Output format:
- Key fact 1
- Key fact 2
..."""
    notes = llm.invoke(prompt).content
    return {**state, "research_notes": notes}
# Agent 2: Writer
def write_article(state: ResearchState) -> ResearchState:
    prompt = f"""Write a 300-word article about {state['topic']}.
Use these research notes as your source:
{state['research_notes']}
Write in a professional, engaging style. Include an introduction, key points, and conclusion."""
    article = llm.invoke(prompt).content
    return {**state, "article": article}
# Build graph
workflow = StateGraph(ResearchState)
workflow.add_node("researcher", research)
workflow.add_node("writer", write_article)
workflow.set_entry_point("researcher")
workflow.add_edge("researcher", "writer")
workflow.add_edge("writer", END)
app = workflow.compile()
# Run
result = app.invoke({"topic": "Impact of AI Act regulations on European startups"})
print("Research Notes:")
print(result["research_notes"])
print("\n" + "="*50 + "\n")
print("Article:")
print(result["article"])
Production Best Practices
1. Cost Optimization
- Use cheaper models for simple tasks: gpt-4o-mini ($0.15/1M tokens) instead of gpt-4o ($5/1M tokens) for RAG generation
- Cache embeddings: Store embeddings for 30 days to avoid re-computing for same queries
- Batch API calls: Use OpenAI batch API for non-urgent tasks (50% discount)
- Limit context window: Use ConversationBufferWindowMemory instead of full history
- Consider local LLMs: Llama 3.3 70B via Ollama costs $0 per query (requires GPU)
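The embedding cache mentioned above is essentially a content-keyed lookup. A minimal sketch of the idea, with a stand-in embedding function (LangChain's CacheBackedEmbeddings does the same thing against a persistent store):

```python
import hashlib

class CachedEmbedder:
    """Skip the embedding call when the exact text was embedded before."""
    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.cache = {}     # sha256 of text -> vector
        self.api_calls = 0

    def embed(self, text: str):
        key = hashlib.sha256(text.encode()).hexdigest()
        if key not in self.cache:
            self.api_calls += 1  # only cache misses cost money
            self.cache[key] = self.embed_fn(text)
        return self.cache[key]

fake_embed = lambda text: [float(len(text))]  # stand-in for a real embedding model
embedder = CachedEmbedder(fake_embed)
embedder.embed("return policy")
embedder.embed("return policy")  # cache hit: no second call
print(embedder.api_calls)  # 1
```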
2. Monitoring with LangSmith
LangSmith is LangChain's observability platform. Enable it with 2 environment variables:
import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-langsmith-api-key"
# Now all chains/agents auto-log to LangSmith
# View traces at: https://smith.langchain.com/
What you get:
- Full trace of every chain execution (inputs, outputs, latencies)
- Token usage and cost tracking per call
- Error analysis and failure patterns
- A/B testing of prompts and models
- Dataset creation from production logs
3. Error Handling and Retries
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o-mini")
# Add automatic retries (up to 3 attempts) with exponential backoff
llm_with_retry = llm.with_retry(
    stop_after_attempt=3,
    wait_exponential_jitter=True  # Jittered exponential backoff between attempts
)
# Use in chain
chain = prompt | llm_with_retry | output_parser
# For agents: add fallback LLM
from langchain_anthropic import ChatAnthropic
llm_primary = ChatOpenAI(model="gpt-4o")
llm_fallback = ChatAnthropic(model="claude-3-5-sonnet-20241022")
llm = llm_primary.with_fallbacks([llm_fallback])
# If OpenAI fails, automatically tries Claude
4. Streaming for Better UX
Always stream responses in user-facing applications:
# FastAPI endpoint example
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
app = FastAPI()
@app.post("/chat")
async def chat(question: str):
    llm = ChatOpenAI(model="gpt-4o-mini", streaming=True)
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a helpful assistant."),
        ("human", "{question}")
    ])
    chain = prompt | llm
    async def generate():
        async for chunk in chain.astream({"question": question}):
            yield chunk.content
    return StreamingResponse(generate(), media_type="text/plain")
# Client receives tokens as they're generated (feels instant)
5. Security Best Practices
- Never put API keys in code: Use environment variables or secret managers
- Validate user inputs: Sanitize queries before passing to LLMs (prevent prompt injection)
- Rate limit agents: Set max_iterations to prevent infinite loops
- Restrict tool access: Only give agents access to necessary tools (principle of least privilege)
- Use constitutional AI: Add a moderation layer to filter harmful outputs
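The first bullet in practice: read keys from the environment and fail fast when they are missing. A small sketch (the variable name is the conventional one; the placeholder value is for the demo only):

```python
import os

def require_env(name: str) -> str:
    """Fetch a secret from the environment; fail fast with a clear error if absent."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# Typical usage: never hard-code the key in source
os.environ.setdefault("OPENAI_API_KEY", "sk-test-placeholder")  # demo only
api_key = require_env("OPENAI_API_KEY")
# llm = ChatOpenAI(api_key=api_key)  # pass it explicitly, or let the SDK read the env
```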
Common Pitfalls and How to Avoid Them
| Pitfall | Consequence | Solution |
|---|---|---|
| Not limiting agent iterations | Infinite loops, high costs | Set max_iterations=10 in AgentExecutor |
| Using full conversation history | Context window exceeded, high token costs | Use ConversationSummaryMemory or BufferWindowMemory(k=10) |
| Not caching embeddings | Repeated embedding costs for same documents | Use CacheBackedEmbeddings with LocalFileStore or Redis |
| Synchronous API calls | Slow response times | Use chain.ainvoke() or chain.astream() (async versions) |
| Not handling API errors | Application crashes on rate limits | Use RunnableRetry with exponential backoff |
| Using gpt-4o for everything | 30x+ higher costs vs gpt-4o-mini | Reserve gpt-4o for complex reasoning, use mini for RAG/classification |
| No observability | Can't debug production issues | Enable LangSmith tracing from day 1 |
Real-World Architecture Example
Here's a production-ready architecture for a customer support chatbot with RAG, using LangChain + LangGraph:
# File: production_chatbot.py
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.postgres import PostgresSaver # Persistent state
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Pinecone
from langchain.chains import RetrievalQA
from langchain.memory import ConversationSummaryMemory
from typing import TypedDict
import pinecone
# State definition
class ChatState(TypedDict):
    user_id: str
    message: str
    intent: str  # "faq", "billing", "technical", "human_handoff"
    response: str
    conversation_history: str
# Initialize components
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
# Vector store (docs indexed beforehand)
pinecone.init(api_key="your-key", environment="us-west1-gcp")
vectorstore = Pinecone.from_existing_index("support-docs", embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
# RAG chain
rag_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True
)
# Node 1: Classify intent
def classify_intent(state: ChatState) -> ChatState:
    prompt = f"""Classify the user's intent into one of these categories:
- faq: General questions about the product
- billing: Payment, invoices, subscriptions
- technical: Bug reports, technical issues
- human_handoff: Urgent or complex issues requiring a human agent
User message: {state['message']}
Conversation history: {state.get('conversation_history', 'None')}
Respond with just the category name."""
    intent = llm.invoke(prompt).content.strip().lower()
    return {**state, "intent": intent}
# Node 2: Handle FAQ (RAG)
def handle_faq(state: ChatState) -> ChatState:
    result = rag_chain.invoke({"query": state["message"]})
    response = result["result"]
    # Append sources
    sources = [doc.metadata.get("source", "Unknown") for doc in result["source_documents"]]
    response += f"\n\nSources: {', '.join(sources)}"
    return {**state, "response": response}
# Node 3: Handle billing (hypothetical)
def handle_billing(state: ChatState) -> ChatState:
    # In reality: query the billing API and return account info
    response = "I've pulled up your billing information. Your next invoice is due on April 15, 2026."
    return {**state, "response": response}
# Node 4: Human handoff
def human_handoff(state: ChatState) -> ChatState:
    # In reality: create a support ticket and notify the team
    response = "I'm connecting you with a human agent. Average wait time: 2 minutes."
    return {**state, "response": response}
# Routing function
def route_intent(state: ChatState) -> str:
    intent_map = {
        "faq": "faq",
        "billing": "billing",
        "technical": "human",
        "human_handoff": "human"
    }
    return intent_map.get(state["intent"], "faq")
# Build graph
workflow = StateGraph(ChatState)
workflow.add_node("classify", classify_intent)
workflow.add_node("faq", handle_faq)
workflow.add_node("billing", handle_billing)
workflow.add_node("human", human_handoff)
workflow.set_entry_point("classify")
workflow.add_conditional_edges(
    "classify",
    route_intent,
    {"faq": "faq", "billing": "billing", "human": "human"}
)
workflow.add_edge("faq", END)
workflow.add_edge("billing", END)
workflow.add_edge("human", END)
# Compile with PostgreSQL checkpointing for persistence
checkpointer = PostgresSaver.from_conn_string("postgresql://user:pass@host:5432/db")
app = workflow.compile(checkpointer=checkpointer)
# API endpoint (FastAPI)
from fastapi import FastAPI
api = FastAPI()
@api.post("/chat")
async def chat(user_id: str, message: str):
    config = {"configurable": {"thread_id": f"user-{user_id}"}}
    result = await app.ainvoke({
        "user_id": user_id,
        "message": message
    }, config)
    return {"response": result["response"], "intent": result["intent"]}
# Deployment: Docker + AWS ECS Fargate
# Horizontal scaling: multiple containers, state in Postgres
# Monitoring: LangSmith + Datadog
Learning Resources
To master LangChain and LangGraph, we recommend:
- Official LangChain documentation: python.langchain.com/docs (comprehensive API reference)
- LangGraph documentation: langchain-ai.github.io/langgraph (tutorials and examples)
- LangChain Academy: Free course by Harrison Chase (creator) covering fundamentals to advanced agents
- LangSmith Cookbook: Production patterns and best practices
For hands-on professional training, Talki Academy offers:
- RAG and Agents in Production (3-day intensive): Build production RAG systems with LangChain, LlamaIndex, and LangGraph. Includes real-world projects and deployment to AWS.
- Claude API for Developers (2 days): Master Claude 4.5 with LangChain integration. Advanced prompt engineering, function calling, and cost optimization.
Frequently Asked Questions
When should I use LangGraph instead of basic LangChain?
Use LangGraph when you need: (1) Complex multi-step workflows with conditional branching, (2) Agents that need to revise their work based on feedback, (3) Persistent state across conversation turns, (4) Human-in-the-loop approvals, or (5) Cyclic workflows (agent tries, evaluates, retries). For simple linear chains (retrieve → generate), basic LangChain is sufficient.
Can I use LangChain with open-source LLMs like Llama or Mistral?
Yes, absolutely. LangChain has native integrations with Ollama (local inference), vLLM (GPU-accelerated serving), HuggingFace Transformers, and OpenAI-compatible API servers. Example: ChatOllama(model='llama3.3:70b') works identically to ChatOpenAI. Same for embeddings: OllamaEmbeddings(model='nomic-embed-text') replaces OpenAIEmbeddings. This lets you run everything locally with zero API costs.
What's the real production cost of a LangChain RAG application?
For a typical customer support RAG with 50k queries/month: ~$400-600/month (OpenAI GPT-4o mini + embeddings + Pinecone vector DB + compute). You can reduce this 60% by using local Llama 3.3 70B ($0 LLM calls, +$200/month GPU) and self-hosted ChromaDB ($0 vector DB). Total optimized cost: ~$250/month. Key savings: cache embeddings (30-day TTL), use gpt-4o-mini not gpt-4, batch queries where possible.
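To sanity-check those figures, here is the back-of-the-envelope arithmetic for the model-call portion, using the prices quoted elsewhere in this article and assumed per-query token counts (the token counts are illustrative, not measured):

```python
# Assumptions: 50k queries/month, ~2,000 input + 500 output tokens per query
# (4 retrieved chunks of ~400 tokens plus the prompt), gpt-4o-mini pricing.
QUERIES = 50_000
IN_TOKENS, OUT_TOKENS = 2_000, 500
PRICE_IN, PRICE_OUT = 0.15, 0.60   # $ per 1M tokens, gpt-4o-mini
EMBED_PRICE = 0.02                 # $ per 1M tokens, text-embedding-3-small

llm_cost = QUERIES * (IN_TOKENS * PRICE_IN + OUT_TOKENS * PRICE_OUT) / 1_000_000
embed_cost = QUERIES * 50 * EMBED_PRICE / 1_000_000  # ~50 tokens per query embedding
print(f"LLM: ${llm_cost:.0f}/month, embeddings: ${embed_cost:.2f}/month")
# Model calls come to roughly $30/month under these assumptions — most of the
# quoted $400-600 is vector DB hosting and compute, which is why self-hosting
# the vector DB is the biggest single lever.
```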
How do I add memory to a LangChain chatbot?
LangChain offers 4 memory strategies: (1) ConversationBufferMemory (stores all messages, simple but memory grows), (2) ConversationBufferWindowMemory (last N messages, good for long chats), (3) ConversationSummaryMemory (LLM summarizes old messages, best for production), (4) ConversationKGMemory (knowledge graph of facts). For most apps: start with BufferWindowMemory(k=10), upgrade to SummaryMemory when context limits hit. With LangGraph, use checkpoints for persistent state across sessions.
What's the difference between LangChain Expression Language (LCEL) and the old Chain API?
LCEL (introduced 2023) is the modern way to build chains using the pipe operator (|). Advantages: (1) streaming by default, (2) async/sync auto-handled, (3) better debugging, (4) LangSmith integration. The old Chain API (RetrievalQA, etc.) still works but is in maintenance mode. Example: retriever | prompt | llm | output_parser (LCEL) vs RetrievalQA.from_chain_type() (old). Recommendation: use LCEL for new projects, and migrate old chains progressively.