What is LangChain?
LangChain is an open-source Python (and TypeScript) framework designed to simplify building applications powered by Large Language Models (LLMs). Instead of writing low-level API calls and managing prompt templates manually, LangChain provides composable abstractions for common patterns: retrieving documents, maintaining conversation memory, calling external tools, and chaining multiple LLM calls together.
Released in October 2022 by Harrison Chase, LangChain quickly became the most popular LLM framework, with 85,000+ GitHub stars and a thriving ecosystem of integrations (50+ vector databases, 100+ document loaders, 20+ LLM providers).
Core Concepts
LangChain is built around five key abstractions:
- Models: Wrappers for LLMs (OpenAI, Claude, Llama) and embeddings (text-embedding-3-small, nomic-embed-text)
- Prompts: Templates for structuring inputs to LLMs with variable substitution
- Chains: Sequences of calls (e.g., retrieve documents → format prompt → generate answer)
- Memory: State management for multi-turn conversations
- Agents: LLMs that decide which tools to call and when
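To make the Prompts abstraction concrete: a prompt template is just a string with named slots plus a substitution step. The sketch below illustrates the idea in plain Python (it is not the LangChain API; LangChain's `PromptTemplate` adds validation and chat-message structure on top):

```python
SYSTEM = "You are a helpful assistant."
TEMPLATE = "Answer concisely: {question}"

def format_prompt(question: str) -> list:
    """Build the (role, content) message list an LLM wrapper would receive."""
    return [("system", SYSTEM), ("human", TEMPLATE.format(question=question))]

messages = format_prompt("What is the capital of France?")
print(messages[1])  # ('human', 'Answer concisely: What is the capital of France?')
```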
Why Use LangChain?
| Benefit | Without LangChain | With LangChain |
|---|---|---|
| RAG system | 200+ lines of boilerplate (vector DB, embeddings, retrieval, prompt formatting) | 30 lines using RetrievalQA or LCEL |
| Conversation memory | Manual session storage, context window management, summarization logic | ConversationBufferMemory or ConversationSummaryMemory (5 lines) |
| Agent with tools | Custom ReAct loop, function calling parsing, error handling, retry logic | create_react_agent() + tool decorators (20 lines) |
| Switching LLM providers | Rewrite API calls, adapt to different response formats | Change one line: ChatOpenAI() → ChatAnthropic() |
| Observability | Custom logging, metrics, tracing infrastructure | LangSmith integration (2 env vars) |
Common Use Cases and Code Examples
Use Case 1: Simple Q&A Chatbot
The simplest LangChain application: send a question, get an answer. This is useful for stateless applications like one-off queries or API endpoints.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
# 1. Initialize LLM
llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0.7,
    api_key="your-api-key"
)
# 2. Create prompt template
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant that answers questions concisely."),
    ("human", "{question}")
])
# 3. Create chain using LCEL (LangChain Expression Language)
chain = prompt | llm | StrOutputParser()
# 4. Invoke
response = chain.invoke({"question": "What is the capital of France?"})
print(response) # "Paris is the capital of France."
# For streaming responses:
for chunk in chain.stream({"question": "Explain quantum computing in 3 sentences"}):
    print(chunk, end="", flush=True)
Key concepts:
- The | operator composes components into a chain (LCEL syntax)
- StrOutputParser() extracts the text content from the LLM's message object
- chain.stream() enables token-by-token streaming (crucial for UX)
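Under the hood, the pipe operator is left-to-right function composition. A plain-Python sketch of the idea (illustrative only; the real Runnable classes also handle streaming, batching, and async):

```python
from functools import reduce

def pipe(*steps):
    """Compose steps left to right: pipe(f, g, h)(x) == h(g(f(x)))."""
    return lambda x: reduce(lambda acc, step: step(acc), steps, x)

# Stand-ins for prompt | llm | parser (the real middle step calls a model)
format_prompt = lambda d: f"Q: {d['question']}"
fake_llm = lambda p: {"content": p.upper()}
parse_output = lambda msg: msg["content"]

chain = pipe(format_prompt, fake_llm, parse_output)
print(chain({"question": "hi"}))  # Q: HI
```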
Use Case 2: Chatbot with Conversation Memory
Most chatbots need to remember previous messages. LangChain handles this with Memory components.
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
# Initialize LLM
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)
# Create memory (stores full conversation history)
memory = ConversationBufferMemory()
# Create conversation chain
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True  # Shows the full prompt sent to the LLM
)
# Multi-turn conversation
response1 = conversation.predict(input="My name is Alice and I love Python.")
print(response1) # "Nice to meet you, Alice! Python is a great language..."
response2 = conversation.predict(input="What's my name?")
print(response2) # "Your name is Alice."
response3 = conversation.predict(input="What programming language do I like?")
print(response3) # "You love Python."
# Inspect memory
print(conversation.memory.buffer)
# Shows full conversation history
Production tip: For long conversations, use ConversationBufferWindowMemory(k=10) to keep only the last 10 messages, or ConversationSummaryMemory to have the LLM summarize old messages (reduces token costs).
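The windowed variant is simple to picture: keep only the last k exchanges and let older ones fall off. A plain-Python sketch of that idea (the real ConversationBufferWindowMemory also handles prompt formatting and message objects):

```python
from collections import deque

class WindowMemory:
    """Keep only the last k (human, ai) exchanges, like ConversationBufferWindowMemory(k=k)."""
    def __init__(self, k: int):
        self.turns = deque(maxlen=k)  # deque drops the oldest turn automatically

    def save(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))

    def buffer(self) -> str:
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)

memory = WindowMemory(k=2)
for i in range(4):
    memory.save(f"msg {i}", f"reply {i}")
print(memory.buffer())  # only the last 2 exchanges ("msg 2", "msg 3") survive
```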
Use Case 3: RAG (Retrieval-Augmented Generation)
RAG lets you query private documents without fine-tuning. Here's a complete example: loading PDFs, creating embeddings, storing in a vector database, and answering questions.
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA
# Step 1: Load documents
loader = PyPDFLoader("company_documentation.pdf")
documents = loader.load()
print(f"Loaded {len(documents)} pages")
# Step 2: Split into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # Characters per chunk
    chunk_overlap=200,  # Overlap to preserve context across chunk boundaries
    separators=["\n\n", "\n", " ", ""]
)
chunks = text_splitter.split_documents(documents)
print(f"Split into {len(chunks)} chunks")
# Step 3: Create embeddings and store in vector DB
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db"  # Save to disk
)
# Step 4: Create retriever
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 4}  # Retrieve the top 4 most relevant chunks
)
# Step 5: Create RAG chain
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",  # "stuff" = insert all retrieved docs into one prompt
    retriever=retriever,
    return_source_documents=True
)
# Step 6: Query
query = "What is the company's return policy?"
result = qa_chain.invoke({"query": query})
print(f"Answer: {result['result']}")
print(f"\nSources ({len(result['source_documents'])} documents):")
for i, doc in enumerate(result['source_documents']):
    print(f"  [{i+1}] Page {doc.metadata.get('page', 'N/A')}")
    print(f"      {doc.page_content[:150]}...")
# Expected output:
# Answer: The company offers a 30-day return policy for unused products...
# Sources (4 documents):
# [1] Page 12
# Returns must be initiated within 30 days of purchase...
Alternative: Modern LCEL syntax
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
# Custom prompt for better control
template = """Answer the question based on the following context.
If you don't know, say "I don't know" - don't make up information.
Context:
{context}
Question: {question}
Answer:"""
prompt = ChatPromptTemplate.from_template(template)
# Create RAG chain using LCEL
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
# Query
answer = rag_chain.invoke("What is the company's return policy?")
print(answer)
Use Case 4: Agent with Tools (Web Search + Calculator)
Agents let the LLM decide which tools to use and when. This example creates an agent that can search the web and perform calculations.
from langchain_openai import ChatOpenAI
from langchain.agents import Tool, create_react_agent, AgentExecutor
from langchain_community.tools import DuckDuckGoSearchRun
from langchain_core.prompts import PromptTemplate
import math
# Initialize LLM
llm = ChatOpenAI(model="gpt-4o", temperature=0) # gpt-4o for better reasoning
# Define tools
search_tool = DuckDuckGoSearchRun()
def calculator(expression: str) -> str:
    """Evaluates a mathematical expression. Example: '2 + 2' or 'sqrt(16)'"""
    try:
        # Evaluate with builtins disabled; only math-module names are available.
        # Note: this restricts eval but is not a full sandbox.
        result = eval(expression, {"__builtins__": {}}, vars(math))
        return str(result)
    except Exception as e:
        return f"Error: {str(e)}"
tools = [
    Tool(
        name="Search",
        func=search_tool.run,
        description="Useful for finding current information on the internet. Input should be a search query."
    ),
    Tool(
        name="Calculator",
        func=calculator,
        description="Useful for mathematical calculations. Input should be a valid Python expression like '2 + 2' or 'sqrt(16) * 3'."
    )
]
# Create ReAct agent
prompt = PromptTemplate.from_template("""Answer the following question as best you can. You have access to the following tools:
{tools}
Use the following format:
Question: the input question you must answer
Thought: think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
Begin!
Question: {input}
Thought: {agent_scratchpad}""")
agent = create_react_agent(llm, tools, prompt)
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,  # Show reasoning steps
    max_iterations=10,
    handle_parsing_errors=True
)
# Test queries
query1 = "What is the current price of Bitcoin in USD?"
result1 = agent_executor.invoke({"input": query1})
print(result1["output"])
query2 = "If Bitcoin is $45,000 and I buy 0.5 BTC, how much do I pay?"
result2 = agent_executor.invoke({"input": query2})
print(result2["output"])
# Expected reasoning:
# Thought: I need to search for the current Bitcoin price
# Action: Search
# Action Input: current bitcoin price USD
# Observation: Bitcoin is trading at $45,123...
# Thought: Now I need to calculate 45000 * 0.5
# Action: Calculator
# Action Input: 45000 * 0.5
# Observation: 22500.0
# Thought: I now know the final answer
# Final Answer: You would pay $22,500 for 0.5 BTC at $45,000 per coin.
What is LangGraph?
LangGraph is a library built on top of LangChain for creating stateful, multi-step agent workflows with cyclic graphs. Released in 2024, it solves a key limitation of basic LangChain chains: they are acyclic (one-way flows). LangGraph lets you build workflows where agents can loop, retry, branch conditionally, and maintain persistent state.
Why LangGraph?
Basic LangChain chains are linear: input → step1 → step2 → output. But many real-world use cases need:
- Cyclic workflows: Agent tries, evaluates result, retries if needed
- Conditional branching: Route to different sub-agents based on input type
- Human-in-the-loop: Pause for approval before executing actions
- Persistent state: Save conversation state to resume later
- Multi-agent coordination: Multiple specialized agents collaborate
LangGraph represents workflows as directed graphs where nodes are functions (agents, tools, prompts) and edges define transitions. It supports checkpointing (save/resume state) and time travel debugging.
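Stripped of the library, the nodes-and-edges model reduces to a dict of functions plus a transition rule. The toy runner below illustrates the core idea, including a cycle (this is a conceptual sketch, not the langgraph API):

```python
END = "__end__"

def run_graph(nodes, edges, entry, state):
    """nodes: name -> fn(state) -> state; edges: name -> fn(state) -> next node name."""
    current = entry
    while current != END:
        state = nodes[current](state)
        state["path"].append(current)     # record visited nodes for inspection
        current = edges[current](state)   # conditional edge: pick the next node
    return state

nodes = {
    "work":  lambda s: {**s, "tries": s["tries"] + 1},
    "check": lambda s: s,
}
edges = {
    "work":  lambda s: "check",
    "check": lambda s: END if s["tries"] >= 2 else "work",  # loop back: a cycle
}
result = run_graph(nodes, edges, "work", {"tries": 0, "path": []})
print(result["tries"])  # 2 — the graph looped once before finishing
```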
LangChain vs LangGraph: When to Use Each
| Use Case | LangChain (Chains/Agents) | LangGraph |
|---|---|---|
| Simple RAG Q&A | ✅ Perfect fit | Overkill |
| Linear workflow (retrieve → generate) | ✅ Use LCEL chains | Unnecessary complexity |
| Agent with retry logic | ❌ Hard to implement | ✅ Native support |
| Multi-step research (search → analyze → summarize → refine) | ⚠️ Possible but messy | ✅ Clean graph structure |
| Human approval before action | ❌ No native support | ✅ Interrupt nodes |
| Conversation that needs to pause/resume | ⚠️ Manual state management | ✅ Checkpointing |
| Routing between specialized agents | ⚠️ Custom logic required | ✅ Conditional edges |
LangGraph Code Examples
Example 1: Research Agent with Self-Reflection
This agent searches the web, analyzes results, and reflects on whether the answer is good enough. If not, it searches again with a refined query.
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from langchain_community.tools import DuckDuckGoSearchRun
from typing import TypedDict, List
# Define state
class AgentState(TypedDict):
    question: str
    search_query: str
    search_results: str
    answer: str
    confidence: str  # "high" or "low"
    iteration: int
# Initialize tools and LLM
search_tool = DuckDuckGoSearchRun()
llm = ChatOpenAI(model="gpt-4o", temperature=0)
# Node 1: Generate search query
def generate_query(state: AgentState) -> AgentState:
    question = state["question"]
    iteration = state.get("iteration", 0)
    if iteration == 0:
        query = question
    else:
        # Refine the query based on the previous attempt
        prompt = f"""Previous search for "{state['search_query']}" didn't give a confident answer.
Generate a more specific search query for: {question}"""
        query = llm.invoke(prompt).content
    return {**state, "search_query": query, "iteration": iteration + 1}
# Node 2: Execute search
def search(state: AgentState) -> AgentState:
    results = search_tool.run(state["search_query"])
    return {**state, "search_results": results}
# Node 3: Generate answer and self-evaluate
def generate_answer(state: AgentState) -> AgentState:
    prompt = f"""Based on these search results, answer the question.
Then evaluate your confidence (high/low).
Question: {state['question']}
Search Results: {state['search_results']}
Format:
Answer: [your answer]
Confidence: [high or low]"""
    response = llm.invoke(prompt).content
    # Parse the response
    lines = response.split("\n")
    answer = next((l.replace("Answer:", "").strip() for l in lines if "Answer:" in l), "")
    confidence = next((l.replace("Confidence:", "").strip().lower() for l in lines if "Confidence:" in l), "low")
    return {**state, "answer": answer, "confidence": confidence}
# Decision function: retry or finish?
def should_retry(state: AgentState) -> str:
    if state["confidence"] == "high" or state["iteration"] >= 3:
        return "finish"
    return "retry"
# Build graph
workflow = StateGraph(AgentState)
# Add nodes
workflow.add_node("generate_query", generate_query)
workflow.add_node("search", search)
workflow.add_node("generate_answer", generate_answer)
# Add edges
workflow.set_entry_point("generate_query")
workflow.add_edge("generate_query", "search")
workflow.add_edge("search", "generate_answer")
workflow.add_conditional_edges(
    "generate_answer",
    should_retry,
    {
        "retry": "generate_query",  # Loop back
        "finish": END
    }
)
# Compile graph
app = workflow.compile()
# Run
result = app.invoke({
    "question": "What is the latest version of Python as of 2026?",
    "iteration": 0
})
print(f"Final Answer: {result['answer']}")
print(f"Confidence: {result['confidence']}")
print(f"Iterations: {result['iteration']}")
# Expected flow:
# 1. Generate query: "latest Python version 2026"
# 2. Search → results
# 3. Generate answer → confidence: low (vague results)
# 4. Retry: Generate refined query: "Python 3.13 release date 2026"
# 5. Search → better results
# 6. Generate answer → confidence: high
# 7. Finish
Example 2: Human-in-the-Loop Approval Workflow
This agent drafts an email, asks for human approval, and only sends if approved.
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver # For interrupts
from langchain_openai import ChatOpenAI
from typing import TypedDict
class EmailState(TypedDict):
    recipient: str
    topic: str
    draft: str
    approved: bool
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)
# Node 1: Draft email
def draft_email(state: EmailState) -> EmailState:
    prompt = f"Write a professional email to {state['recipient']} about {state['topic']}."
    draft = llm.invoke(prompt).content
    return {**state, "draft": draft}
# Node 2: Send email (only if approved)
def send_email(state: EmailState) -> EmailState:
    print(f"📧 Sending email to {state['recipient']}:")
    print(state['draft'])
    return state
# Build graph: a direct draft → send edge; the interrupt before "send" is the approval gate
workflow = StateGraph(EmailState)
workflow.add_node("draft", draft_email)
workflow.add_node("send", send_email)
workflow.set_entry_point("draft")
workflow.add_edge("draft", "send")
workflow.add_edge("send", END)
# Compile with memory (required for interrupts)
memory = MemorySaver()
app = workflow.compile(checkpointer=memory, interrupt_before=["send"])
# Run with a thread_id (required for stateful execution)
config = {"configurable": {"thread_id": "email-123"}}
# Step 1: Draft email (execution pauses before the "send" node)
result1 = app.invoke({
    "recipient": "john@example.com",
    "topic": "Q1 2026 product roadmap"
}, config)
print("Draft created:")
print(result1["draft"])
print("\n⏸️ Workflow paused. Review and approve.")
# --- Human reviews draft here ---
# Step 2: Record approval, then resume from the checkpoint
# (to reject, simply never resume this thread)
app.update_state(config, {"approved": True})  # Human approval recorded in state
result2 = app.invoke(None, config)  # Passing None resumes the interrupted run
# Output:
# Draft created:
# Subject: Q1 2026 Product Roadmap
#
# Dear John,
#
# I wanted to share our product roadmap for Q1 2026...
#
# ⏸️ Workflow paused. Review and approve.
# 📧 Sending email to john@example.com:
# [email content]
Example 3: Multi-Agent Collaboration (Researcher + Writer)
Two specialized agents collaborate: one researches, the other writes a report.
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from langchain_community.tools import DuckDuckGoSearchRun
from typing import TypedDict, List
class ResearchState(TypedDict):
    topic: str
    research_notes: str
    article: str
search_tool = DuckDuckGoSearchRun()
llm = ChatOpenAI(model="gpt-4o", temperature=0.7)
# Agent 1: Researcher
def research(state: ResearchState) -> ResearchState:
    topic = state["topic"]
    # Search for information
    search_results = search_tool.run(f"{topic} latest developments 2026")
    # Analyze and summarize
    prompt = f"""Analyze these search results and extract key facts about {topic}.
Focus on: definitions, recent developments, expert opinions, statistics.
Search Results:
{search_results}
Output format:
- Key fact 1
- Key fact 2
..."""
    notes = llm.invoke(prompt).content
    return {**state, "research_notes": notes}
# Agent 2: Writer
def write_article(state: ResearchState) -> ResearchState:
    prompt = f"""Write a 300-word article about {state['topic']}.
Use these research notes as your source:
{state['research_notes']}
Write in a professional, engaging style. Include an introduction, key points, and conclusion."""
    article = llm.invoke(prompt).content
    return {**state, "article": article}
# Build graph
workflow = StateGraph(ResearchState)
workflow.add_node("researcher", research)
workflow.add_node("writer", write_article)
workflow.set_entry_point("researcher")
workflow.add_edge("researcher", "writer")
workflow.add_edge("writer", END)
app = workflow.compile()
# Run
result = app.invoke({"topic": "Impact of AI Act regulations on European startups"})
print("Research Notes:")
print(result["research_notes"])
print("\n" + "="*50 + "\n")
print("Article:")
print(result["article"])
Production Best Practices
1. Cost Optimization
- Use cheaper models for simple tasks: gpt-4o-mini ($0.15/1M tokens) instead of gpt-4o ($5/1M tokens) for RAG generation
- Cache embeddings: Store embeddings for 30 days to avoid re-computing for same queries
- Batch API calls: Use OpenAI batch API for non-urgent tasks (50% discount)
- Limit context window: Use ConversationBufferWindowMemory instead of full history
- Consider local LLMs: Llama 3.3 70B via Ollama costs $0 per query (requires GPU)
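The embedding cache mentioned above is essentially a content-keyed lookup. A minimal sketch of the idea, with a stand-in embedding function (LangChain's CacheBackedEmbeddings does the same thing against a persistent store):

```python
import hashlib

class CachedEmbedder:
    """Skip the embedding call when the exact text was embedded before."""
    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.cache = {}     # sha256 of text -> vector
        self.api_calls = 0

    def embed(self, text: str):
        key = hashlib.sha256(text.encode()).hexdigest()
        if key not in self.cache:
            self.api_calls += 1  # only cache misses cost money
            self.cache[key] = self.embed_fn(text)
        return self.cache[key]

fake_embed = lambda text: [float(len(text))]  # stand-in for a real embedding model
embedder = CachedEmbedder(fake_embed)
embedder.embed("return policy")
embedder.embed("return policy")  # cache hit: no second call
print(embedder.api_calls)  # 1
```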
2. Monitoring with LangSmith
LangSmith is LangChain's observability platform. Enable it with 2 environment variables:
import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-langsmith-api-key"
# Now all chains/agents auto-log to LangSmith
# View traces at: https://smith.langchain.com/
What you get:
- Full trace of every chain execution (inputs, outputs, latencies)
- Token usage and cost tracking per call
- Error analysis and failure patterns
- A/B testing of prompts and models
- Dataset creation from production logs
3. Error Handling and Retries
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o-mini")
# Add automatic retries (up to 3 attempts) with exponential backoff
llm_with_retry = llm.with_retry(
    stop_after_attempt=3,
    wait_exponential_jitter=True  # Jittered exponential backoff between attempts
)
# Use in chain
chain = prompt | llm_with_retry | output_parser
# For agents: add fallback LLM
from langchain_anthropic import ChatAnthropic
llm_primary = ChatOpenAI(model="gpt-4o")
llm_fallback = ChatAnthropic(model="claude-3-5-sonnet-20241022")
llm = llm_primary.with_fallbacks([llm_fallback])
# If OpenAI fails, automatically tries Claude
4. Streaming for Better UX
Always stream responses in user-facing applications:
# FastAPI endpoint example
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
app = FastAPI()
@app.post("/chat")
async def chat(question: str):
    llm = ChatOpenAI(model="gpt-4o-mini", streaming=True)
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a helpful assistant."),
        ("human", "{question}")
    ])
    chain = prompt | llm
    async def generate():
        async for chunk in chain.astream({"question": question}):
            yield chunk.content
    return StreamingResponse(generate(), media_type="text/plain")
# Client receives tokens as they're generated (feels instant)
5. Security Best Practices
- Never put API keys in code: Use environment variables or secret managers
- Validate user inputs: Sanitize queries before passing to LLMs (prevent prompt injection)
- Rate limit agents: Set max_iterations to prevent infinite loops
- Restrict tool access: Only give agents access to necessary tools (principle of least privilege)
- Use constitutional AI: Add a moderation layer to filter harmful outputs
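The first bullet in practice: read keys from the environment and fail fast when they are missing. A small sketch (the variable name is the conventional one; the placeholder value is for the demo only):

```python
import os

def require_env(name: str) -> str:
    """Fetch a secret from the environment; fail fast with a clear error if absent."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# Typical usage: never hard-code the key in source
os.environ.setdefault("OPENAI_API_KEY", "sk-test-placeholder")  # demo only
api_key = require_env("OPENAI_API_KEY")
# llm = ChatOpenAI(api_key=api_key)  # pass it explicitly, or let the SDK read the env
```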
Common Pitfalls and How to Avoid Them
| Pitfall | Consequence | Solution |
|---|---|---|
| Not limiting agent iterations | Infinite loops, high costs | Set max_iterations=10 in AgentExecutor |
| Using full conversation history | Context window exceeded, high token costs | Use ConversationSummaryMemory or BufferWindowMemory(k=10) |
| Not caching embeddings | Repeated embedding costs for same documents | Use CacheBackedEmbeddings with LocalFileStore or Redis |
| Synchronous API calls | Slow response times | Use chain.ainvoke() or chain.astream() (async versions) |
| Not handling API errors | Application crashes on rate limits | Use RunnableRetry with exponential backoff |
| Using gpt-4o for everything | 30x+ higher costs vs gpt-4o-mini | Reserve gpt-4o for complex reasoning, use mini for RAG/classification |
| No observability | Can't debug production issues | Enable LangSmith tracing from day 1 |
Real-World Architecture Example
Here's a production-ready architecture for a customer support chatbot with RAG, using LangChain + LangGraph:
# File: production_chatbot.py
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.postgres import PostgresSaver # Persistent state
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Pinecone
from langchain.chains import RetrievalQA
from langchain.memory import ConversationSummaryMemory
from typing import TypedDict
import pinecone
# State definition
class ChatState(TypedDict):
    user_id: str
    message: str
    intent: str  # "faq", "billing", "technical", "human_handoff"
    response: str
    conversation_history: str
# Initialize components
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
# Vector store (docs indexed beforehand)
pinecone.init(api_key="your-key", environment="us-west1-gcp")
vectorstore = Pinecone.from_existing_index("support-docs", embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
# RAG chain
rag_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True
)
# Node 1: Classify intent
def classify_intent(state: ChatState) -> ChatState:
    prompt = f"""Classify the user's intent into one of these categories:
- faq: General questions about the product
- billing: Payment, invoices, subscriptions
- technical: Bug reports, technical issues
- human_handoff: Urgent or complex issues requiring a human agent
User message: {state['message']}
Conversation history: {state.get('conversation_history', 'None')}
Respond with just the category name."""
    intent = llm.invoke(prompt).content.strip().lower()
    return {**state, "intent": intent}
# Node 2: Handle FAQ (RAG)
def handle_faq(state: ChatState) -> ChatState:
    result = rag_chain.invoke({"query": state["message"]})
    response = result["result"]
    # Append sources
    sources = [doc.metadata.get("source", "Unknown") for doc in result["source_documents"]]
    response += f"\n\nSources: {', '.join(sources)}"
    return {**state, "response": response}
# Node 3: Handle billing (hypothetical)
def handle_billing(state: ChatState) -> ChatState:
    # In reality: query the billing API and return account info
    response = "I've pulled up your billing information. Your next invoice is due on April 15, 2026."
    return {**state, "response": response}
# Node 4: Human handoff
def human_handoff(state: ChatState) -> ChatState:
    # In reality: create a support ticket and notify the team
    response = "I'm connecting you with a human agent. Average wait time: 2 minutes."
    return {**state, "response": response}
# Routing function
def route_intent(state: ChatState) -> str:
    intent_map = {
        "faq": "faq",
        "billing": "billing",
        "technical": "human",
        "human_handoff": "human"
    }
    return intent_map.get(state["intent"], "faq")
# Build graph
workflow = StateGraph(ChatState)
workflow.add_node("classify", classify_intent)
workflow.add_node("faq", handle_faq)
workflow.add_node("billing", handle_billing)
workflow.add_node("human", human_handoff)
workflow.set_entry_point("classify")
workflow.add_conditional_edges(
    "classify",
    route_intent,
    {"faq": "faq", "billing": "billing", "human": "human"}
)
workflow.add_edge("faq", END)
workflow.add_edge("billing", END)
workflow.add_edge("human", END)
# Compile with PostgreSQL checkpointing for persistence
checkpointer = PostgresSaver.from_conn_string("postgresql://user:pass@host:5432/db")
app = workflow.compile(checkpointer=checkpointer)
# API endpoint (FastAPI)
from fastapi import FastAPI
api = FastAPI()
@api.post("/chat")
async def chat(user_id: str, message: str):
    config = {"configurable": {"thread_id": f"user-{user_id}"}}
    result = await app.ainvoke({
        "user_id": user_id,
        "message": message
    }, config)
    return {"response": result["response"], "intent": result["intent"]}
# Deployment: Docker + AWS ECS Fargate
# Horizontal scaling: multiple containers, state in Postgres
# Monitoring: LangSmith + Datadog
Learning Resources
To master LangChain and LangGraph, we recommend:
- Official LangChain documentation: python.langchain.com/docs (comprehensive API reference)
- LangGraph documentation: langchain-ai.github.io/langgraph (tutorials and examples)
- LangChain Academy: Free course by Harrison Chase (creator) covering fundamentals to advanced agents
- LangSmith Cookbook: Production patterns and best practices
For hands-on professional training, Talki Academy offers:
- RAG and Agents in Production (3-day intensive): Build production RAG systems with LangChain, LlamaIndex, and LangGraph. Includes real-world projects and deployment to AWS.
- Claude API for Developers (2 days): Master Claude 4.5 with LangChain integration. Advanced prompt engineering, function calling, and cost optimization.
Frequently Asked Questions
When should I use LangGraph instead of basic LangChain?
Use LangGraph when you need: (1) Complex multi-step workflows with conditional branching, (2) Agents that need to revise their work based on feedback, (3) Persistent state across conversation turns, (4) Human-in-the-loop approvals, or (5) Cyclic workflows (agent tries, evaluates, retries). For simple linear chains (retrieve → generate), basic LangChain is sufficient.
Can I use LangChain with open-source LLMs like Llama or Mistral?
Yes, absolutely. LangChain has native integrations with Ollama (local inference), vLLM (GPU-accelerated serving), HuggingFace Transformers, and OpenAI-compatible API servers. Example: ChatOllama(model='llama3.3:70b') works identically to ChatOpenAI. Same for embeddings: OllamaEmbeddings(model='nomic-embed-text') replaces OpenAIEmbeddings. This lets you run everything locally with zero API costs.
What's the real production cost of a LangChain RAG application?
For a typical customer support RAG with 50k queries/month: ~$400-600/month (OpenAI GPT-4o mini + embeddings + Pinecone vector DB + compute). You can reduce this 60% by using local Llama 3.3 70B ($0 LLM calls, +$200/month GPU) and self-hosted ChromaDB ($0 vector DB). Total optimized cost: ~$250/month. Key savings: cache embeddings (30-day TTL), use gpt-4o-mini not gpt-4, batch queries where possible.
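To sanity-check those figures, here is the back-of-the-envelope arithmetic for the model-call portion, using the prices quoted elsewhere in this article and assumed per-query token counts (the token counts are illustrative, not measured):

```python
# Assumptions: 50k queries/month, ~2,000 input + 500 output tokens per query
# (4 retrieved chunks of ~400 tokens plus the prompt), gpt-4o-mini pricing.
QUERIES = 50_000
IN_TOKENS, OUT_TOKENS = 2_000, 500
PRICE_IN, PRICE_OUT = 0.15, 0.60   # $ per 1M tokens, gpt-4o-mini
EMBED_PRICE = 0.02                 # $ per 1M tokens, text-embedding-3-small

llm_cost = QUERIES * (IN_TOKENS * PRICE_IN + OUT_TOKENS * PRICE_OUT) / 1_000_000
embed_cost = QUERIES * 50 * EMBED_PRICE / 1_000_000  # ~50 tokens per query embedding
print(f"LLM: ${llm_cost:.0f}/month, embeddings: ${embed_cost:.2f}/month")
# Model calls come to roughly $30/month under these assumptions — most of the
# quoted $400-600 is vector DB hosting and compute, which is why self-hosting
# the vector DB is the biggest single lever.
```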
How do I add memory to a LangChain chatbot?
LangChain offers 4 memory strategies: (1) ConversationBufferMemory (stores all messages, simple but memory grows), (2) ConversationBufferWindowMemory (last N messages, good for long chats), (3) ConversationSummaryMemory (LLM summarizes old messages, best for production), (4) ConversationKGMemory (knowledge graph of facts). For most apps: start with BufferWindowMemory(k=10), upgrade to SummaryMemory when context limits hit. With LangGraph, use checkpoints for persistent state across sessions.
What's the difference between LangChain Expression Language (LCEL) and the old Chain API?
LCEL (introduced 2023) is the modern way to build chains using the pipe operator (|). Advantages: (1) streaming by default, (2) async/sync auto-handled, (3) better debugging, (4) LangSmith integration. The old Chain API (RetrievalQA, etc.) still works but is in maintenance mode. Example: retriever | prompt | llm | output_parser (LCEL) vs RetrievalQA.from_chain_type() (old). Recommendation: use LCEL for new projects, and migrate old chains progressively.