Talki Academy
Technical · 28 min read

LangChain and LangGraph: Practical Guide for Building AI Agents in 2026

LangChain has become the de facto standard for building LLM applications, with 85k+ GitHub stars and adoption by thousands of companies worldwide. This practical guide covers LangChain fundamentals, introduces LangGraph for complex stateful workflows, and provides working code examples for real-world use cases: chatbots with memory, RAG systems, multi-step research agents, and human-in-the-loop approval workflows. Includes architecture patterns, production best practices, and cost optimization strategies.

By Talki Academy · Updated April 5, 2026

What is LangChain?

LangChain is an open-source Python (and TypeScript) framework designed to simplify building applications powered by Large Language Models (LLMs). Instead of writing low-level API calls and managing prompt templates manually, LangChain provides composable abstractions for common patterns: retrieving documents, maintaining conversation memory, calling external tools, and chaining multiple LLM calls together.

Released in October 2022 by Harrison Chase, LangChain quickly became the most popular LLM framework, with 85,000+ GitHub stars and a thriving ecosystem of integrations (50+ vector databases, 100+ document loaders, 20+ LLM providers).

Core Concepts

LangChain is built around five key abstractions:

  • Models: Wrappers for LLMs (OpenAI, Claude, Llama) and embeddings (text-embedding-3-small, nomic-embed-text)
  • Prompts: Templates for structuring inputs to LLMs with variable substitution
  • Chains: Sequences of calls (e.g., retrieve documents → format prompt → generate answer)
  • Memory: State management for multi-turn conversations
  • Agents: LLMs that decide which tools to call and when

Why Use LangChain?

| Benefit | Without LangChain | With LangChain |
|---|---|---|
| RAG system | 200+ lines of boilerplate (vector DB, embeddings, retrieval, prompt formatting) | 30 lines using RetrievalQA or LCEL |
| Conversation memory | Manual session storage, context window management, summarization logic | ConversationBufferMemory or ConversationSummaryMemory (5 lines) |
| Agent with tools | Custom ReAct loop, function calling parsing, error handling, retry logic | create_react_agent() + tool decorators (20 lines) |
| Switching LLM providers | Rewrite API calls, adapt to different response formats | Change one line: ChatOpenAI() → ChatAnthropic() |
| Observability | Custom logging, metrics, tracing infrastructure | LangSmith integration (2 env vars) |

Common Use Cases and Code Examples

Use Case 1: Simple Q&A Chatbot

The simplest LangChain application: send a question, get an answer. This is useful for stateless applications like one-off queries or API endpoints.

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# 1. Initialize LLM
llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0.7,
    api_key="your-api-key"
)

# 2. Create prompt template
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant that answers questions concisely."),
    ("human", "{question}")
])

# 3. Create chain using LCEL (LangChain Expression Language)
chain = prompt | llm | StrOutputParser()

# 4. Invoke
response = chain.invoke({"question": "What is the capital of France?"})
print(response)  # "Paris is the capital of France."

# For streaming responses:
for chunk in chain.stream({"question": "Explain quantum computing in 3 sentences"}):
    print(chunk, end="", flush=True)

Key concepts:

  • The | operator creates a chain (LCEL syntax)
  • StrOutputParser() extracts text from the LLM response
  • chain.stream() enables token-by-token streaming (crucial for UX)
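The pipe operator works because every LangChain component implements a common Runnable interface and overloads `|`. To build intuition for what `prompt | llm | StrOutputParser()` actually does, here is a toy re-implementation of the composition idea in plain Python; this is an illustration only, not LangChain's real API.

```python
# Toy sketch of LCEL-style composition (illustration only, not LangChain's API).
# Each "runnable" wraps a function; `|` chains them left to right.
class ToyRunnable:
    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # Compose: run self first, feed its output into `other`
        return ToyRunnable(lambda x: other.fn(self.fn(x)))

    def invoke(self, x):
        return self.fn(x)

# Stand-ins for prompt -> llm -> output parser
prompt = ToyRunnable(lambda d: f"Q: {d['question']}")
fake_llm = ToyRunnable(lambda p: {"content": p.upper()})
parser = ToyRunnable(lambda r: r["content"])

chain = prompt | fake_llm | parser
print(chain.invoke({"question": "hi"}))  # Q: HI
```

The real Runnable interface adds batching, streaming, and async on top of this same compositional core.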

Use Case 2: Chatbot with Conversation Memory

Most chatbots need to remember previous messages. LangChain handles this with Memory components.

from langchain_openai import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

# Initialize LLM
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)

# Create memory (stores full conversation history)
memory = ConversationBufferMemory()

# Create conversation chain
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True  # Shows prompt sent to LLM
)

# Multi-turn conversation
response1 = conversation.predict(input="My name is Alice and I love Python.")
print(response1)  # "Nice to meet you, Alice! Python is a great language..."

response2 = conversation.predict(input="What's my name?")
print(response2)  # "Your name is Alice."

response3 = conversation.predict(input="What programming language do I like?")
print(response3)  # "You love Python."

# Inspect memory
print(conversation.memory.buffer)  # Shows full conversation history

Production tip: For long conversations, use ConversationBufferWindowMemory(k=10) to keep only the last 10 messages, or ConversationSummaryMemory to have the LLM summarize old messages (reduces token costs).
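Under the hood, window memory is just a bounded message buffer: old turns fall off as new ones arrive. The behavior can be sketched with nothing but a stdlib deque; this is a conceptual toy, not LangChain's implementation.

```python
from collections import deque

# Toy sketch of windowed conversation memory: keep only the last k turns,
# the same idea ConversationBufferWindowMemory(k=10) implements.
class WindowMemory:
    def __init__(self, k: int):
        self.turns = deque(maxlen=k)  # oldest turns are evicted automatically

    def save_turn(self, user: str, ai: str):
        self.turns.append((user, ai))

    def as_prompt(self) -> str:
        return "\n".join(f"Human: {u}\nAI: {a}" for u, a in self.turns)

memory = WindowMemory(k=2)
memory.save_turn("Hi, I'm Alice.", "Hello Alice!")
memory.save_turn("I love Python.", "Great choice!")
memory.save_turn("What's my name?", "Alice.")  # first turn is now evicted
print(len(memory.turns))  # 2
```

Summary memory works differently: instead of dropping old turns, it replaces them with an LLM-generated summary, trading a small extra LLM call for retained context.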

Use Case 3: RAG (Retrieval-Augmented Generation)

RAG lets you query private documents without fine-tuning. Here's a complete example: loading PDFs, creating embeddings, storing in a vector database, and answering questions.

from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA

# Step 1: Load documents
loader = PyPDFLoader("company_documentation.pdf")
documents = loader.load()
print(f"Loaded {len(documents)} pages")

# Step 2: Split into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,     # Characters per chunk
    chunk_overlap=200,   # Overlap to preserve context
    separators=["\n\n", "\n", " ", ""]
)
chunks = text_splitter.split_documents(documents)
print(f"Split into {len(chunks)} chunks")

# Step 3: Create embeddings and store in vector DB
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db"  # Save to disk
)

# Step 4: Create retriever
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 4}  # Retrieve top 4 most relevant chunks
)

# Step 5: Create RAG chain
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",  # "stuff" = insert all docs into one prompt
    retriever=retriever,
    return_source_documents=True
)

# Step 6: Query
query = "What is the company's return policy?"
result = qa_chain.invoke({"query": query})

print(f"Answer: {result['result']}")
print(f"\nSources ({len(result['source_documents'])} documents):")
for i, doc in enumerate(result['source_documents']):
    print(f"  [{i+1}] Page {doc.metadata.get('page', 'N/A')}")
    print(f"      {doc.page_content[:150]}...")

# Expected output:
# Answer: The company offers a 30-day return policy for unused products...
# Sources (4 documents):
#   [1] Page 12
#       Returns must be initiated within 30 days of purchase...

Alternative: Modern LCEL syntax

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# Custom prompt for better control
template = """Answer the question based on the following context.
If you don't know, say "I don't know" - don't make up information.

Context: {context}

Question: {question}

Answer:"""
prompt = ChatPromptTemplate.from_template(template)

# Create RAG chain using LCEL
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Query
answer = rag_chain.invoke("What is the company's return policy?")
print(answer)

Use Case 4: Agent with Tools (Web Search + Calculator)

Agents let the LLM decide which tools to use and when. This example creates an agent that can search the web and perform calculations.

from langchain_openai import ChatOpenAI
from langchain.agents import Tool, create_react_agent, AgentExecutor
from langchain_community.tools import DuckDuckGoSearchRun
from langchain_core.prompts import PromptTemplate
import math

# Initialize LLM
llm = ChatOpenAI(model="gpt-4o", temperature=0)  # gpt-4o for better reasoning

# Define tools
search_tool = DuckDuckGoSearchRun()

def calculator(expression: str) -> str:
    """Evaluates a mathematical expression. Example: '2 + 2' or 'sqrt(16)'"""
    try:
        # Safe eval with math functions
        result = eval(expression, {"__builtins__": {}}, vars(math))
        return str(result)
    except Exception as e:
        return f"Error: {str(e)}"

tools = [
    Tool(
        name="Search",
        func=search_tool.run,
        description="Useful for finding current information on the internet. Input should be a search query."
    ),
    Tool(
        name="Calculator",
        func=calculator,
        description="Useful for mathematical calculations. Input should be a valid Python expression like '2 + 2' or 'sqrt(16) * 3'."
    )
]

# Create ReAct agent
prompt = PromptTemplate.from_template("""Answer the following question as best you can. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
Thought: {agent_scratchpad}""")

agent = create_react_agent(llm, tools, prompt)
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,  # Show reasoning steps
    max_iterations=10,
    handle_parsing_errors=True
)

# Test queries
query1 = "What is the current price of Bitcoin in USD?"
result1 = agent_executor.invoke({"input": query1})
print(result1["output"])

query2 = "If Bitcoin is $45,000 and I buy 0.5 BTC, how much do I pay?"
result2 = agent_executor.invoke({"input": query2})
print(result2["output"])

# Expected reasoning:
# Thought: I need to search for the current Bitcoin price
# Action: Search
# Action Input: current bitcoin price USD
# Observation: Bitcoin is trading at $45,123...
# Thought: Now I need to calculate 45000 * 0.5
# Action: Calculator
# Action Input: 45000 * 0.5
# Observation: 22500.0
# Thought: I now know the final answer
# Final Answer: You would pay $22,500 for 0.5 BTC at $45,000 per coin.

What is LangGraph?

LangGraph is a library built on top of LangChain for creating stateful, multi-step agent workflows with cyclic graphs. Released in 2024, it solves a key limitation of basic LangChain chains: they are acyclic (one-way flows). LangGraph lets you build workflows where agents can loop, retry, branch conditionally, and maintain persistent state.

Why LangGraph?

Basic LangChain chains are linear: input → step1 → step2 → output. But many real-world use cases need:

  • Cyclic workflows: Agent tries, evaluates result, retries if needed
  • Conditional branching: Route to different sub-agents based on input type
  • Human-in-the-loop: Pause for approval before executing actions
  • Persistent state: Save conversation state to resume later
  • Multi-agent coordination: Multiple specialized agents collaborate

LangGraph represents workflows as directed graphs where nodes are functions (agents, tools, prompts) and edges define transitions. It supports checkpointing (save/resume state) and time travel debugging.
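The execution model behind this is easy to picture: a state dict flows through node functions, and each edge (possibly a router function) picks the next node, looping until it reaches an end marker. Here is a stdlib-only toy of that loop to build intuition before the real examples below; this is not the langgraph API.

```python
# Toy sketch of LangGraph's execution model: nodes are functions on a shared
# state dict; edges map each node to the next node name, or to a router
# function for conditional branching. Illustration only, not the langgraph API.
END = "__end__"

def run_graph(nodes, edges, state, entry, max_steps=20):
    current = entry
    for _ in range(max_steps):
        state = nodes[current](state)   # run the current node
        nxt = edges[current]
        if callable(nxt):               # conditional edge: router picks the target
            nxt = nxt(state)
        if nxt == END:
            return state
        current = nxt
    raise RuntimeError("max_steps exceeded")

# A cyclic workflow: increment a counter until it reaches 3, then stop
nodes = {"inc": lambda s: {**s, "n": s["n"] + 1}}
edges = {"inc": lambda s: END if s["n"] >= 3 else "inc"}
result = run_graph(nodes, edges, {"n": 0}, entry="inc")
print(result)  # {'n': 3}
```

The real library adds what this toy lacks: typed state schemas, checkpointing, interrupts, and streaming of intermediate steps.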

LangChain vs LangGraph: When to Use Each

| Use Case | LangChain (Chains/Agents) | LangGraph |
|---|---|---|
| Simple RAG Q&A | ✅ Perfect fit | Overkill |
| Linear workflow (retrieve → generate) | ✅ Use LCEL chains | Unnecessary complexity |
| Agent with retry logic | ❌ Hard to implement | ✅ Native support |
| Multi-step research (search → analyze → summarize → refine) | ⚠️ Possible but messy | ✅ Clean graph structure |
| Human approval before action | ❌ No native support | ✅ Interrupt nodes |
| Conversation that needs to pause/resume | ⚠️ Manual state management | ✅ Checkpointing |
| Routing between specialized agents | ⚠️ Custom logic required | ✅ Conditional edges |

LangGraph Code Examples

Example 1: Research Agent with Self-Reflection

This agent searches the web, analyzes results, and reflects on whether the answer is good enough. If not, it searches again with a refined query.

from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from langchain_community.tools import DuckDuckGoSearchRun
from typing import TypedDict

# Define state
class AgentState(TypedDict):
    question: str
    search_query: str
    search_results: str
    answer: str
    confidence: str  # "high" or "low"
    iteration: int

# Initialize tools and LLM
search_tool = DuckDuckGoSearchRun()
llm = ChatOpenAI(model="gpt-4o", temperature=0)

# Node 1: Generate search query
def generate_query(state: AgentState) -> AgentState:
    question = state["question"]
    iteration = state.get("iteration", 0)
    if iteration == 0:
        query = question
    else:
        # Refine query based on previous attempt
        prompt = f"""Previous search for "{state['search_query']}" didn't give a confident answer.
Generate a more specific search query for: {question}"""
        query = llm.invoke(prompt).content
    return {**state, "search_query": query, "iteration": iteration + 1}

# Node 2: Execute search
def search(state: AgentState) -> AgentState:
    results = search_tool.run(state["search_query"])
    return {**state, "search_results": results}

# Node 3: Generate answer and self-evaluate
def generate_answer(state: AgentState) -> AgentState:
    prompt = f"""Based on these search results, answer the question.
Then evaluate your confidence (high/low).

Question: {state['question']}
Search Results: {state['search_results']}

Format:
Answer: [your answer]
Confidence: [high or low]"""
    response = llm.invoke(prompt).content
    # Parse response
    lines = response.split("\n")
    answer = next((l.replace("Answer:", "").strip() for l in lines if "Answer:" in l), "")
    confidence = next((l.replace("Confidence:", "").strip().lower() for l in lines if "Confidence:" in l), "low")
    return {**state, "answer": answer, "confidence": confidence}

# Decision function: retry or finish?
def should_retry(state: AgentState) -> str:
    if state["confidence"] == "high" or state["iteration"] >= 3:
        return "finish"
    return "retry"

# Build graph
workflow = StateGraph(AgentState)

# Add nodes
workflow.add_node("generate_query", generate_query)
workflow.add_node("search", search)
workflow.add_node("generate_answer", generate_answer)

# Add edges
workflow.set_entry_point("generate_query")
workflow.add_edge("generate_query", "search")
workflow.add_edge("search", "generate_answer")
workflow.add_conditional_edges(
    "generate_answer",
    should_retry,
    {
        "retry": "generate_query",  # Loop back
        "finish": END
    }
)

# Compile graph
app = workflow.compile()

# Run
result = app.invoke({
    "question": "What is the latest version of Python as of 2026?",
    "iteration": 0
})
print(f"Final Answer: {result['answer']}")
print(f"Confidence: {result['confidence']}")
print(f"Iterations: {result['iteration']}")

# Expected flow:
# 1. Generate query: "latest Python version 2026"
# 2. Search → results
# 3. Generate answer → confidence: low (vague results)
# 4. Retry: Generate refined query: "Python 3.13 release date 2026"
# 5. Search → better results
# 6. Generate answer → confidence: high
# 7. Finish

Example 2: Human-in-the-Loop Approval Workflow

This agent drafts an email, asks for human approval, and only sends if approved.

from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver  # For interrupts
from langchain_openai import ChatOpenAI
from typing import TypedDict

class EmailState(TypedDict):
    recipient: str
    topic: str
    draft: str
    approved: bool

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)

# Node 1: Draft email
def draft_email(state: EmailState) -> EmailState:
    prompt = f"Write a professional email to {state['recipient']} about {state['topic']}."
    draft = llm.invoke(prompt).content
    return {**state, "draft": draft}

# Node 2: Send email (only if approved)
def send_email(state: EmailState) -> EmailState:
    print(f"📧 Sending email to {state['recipient']}:")
    print(state['draft'])
    return state

# Decision: check approval
def check_approval(state: EmailState) -> str:
    return "send" if state.get("approved", False) else END

# Build graph with checkpointing
workflow = StateGraph(EmailState)
workflow.add_node("draft", draft_email)
workflow.add_node("send", send_email)
workflow.set_entry_point("draft")
workflow.add_conditional_edges(
    "draft",
    check_approval,
    {"send": "send", END: END}
)
workflow.add_edge("send", END)

# Compile with memory (required for interrupts)
memory = MemorySaver()
app = workflow.compile(checkpointer=memory, interrupt_before=["send"])

# Run with a thread_id (required for stateful execution)
config = {"configurable": {"thread_id": "email-123"}}

# Step 1: Draft email
result1 = app.invoke({
    "recipient": "john@example.com",
    "topic": "Q1 2026 product roadmap"
}, config)
print("Draft created:")
print(result1["draft"])
print("\n⏸️ Workflow paused. Review and approve.")

# --- Human reviews draft here ---

# Step 2: Approve and continue
result2 = app.invoke({
    **result1,
    "approved": True  # Human approval
}, config)

# Output:
# Draft created:
# Subject: Q1 2026 Product Roadmap
#
# Dear John,
#
# I wanted to share our product roadmap for Q1 2026...
#
# ⏸️ Workflow paused. Review and approve.
# 📧 Sending email to john@example.com:
# [email content]

Example 3: Multi-Agent Collaboration (Researcher + Writer)

Two specialized agents collaborate: one researches, the other writes a report.

from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from langchain_community.tools import DuckDuckGoSearchRun
from typing import TypedDict

class ResearchState(TypedDict):
    topic: str
    research_notes: str
    article: str

search_tool = DuckDuckGoSearchRun()
llm = ChatOpenAI(model="gpt-4o", temperature=0.7)

# Agent 1: Researcher
def research(state: ResearchState) -> ResearchState:
    topic = state["topic"]
    # Search for information
    search_results = search_tool.run(f"{topic} latest developments 2026")
    # Analyze and summarize
    prompt = f"""Analyze these search results and extract key facts about {topic}.
Focus on: definitions, recent developments, expert opinions, statistics.

Search Results: {search_results}

Output format:
- Key fact 1
- Key fact 2
..."""
    notes = llm.invoke(prompt).content
    return {**state, "research_notes": notes}

# Agent 2: Writer
def write_article(state: ResearchState) -> ResearchState:
    prompt = f"""Write a 300-word article about {state['topic']}.
Use these research notes as your source:

{state['research_notes']}

Write in a professional, engaging style. Include an introduction, key points, and conclusion."""
    article = llm.invoke(prompt).content
    return {**state, "article": article}

# Build graph
workflow = StateGraph(ResearchState)
workflow.add_node("researcher", research)
workflow.add_node("writer", write_article)
workflow.set_entry_point("researcher")
workflow.add_edge("researcher", "writer")
workflow.add_edge("writer", END)
app = workflow.compile()

# Run
result = app.invoke({"topic": "Impact of AI Act regulations on European startups"})
print("Research Notes:")
print(result["research_notes"])
print("\n" + "="*50 + "\n")
print("Article:")
print(result["article"])

Production Best Practices

1. Cost Optimization

  • Use cheaper models for simple tasks: gpt-4o-mini ($0.15/1M tokens) instead of gpt-4o ($5/1M tokens) for RAG generation
  • Cache embeddings: Store embeddings for 30 days to avoid re-computing for same queries
  • Batch API calls: Use OpenAI batch API for non-urgent tasks (50% discount)
  • Limit context window: Use ConversationBufferWindowMemory instead of full history
  • Consider local LLMs: Llama 3.3 70B via Ollama costs $0 per query (requires GPU)
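The "cache embeddings" tip above is worth making concrete. LangChain ships CacheBackedEmbeddings for this, but the underlying idea is just a look-aside cache keyed on the text. Here is a stdlib-only sketch with a stand-in embed function (`fake_embed` is hypothetical, representing a paid API call):

```python
import hashlib

# Look-aside embedding cache sketch (the idea behind CacheBackedEmbeddings).
# `fake_embed` stands in for a paid embeddings API call; the cache ensures
# each distinct text is only "paid for" once.
calls = {"count": 0}

def fake_embed(text: str) -> list[float]:
    calls["count"] += 1  # pretend this costs money
    return [float(b) for b in hashlib.sha256(text.encode()).digest()[:4]]

cache: dict[str, list[float]] = {}

def embed_cached(text: str) -> list[float]:
    if text not in cache:
        cache[text] = fake_embed(text)
    return cache[text]

v1 = embed_cached("return policy")
v2 = embed_cached("return policy")  # cache hit, no second "API call"
print(calls["count"])  # 1
print(v1 == v2)  # True
```

In production you would back the cache with a file store or Redis (with a TTL) rather than an in-process dict, so it survives restarts and is shared across workers.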

2. Monitoring with LangSmith

LangSmith is LangChain's observability platform. Enable it with 2 environment variables:

import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-langsmith-api-key"

# Now all chains/agents auto-log to LangSmith
# View traces at: https://smith.langchain.com/

What you get:

  • Full trace of every chain execution (inputs, outputs, latencies)
  • Token usage and cost tracking per call
  • Error analysis and failure patterns
  • A/B testing of prompts and models
  • Dataset creation from production logs

3. Error Handling and Retries

from langchain_openai import ChatOpenAI
from langchain_core.runnables import RunnableRetry

llm = ChatOpenAI(model="gpt-4o-mini")

# Add automatic retries (up to 3 attempts)
llm_with_retry = RunnableRetry(
    bound=llm,
    max_attempts=3,
    wait_exponential_jitter=True  # Exponential backoff
)

# Use in chain
chain = prompt | llm_with_retry | output_parser

# For agents: add fallback LLM
from langchain_anthropic import ChatAnthropic

llm_primary = ChatOpenAI(model="gpt-4o")
llm_fallback = ChatAnthropic(model="claude-3-5-sonnet-20241022")
llm = llm_primary.with_fallbacks([llm_fallback])
# If OpenAI fails, automatically tries Claude

4. Streaming for Better UX

Always stream responses in user-facing applications:

# FastAPI endpoint example
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

app = FastAPI()

@app.post("/chat")
async def chat(question: str):
    llm = ChatOpenAI(model="gpt-4o-mini", streaming=True)
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a helpful assistant."),
        ("human", "{question}")
    ])
    chain = prompt | llm

    async def generate():
        async for chunk in chain.astream({"question": question}):
            yield chunk.content

    return StreamingResponse(generate(), media_type="text/plain")

# Client receives tokens as they're generated (feels instant)

5. Security Best Practices

  • Never put API keys in code: Use environment variables or secret managers
  • Validate user inputs: Sanitize queries before passing to LLMs (prevent prompt injection)
  • Rate limit agents: Set max_iterations to prevent infinite loops
  • Restrict tool access: Only give agents access to necessary tools (principle of least privilege)
  • Use constitutional AI: Add a moderation layer to filter harmful outputs
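Input validation deserves a concrete shape. Prompt-injection filtering is heuristic at best, but a cheap pre-check catches the obvious cases before the text ever reaches the LLM. A stdlib-only sketch follows; the patterns and limits are illustrative assumptions, not a complete defense, and should be combined with output moderation and least-privilege tools.

```python
import re

# Minimal input pre-check sketch: length cap, control-character stripping,
# and a few illustrative injection heuristics. Not a complete defense.
MAX_LEN = 2000
SUSPICIOUS = [
    r"ignore (all )?(previous|above) instructions",
    r"you are now",
    r"system prompt",
]

def sanitize(user_input: str) -> str:
    # Strip control characters and cap length before any pattern checks
    cleaned = re.sub(r"[\x00-\x08\x0b-\x1f\x7f]", "", user_input)[:MAX_LEN]
    for pattern in SUSPICIOUS:
        if re.search(pattern, cleaned, re.IGNORECASE):
            raise ValueError("possible prompt injection")
    return cleaned

print(sanitize("What is your return policy?"))
```

A rejected input can be routed to a canned response or a human-review queue rather than surfaced as a raw error.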

Common Pitfalls and How to Avoid Them

| Pitfall | Consequence | Solution |
|---|---|---|
| Not limiting agent iterations | Infinite loops, high costs | Set max_iterations=10 in AgentExecutor |
| Using full conversation history | Context window exceeded, high token costs | Use ConversationSummaryMemory or ConversationBufferWindowMemory(k=10) |
| Not caching embeddings | Repeated embedding costs for same documents | Use CacheBackedEmbeddings with LocalFileStore or Redis |
| Synchronous API calls | Slow response times | Use chain.ainvoke() or chain.astream() (async versions) |
| Not handling API errors | Application crashes on rate limits | Use RunnableRetry with exponential backoff |
| Using gpt-4 for everything | 10x higher costs vs gpt-4o-mini | Reserve gpt-4o for complex reasoning; use mini for RAG/classification |
| No observability | Can't debug production issues | Enable LangSmith tracing from day 1 |
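The async pitfall matters because independent LLM calls can overlap instead of queueing. The pattern is the same one you would use with chain.ainvoke: fan out awaitables with asyncio.gather. A stdlib sketch with a stand-in async call (`fake_ainvoke` is hypothetical, simulating network latency):

```python
import asyncio

# Concurrency sketch: fan out independent calls with asyncio.gather, the
# same shape as awaiting several chain.ainvoke(...) calls concurrently.
async def fake_ainvoke(question: str) -> str:
    await asyncio.sleep(0.1)  # simulated API latency
    return f"answer to: {question}"

async def main():
    questions = ["q1", "q2", "q3"]
    # Run sequentially this would take ~0.3s; gathered, ~0.1s total
    return await asyncio.gather(*(fake_ainvoke(q) for q in questions))

print(asyncio.run(main()))
```

Mind provider rate limits when fanning out: bound concurrency with an asyncio.Semaphore if you gather many calls at once.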

Real-World Architecture Example

Here's a production-ready architecture for a customer support chatbot with RAG, using LangChain + LangGraph:

# File: production_chatbot.py
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.postgres import PostgresSaver  # Persistent state
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Pinecone
from langchain.chains import RetrievalQA
from typing import TypedDict
import pinecone

# State definition
class ChatState(TypedDict):
    user_id: str
    message: str
    intent: str  # "faq", "billing", "technical", "human_handoff"
    response: str
    conversation_history: str

# Initialize components
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Vector store (docs indexed beforehand)
pinecone.init(api_key="your-key", environment="us-west1-gcp")
vectorstore = Pinecone.from_existing_index("support-docs", embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# RAG chain
rag_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True
)

# Node 1: Classify intent
def classify_intent(state: ChatState) -> ChatState:
    prompt = f"""Classify the user's intent into one of these categories:
- faq: General questions about the product
- billing: Payment, invoices, subscriptions
- technical: Bug reports, technical issues
- human_handoff: Urgent or complex issues requiring human agent

User message: {state['message']}
Conversation history: {state.get('conversation_history', 'None')}

Respond with just the category name."""
    intent = llm.invoke(prompt).content.strip().lower()
    return {**state, "intent": intent}

# Node 2: Handle FAQ (RAG)
def handle_faq(state: ChatState) -> ChatState:
    result = rag_chain.invoke({"query": state["message"]})
    response = result["result"]
    # Add sources
    sources = [doc.metadata.get("source", "Unknown") for doc in result["source_documents"]]
    response += f"\n\nSources: {', '.join(sources)}"
    return {**state, "response": response}

# Node 3: Handle billing (hypothetical)
def handle_billing(state: ChatState) -> ChatState:
    # In reality: query billing API, return account info
    response = "I've pulled up your billing information. Your next invoice is due on April 15, 2026."
    return {**state, "response": response}

# Node 4: Human handoff
def human_handoff(state: ChatState) -> ChatState:
    # In reality: create support ticket, notify team
    response = "I'm connecting you with a human agent. Average wait time: 2 minutes."
    return {**state, "response": response}

# Routing function
def route_intent(state: ChatState) -> str:
    intent_map = {
        "faq": "faq",
        "billing": "billing",
        "technical": "human",
        "human_handoff": "human"
    }
    return intent_map.get(state["intent"], "faq")

# Build graph
workflow = StateGraph(ChatState)
workflow.add_node("classify", classify_intent)
workflow.add_node("faq", handle_faq)
workflow.add_node("billing", handle_billing)
workflow.add_node("human", human_handoff)
workflow.set_entry_point("classify")
workflow.add_conditional_edges(
    "classify",
    route_intent,
    {"faq": "faq", "billing": "billing", "human": "human"}
)
workflow.add_edge("faq", END)
workflow.add_edge("billing", END)
workflow.add_edge("human", END)

# Compile with PostgreSQL checkpointing for persistence
checkpointer = PostgresSaver.from_conn_string("postgresql://user:pass@host:5432/db")
app = workflow.compile(checkpointer=checkpointer)

# API endpoint (FastAPI)
from fastapi import FastAPI

api = FastAPI()

@api.post("/chat")
async def chat(user_id: str, message: str):
    config = {"configurable": {"thread_id": f"user-{user_id}"}}
    result = await app.ainvoke({
        "user_id": user_id,
        "message": message
    }, config)
    return {"response": result["response"], "intent": result["intent"]}

# Deployment: Docker + AWS ECS Fargate
# Horizontal scaling: multiple containers, state in Postgres
# Monitoring: LangSmith + Datadog

Learning Resources

To master LangChain and LangGraph, we recommend:

  • Official LangChain documentation: python.langchain.com/docs (comprehensive API reference)
  • LangGraph documentation: langchain-ai.github.io/langgraph (tutorials and examples)
  • LangChain Academy: Free course by Harrison Chase (creator) covering fundamentals to advanced agents
  • LangSmith Cookbook: Production patterns and best practices

For hands-on professional training, Talki Academy offers:

  • RAG and Agents in Production (3-day intensive): Build production RAG systems with LangChain, LlamaIndex, and LangGraph. Includes real-world projects and deployment to AWS.
  • Claude API for Developers (2 days): Master Claude 4.5 with LangChain integration. Advanced prompt engineering, function calling, and cost optimization.

Frequently Asked Questions

When should I use LangGraph instead of basic LangChain?

Use LangGraph when you need: (1) Complex multi-step workflows with conditional branching, (2) Agents that need to revise their work based on feedback, (3) Persistent state across conversation turns, (4) Human-in-the-loop approvals, or (5) Cyclic workflows (agent tries, evaluates, retries). For simple linear chains (retrieve → generate), basic LangChain is sufficient.

Can I use LangChain with open-source LLMs like Llama or Mistral?

Yes, absolutely. LangChain has native integrations with Ollama (local inference), vLLM (GPU-accelerated serving), HuggingFace Transformers, and OpenAI-compatible API servers. Example: ChatOllama(model='llama3.3:70b') works identically to ChatOpenAI. Same for embeddings: OllamaEmbeddings(model='nomic-embed-text') replaces OpenAIEmbeddings. This lets you run everything locally with zero API costs.

What's the real production cost of a LangChain RAG application?

For a typical customer support RAG with 50k queries/month: ~$400-600/month (OpenAI GPT-4o mini + embeddings + Pinecone vector DB + compute). You can reduce this 60% by using local Llama 3.3 70B ($0 LLM calls, +$200/month GPU) and self-hosted ChromaDB ($0 vector DB). Total optimized cost: ~$250/month. Key savings: cache embeddings (30-day TTL), use gpt-4o-mini not gpt-4, batch queries where possible.
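The monthly figure above comes from a handful of line items, so it is worth showing the arithmetic. Every unit price and per-query token count below is an assumption for illustration (roughly the gpt-4o-mini and embeddings rates quoted elsewhere in this article, plus assumed vector-DB and hosting fees); check current provider pricing before relying on the numbers.

```python
# Back-of-envelope cost model for the 50k-queries/month RAG scenario.
# All unit prices and token counts are illustrative assumptions.
QUERIES = 50_000
IN_TOKENS, OUT_TOKENS = 2_500, 300        # per query: RAG context in, answer out
PRICE_IN, PRICE_OUT = 0.15, 0.60          # $ per 1M tokens (gpt-4o-mini)
EMBED_TOKENS, PRICE_EMBED = 20, 0.02      # query embedding, $ per 1M tokens
VECTOR_DB, COMPUTE = 150.0, 300.0         # assumed monthly vector DB + hosting

llm_cost = QUERIES * (IN_TOKENS * PRICE_IN + OUT_TOKENS * PRICE_OUT) / 1e6
embed_cost = QUERIES * EMBED_TOKENS * PRICE_EMBED / 1e6
total = llm_cost + embed_cost + VECTOR_DB + COMPUTE

print(f"LLM: ${llm_cost:.2f}, embeddings: ${embed_cost:.2f}, total: ${total:.2f}")
```

Note how the fixed costs (vector DB, compute) dominate at this volume, which is why self-hosting ChromaDB and a local LLM cuts the bill so sharply even though the per-token LLM spend is modest.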

How do I add memory to a LangChain chatbot?

LangChain offers 4 memory strategies: (1) ConversationBufferMemory (stores all messages, simple but memory grows), (2) ConversationBufferWindowMemory (last N messages, good for long chats), (3) ConversationSummaryMemory (LLM summarizes old messages, best for production), (4) ConversationKGMemory (knowledge graph of facts). For most apps: start with BufferWindowMemory(k=10), upgrade to SummaryMemory when context limits hit. With LangGraph, use checkpoints for persistent state across sessions.
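The summarization strategy can be sketched without an LLM: once the history exceeds a budget, fold the oldest turns into a running summary and keep only the most recent turns verbatim. In this stdlib toy, `summarize` is a stand-in for the LLM call ConversationSummaryMemory would make; the structure, not the summarizer, is the point.

```python
# Toy sketch of summary-style memory: when history exceeds max_turns, fold
# the oldest turns into a running summary. `summarize` stands in for an LLM.
def summarize(old_summary: str, turns: list[str]) -> str:
    note = f"[+{len(turns)} older turns summarized]"
    return f"{old_summary} {note}".strip()

class SummaryMemory:
    def __init__(self, max_turns: int = 4):
        self.max_turns = max_turns
        self.summary = ""
        self.turns: list[str] = []

    def add(self, turn: str):
        self.turns.append(turn)
        if len(self.turns) > self.max_turns:
            overflow = self.turns[:-2]  # keep the 2 most recent turns verbatim
            self.summary = summarize(self.summary, overflow)
            self.turns = self.turns[-2:]

    def context(self) -> str:
        return (self.summary + "\n" if self.summary else "") + "\n".join(self.turns)

mem = SummaryMemory(max_turns=4)
for i in range(6):
    mem.add(f"turn {i}")
print(mem.context())
```

The prompt the LLM finally sees is `context()`: a compact summary of the distant past plus the recent turns word for word, which is exactly the token-saving trade summary memory makes.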

What's the difference between LangChain Expression Language (LCEL) and the old Chain API?

LCEL (introduced 2023) is the modern way to build chains using the pipe operator (|). Advantages: (1) streaming by default, (2) async/sync auto-handled, (3) better debugging, (4) LangSmith integration. Old Chain API (RetrievalQA, etc.) still works but is maintenance mode. Example: retriever | prompt | llm | output_parser (LCEL) vs RetrievalQA.from_chain_type() (old). Recommendation: use LCEL for new projects, migrate old chains progressively.

Master LangChain and AI Agents

Professional training programs for developers and technical teams.

View Training Programs | Contact Us