Talki Academy
Technical · 15 min read

Open Source AI Tools for Business: Ollama, LangChain, and n8n in 2026

Practical guide to the best open source AI tools for businesses in 2026. Ollama for local inference, LangChain for orchestration, n8n for automation. Working Python code examples included.

By Talki Academy · Updated April 6, 2026

Why open source is reshaping business AI

In 2022, integrating AI into a business almost always meant going through OpenAI or Google. By 2026, that picture has fundamentally changed. Mature open source tools — Ollama, LangChain, n8n — now let companies build AI systems comparable to proprietary solutions, at a fraction of the cost and with full control over their data.

Three structural advantages explain why open source AI adoption is accelerating:

  • Cost — No per-request billing. An Ollama + LangChain setup on a EUR 200/month server replaces API bills that could reach EUR 2,000–10,000/month for heavy usage.
  • Data privacy — Sensitive data never leaves your infrastructure. This is critical for legal, medical, and financial sectors, and for any business subject to GDPR with customer data constraints.
  • Flexibility — You choose the model, fine-tune it on your data, and deploy wherever you want. No vendor lock-in, no pricing surprises, no rate limits imposed by a third party.
Key insight: "Open source" no longer means "less capable." In 2026, Llama 3.3 70B and Mistral Large 2 match GPT-4o on most business tasks. The performance gap has narrowed dramatically — what remains is a cost and privacy gap, in open source's favor.

Ollama: run LLMs locally in minutes

Ollama is a tool that lets you download and run large language models directly on your machine or servers, without complex configuration. One command to install, one command to start a model. It exposes an OpenAI-compatible REST API, so existing integrations work out of the box.

Installation and first model

# Install (macOS / Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Download and run Llama 3.2 (3B — fast, lightweight)
ollama run llama3.2

# For more complex tasks: Llama 3.3 70B (requires 40GB+ VRAM)
ollama pull llama3.3:70b

# List locally available models
ollama list

Using Ollama from Python

You can use the official Python SDK, or call the REST API directly with any HTTP client:

# pip install ollama
import ollama

# Simple request
response = ollama.chat(
    model='llama3.2',
    messages=[{
        'role': 'user',
        'content': (
            'Analyze this customer feedback and identify key issues: '
            '"The product quality is excellent but delivery took 10 days '
            'and the packaging arrived damaged."'
        )
    }]
)
print(response['message']['content'])
# Key issues: Product quality (positive), Delivery time (negative — 10 days),
# Packaging integrity (negative — arrived damaged)

# Streaming for real-time UI responses
for chunk in ollama.chat(
    model='llama3.2',
    messages=[{'role': 'user', 'content': 'Write a professional follow-up email'}],
    stream=True
):
    print(chunk['message']['content'], end='', flush=True)
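The SDK is a thin wrapper over HTTP, so environments that cannot add a dependency can talk to the same endpoint directly. Below is a minimal sketch using only the standard library; `/api/chat` is Ollama's documented chat endpoint and 11434 its default port, while the helper names are ours:

```python
# A minimal sketch of calling Ollama's REST API with no SDK, using only
# the Python standard library. Assumes an Ollama server on localhost:11434.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"

def build_chat_payload(model: str, user_message: str, stream: bool = False) -> dict:
    """Build the JSON body expected by Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": stream,  # False: one complete JSON response
    }

def chat(model: str, user_message: str) -> str:
    """POST a single chat request and return the assistant's reply."""
    body = json.dumps(build_chat_payload(model, user_message)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["message"]["content"]

# Usage (with an Ollama server running):
# print(chat("llama3.2", "Summarize: great product, slow delivery."))
```

Switching to the `requests` library is a one-line change in `chat`; the payload shape is identical either way.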

Recommended models by use case

Use Case | Recommended Model | VRAM Required | Speed
Classification, extraction | Phi-3 Mini (3.8B) | 4 GB | Very fast
Chatbot, document Q&A | Llama 3.2 (7B) | 8 GB | Fast
Analysis, long-form generation | Mistral Nemo (12B) | 16 GB | Medium
Complex reasoning, code | Llama 3.3 (70B) | 40 GB | Slow
Embeddings (RAG) | nomic-embed-text | 1 GB | Very fast
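For services that route requests by task type, the table above can live in code as a simple lookup. The mapping below is an illustrative default based on the table; the helper itself and the exact Ollama tags (e.g. `phi3:mini`) are our assumptions about your setup:

```python
# Map each task family to a model tier from the table above.
# Values are Ollama model tags; the mapping is an illustrative default.
MODEL_BY_TASK = {
    "classification": "phi3:mini",
    "extraction": "phi3:mini",
    "chat": "llama3.2",
    "qa": "llama3.2",
    "analysis": "mistral-nemo",
    "reasoning": "llama3.3:70b",
    "code": "llama3.3:70b",
    "embedding": "nomic-embed-text",
}

def pick_model(task: str, fallback: str = "llama3.2") -> str:
    """Return the recommended model tag for a task, or a safe default."""
    return MODEL_BY_TASK.get(task.lower(), fallback)

print(pick_model("Classification"))  # phi3:mini
print(pick_model("unknown-task"))    # llama3.2
```

Centralizing the choice like this makes it trivial to swap in a bigger model later without touching every call site.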

LangChain: orchestrate your AI pipelines

LangChain is the most widely adopted Python (and JavaScript) framework for building AI applications. It provides abstractions for connecting LLMs to databases, APIs, and external tools — and for orchestrating multi-step processing pipelines. Think of it as the plumbing that connects Ollama to your data and business logic.

RAG pipeline with Ollama — complete working example

The most common enterprise use case: a chatbot that answers questions about your internal documents (contracts, HR policies, product documentation, knowledge base).

# pip install langchain langchain-community langchain-ollama chromadb pypdf
from langchain_ollama import OllamaLLM, OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA

# 1. Load your documents (PDF, DOCX, TXT...)
loader = PyPDFLoader("employee-handbook-2026.pdf")
documents = loader.load()

# 2. Split into chunks (improves retrieval precision)
splitter = RecursiveCharacterTextSplitter(
    chunk_size=800,
    chunk_overlap=100,
    separators=["\n\n", "\n", ".", " "]
)
chunks = splitter.split_documents(documents)
print(f"Split into {len(chunks)} chunks")

# 3. Generate embeddings and store in ChromaDB (persistent)
embeddings = OllamaEmbeddings(model="nomic-embed-text")
vectorstore = Chroma.from_documents(
    chunks,
    embeddings,
    persist_directory="./chroma_db"  # Survives restarts
)

# 4. Build the RAG chain
llm = OllamaLLM(model="llama3.2", temperature=0.1)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
    return_source_documents=True
)

# 5. Query
result = qa_chain.invoke({"query": "What is the remote work policy?"})
print(result["result"])
# → "According to the 2026 Employee Handbook, employees may work remotely
#    up to 3 days per week after manager approval..."

# Check which sources were used
for doc in result["source_documents"]:
    print(f"Source: page {doc.metadata['page']}")
Pro tip: Use temperature=0.1 for factual document Q&A. A low temperature anchors the model to retrieved content and reduces hallucinations by roughly 60-70% compared to temperature=0.7.
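The chunk_size/chunk_overlap pair used above is, at its core, a sliding window over the text. RecursiveCharacterTextSplitter is smarter (it prefers cutting at paragraph and sentence boundaries), but the window arithmetic can be illustrated in a few lines of plain Python. This is a simplified sketch, not the LangChain implementation:

```python
def sliding_chunks(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Naive character-window chunker: each chunk repeats the last
    `overlap` characters of the previous one, so a sentence cut at a
    chunk boundary still appears whole in at least one chunk."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "A" * 700 + "B" * 700
chunks = sliding_chunks(doc, size=800, overlap=100)
print(len(chunks))                          # 2 chunks for 1,400 characters
print(chunks[0][-100:] == chunks[1][:100])  # True: the 100-char overlap
```

The overlap is why retrieval rarely misses a fact that happens to sit on a chunk boundary, at the cost of storing a little redundant text.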

Tool-using agents with LangChain

Beyond RAG, LangChain lets you build agents that use tools — custom functions, APIs, calculators — to complete multi-step tasks autonomously:

from langchain_ollama import OllamaLLM
from langchain.agents import create_react_agent, AgentExecutor
from langchain.tools import tool
from langchain import hub

# Define custom business tools
@tool
def check_inventory(product_id: str) -> str:
    """Check the current inventory level for a product ID."""
    # Replace with your actual database call
    inventory = {"SKU-001": 45, "SKU-002": 0, "SKU-003": 120}
    qty = inventory.get(product_id, -1)
    if qty == -1:
        return f"Product {product_id} not found"
    return f"SKU {product_id} has {qty} units in stock"

@tool
def estimate_delivery(origin: str, destination: str) -> str:
    """Estimate delivery days from origin warehouse to destination city."""
    transit_days = {"New York": 2, "Los Angeles": 5, "Chicago": 3, "default": 7}
    days = transit_days.get(destination, transit_days["default"])
    return f"Estimated delivery from {origin} to {destination}: {days} business days"

# Wire up the agent
llm = OllamaLLM(model="llama3.2")
tools = [check_inventory, estimate_delivery]
prompt = hub.pull("hwchase17/react")
agent = create_react_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

# The agent decides which tools to call and in what order
result = executor.invoke({
    "input": "Is SKU-001 in stock? If yes, how long would delivery to Chicago take?"
})
print(result["output"])
# → "SKU-001 is in stock with 45 units. Delivery to Chicago from our
#    main warehouse would take 3 business days."

n8n: automate AI workflows without code

n8n is an open source workflow automation platform (alternative to Zapier or Make) that stands out for its native AI node support — direct integration with Ollama, LangChain, or any LLM API in visual workflows. It is the glue that connects your AI components to your existing business tools (CRM, ERP, email, Slack, databases).

Two-minute setup

# Option 1: Docker (recommended for production)
docker run -d --name n8n \
  -p 5678:5678 \
  -v ~/.n8n:/home/node/.n8n \
  -e N8N_BASIC_AUTH_ACTIVE=true \
  -e N8N_BASIC_AUTH_USER=admin \
  -e N8N_BASIC_AUTH_PASSWORD=your_secure_password \
  n8nio/n8n

# Access the UI at: http://localhost:5678

# Option 2: npm (local development)
npm install -g n8n
n8n start

Example workflow: automated email triage with AI

A common business workflow that processes incoming emails with Ollama, categorizes them, and routes them to the right team:

  • Trigger — Incoming email (Gmail, Outlook, or IMAP)
  • HTTP Request node — Calls local Ollama API for content analysis
  • Switch node — Routes by category (URGENT / SALES / SUPPORT / SPAM)
  • CRM node — Creates a task in HubSpot / Salesforce / Pipedrive
  • Slack node — Notifies the relevant team channel
# HTTP Request node configuration in n8n
# URL: http://ollama:11434/api/generate  (if Ollama runs in Docker on the same network)
# Method: POST
# Body (JSON):
{
  "model": "llama3.2",
  "prompt": "Categorize this email as exactly one of: URGENT, SALES, SUPPORT, SPAM. Email subject: {{ $json.subject }}. Email body: {{ $json.body }}. Reply with the category only.",
  "stream": false
}

# Access the response in the next node:
# {{ $json.response }}

# Tip: For JSON-structured output, add "format": "json" to the body
# and ask the model to return: {"category": "URGENT", "summary": "..."}
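Even with "Reply with the category only" in the prompt, small models occasionally return "Category: URGENT" or add trailing punctuation. It is worth normalizing the value before the Switch node routes on it; in n8n this would go in a Code node (JavaScript), and the helper below is a hypothetical sketch of the same defensive parsing in Python:

```python
# Defensive parsing of a model-generated category label (illustrative helper,
# not part of n8n or Ollama). Tolerates extra words, punctuation, and casing.
import re

VALID_CATEGORIES = {"URGENT", "SALES", "SUPPORT", "SPAM"}

def parse_category(raw: str, default: str = "SUPPORT") -> str:
    """Extract one allowed category from a model reply, else a safe default."""
    for token in re.findall(r"[A-Za-z]+", raw.upper()):
        if token in VALID_CATEGORIES:
            return token
    return default  # Fail safe: route unknowns to the support queue

print(parse_category("URGENT"))                # URGENT
print(parse_category("Category: sales."))      # SALES
print(parse_category("I think this is spam"))  # SPAM
print(parse_category("no idea"))               # SUPPORT
```

Defaulting to SUPPORT rather than raising an error means a confused model never blocks the pipeline; a human triager catches the stragglers.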

n8n can also trigger Python scripts, letting you invoke full LangChain pipelines from visual workflows:

# "Execute Command" node in n8n
# Run your Python script with workflow data as arguments
python3 /opt/scripts/process_document.py \
  --file "{{ $json.attachment_path }}" \
  --doc-id "{{ $json.message_id }}" \
  --sender "{{ $json.from }}"

# The script prints JSON to stdout — n8n captures it for the next nodes
# Expected output: {"type": "CONTRACT", "amount": 4250, "status": "processed"}

Building a customer support chatbot: Ollama + LangChain + n8n

Here is a complete, working chatbot that combines all three tools: Ollama for local inference, LangChain for conversation memory and RAG over your product documentation, and n8n to wire it into your existing support channels (Slack, Intercom, email).

Step 1 — LangChain chatbot with memory (Python backend)

# pip install langchain langchain-ollama chromadb fastapi uvicorn pypdf
from langchain_ollama import OllamaLLM, OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.memory import ConversationBufferWindowMemory
from langchain.chains import ConversationalRetrievalChain
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# --- Build the RAG knowledge base once at startup ---
loader = PyPDFLoader("product-docs.pdf")
docs = loader.load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=600, chunk_overlap=80
).split_documents(docs)
embeddings = OllamaEmbeddings(model="nomic-embed-text")
vectorstore = Chroma.from_documents(chunks, embeddings, persist_directory="./support_kb")
llm = OllamaLLM(model="llama3.2", temperature=0.2)

# Per-session memories stored in-process (use Redis for production)
sessions: dict[str, ConversationBufferWindowMemory] = {}

def get_chain(session_id: str) -> ConversationalRetrievalChain:
    if session_id not in sessions:
        sessions[session_id] = ConversationBufferWindowMemory(
            memory_key="chat_history",
            return_messages=True,
            k=6  # Keep last 6 turns (3 user + 3 assistant)
        )
    return ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
        memory=sessions[session_id]
    )

class ChatRequest(BaseModel):
    session_id: str
    message: str

@app.post("/chat")
def chat(req: ChatRequest):
    chain = get_chain(req.session_id)
    result = chain.invoke({"question": req.message})
    return {"answer": result["answer"], "session_id": req.session_id}

# Start: uvicorn chatbot:app --host 0.0.0.0 --port 8000

Step 2 — n8n webhook connecting your chatbot to Slack

This n8n workflow receives a Slack slash command, forwards the message to the FastAPI chatbot, and posts the answer back in the same thread — all without any code in n8n itself:

# n8n workflow: Slack → Chatbot → Slack (4 nodes, zero code)

# Node 1: Webhook (trigger)
#   Method: POST
#   URL: https://your-n8n.com/webhook/slack-support
#   Note: Register this URL as your Slack slash command URL

# Node 2: HTTP Request (call the LangChain chatbot)
#   Method: POST
#   URL: http://chatbot-service:8000/chat
#   Body (JSON):
#   { "session_id": "{{ $json.user_id }}", "message": "{{ $json.text }}" }

# Node 3: Respond to Webhook (acknowledge Slack within 3s)
#   Response: { "response_type": "ephemeral", "text": "Thinking..." }

# Node 4: HTTP Request (post final answer to Slack)
#   Method: POST
#   URL: {{ $json.response_url }}
#   Body:
#   { "response_type": "in_channel", "text": "{{ $('HTTP Request').item.json.answer }}" }

# Expected behavior:
#   User types: /support What is the refund policy?
#   Slack receives: "Our refund policy allows returns within 30 days of purchase..."
#   Response time: ~2-4 seconds (Llama 3.2 7B on GPU)
Scaling tip: The session memory above is in-process (lost on restart). For production, replace it with RedisChatMessageHistory from langchain_community.chat_message_histories — add a Redis container to your Docker Compose and pass the connection URL. This lets multiple chatbot instances share conversation state.

Combined architecture: a real-world example

Here is a real architecture deployed by a 50-person professional services firm to automate the processing of client documents (invoices, contracts, proposals):

# Python script orchestrating Ollama + ChromaDB
# Triggered by n8n on each new document upload
import ollama
import chromadb
from pathlib import Path
from datetime import datetime
import json

# Persistent ChromaDB client
chroma_client = chromadb.PersistentClient(path="/data/company_docs")
collection = chroma_client.get_or_create_collection(
    name="documents",
    metadata={"hnsw:space": "cosine"}
)

def extract_and_store_document(file_path: str, doc_id: str) -> dict:
    """
    Extract structured data from a document and store it for future retrieval.
    For PDF files, pre-process with PyPDF2 or pdfplumber before calling this.
    """
    text = Path(file_path).read_text(encoding="utf-8")

    # Structured extraction using Ollama
    extraction_prompt = f"""Extract from this document:
1. Document type (INVOICE/CONTRACT/PROPOSAL/OTHER)
2. Total amount in EUR (0 if not applicable)
3. Document date (YYYY-MM-DD format)
4. Client or vendor name
5. One-sentence summary
Return valid JSON only, no explanation.

Document: {text[:3000]}"""

    response = ollama.chat(
        model='llama3.2',
        messages=[{'role': 'user', 'content': extraction_prompt}],
        format='json'  # Forces structured JSON output
    )
    metadata = json.loads(response['message']['content'])
    metadata['processed_at'] = datetime.now().isoformat()
    metadata['file_path'] = file_path

    # Generate embeddings via Ollama (~100ms for typical documents)
    embedding_response = ollama.embeddings(
        model='nomic-embed-text',
        prompt=text[:2000]
    )

    # Store in ChromaDB for semantic search
    collection.add(
        documents=[text[:5000]],
        embeddings=[embedding_response['embedding']],
        metadatas=[metadata],
        ids=[doc_id]
    )
    return metadata

if __name__ == "__main__":
    result = extract_and_store_document(
        file_path="/data/uploads/invoice_2026_04_001.txt",
        doc_id="INV-2026-04-001"
    )
    # Print JSON for n8n to capture
    print(json.dumps(result, indent=2))
    # {
    #   "type": "INVOICE",
    #   "amount": 4250.00,
    #   "date": "2026-04-03",
    #   "client": "Acme Corporation",
    #   "summary": "Invoice for Q1 digital transformation consulting services",
    #   "processed_at": "2026-04-06T09:14:32.451Z"
    # }
Production architecture tip: Add a queue (Redis Queue or AWS SQS) between n8n and your Python script to handle documents in parallel without overloading Ollama. n8n handles ingestion and routing; the queue smooths out traffic spikes. This allows you to process 10x more documents without increasing server specs.
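A full Redis Queue or SQS deployment is beyond this article, but the smoothing idea itself fits in a few lines: producers enqueue documents as fast as they arrive, while a fixed pool of workers drains the queue at whatever rate the Ollama server can sustain. Here is an in-process sketch using only the standard library; a real deployment would use an external broker so the queue survives restarts, and `process` stands in for the Ollama extraction call:

```python
# Bounded-concurrency worker pool: a burst of jobs arrives at once,
# but only NUM_WORKERS documents are processed concurrently.
import queue
import threading

def process(doc_id: str) -> str:
    # Placeholder for the Ollama extraction call
    return f"{doc_id}:done"

def worker(jobs: queue.Queue, results: list) -> None:
    while True:
        doc_id = jobs.get()
        if doc_id is None:  # Poison pill: stop this worker
            jobs.task_done()
            return
        results.append(process(doc_id))  # list.append is thread-safe
        jobs.task_done()

jobs: queue.Queue = queue.Queue()
results: list[str] = []
NUM_WORKERS = 2  # Cap concurrency to what the GPU can serve

threads = [threading.Thread(target=worker, args=(jobs, results))
           for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()

for i in range(10):          # Burst: 10 uploads arrive at once
    jobs.put(f"DOC-{i}")
for _ in threads:
    jobs.put(None)           # One poison pill per worker
for t in threads:
    t.join()

print(len(results))          # 10: every document processed, at most 2 at a time
```

The key design choice is that the worker count, not the arrival rate, sets the load on Ollama; the queue absorbs the difference.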

Cost comparison: open source vs. proprietary APIs

Let's use a concrete scenario: a team processing 10,000 documents per month, each requiring approximately 2,000 tokens of input and 500 tokens of output.

Solution | Monthly cost | Annual cost | Data leaves your infra
OpenAI GPT-4o | EUR 275 | EUR 3,300 | Yes
Anthropic Claude Sonnet | EUR 230 | EUR 2,760 | Yes
Ollama (EC2 g4dn.xlarge) | EUR 135 | EUR 1,620 | No
Ollama (dedicated server) | EUR 40-80 | EUR 480-960 | No
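The per-request arithmetic behind this table is easy to reproduce for your own volumes. The token prices in the example below are illustrative placeholders, not current list prices; substitute your provider's actual EUR rates per million tokens:

```python
def monthly_api_cost(docs_per_month: int, input_tokens: int, output_tokens: int,
                     price_in_per_m: float, price_out_per_m: float) -> float:
    """API cost in EUR for one month of document processing.
    Prices are expressed in EUR per million tokens."""
    total_in = docs_per_month * input_tokens
    total_out = docs_per_month * output_tokens
    return (total_in / 1e6) * price_in_per_m + (total_out / 1e6) * price_out_per_m

# Scenario from the article: 10,000 docs/month, 2,000 tokens in + 500 out each.
# The rates below are placeholders for illustration only.
cost = monthly_api_cost(10_000, 2_000, 500, price_in_per_m=10.0, price_out_per_m=30.0)
print(f"EUR {cost:.0f}/month")  # EUR 350/month at these placeholder rates
```

Comparing that figure to a fixed server cost gives the break-even point directly: once the API bill exceeds the monthly server price, self-hosting wins on cost alone.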

The cost savings are real, but the more significant advantage for many businesses is data sovereignty. Law firms, clinics, banks, and any company processing personal data under GDPR face legal constraints that make sending data to third-party APIs a compliance risk — not just a cost consideration.

The right choice depends on your context. Proprietary APIs win on ease of setup and maximum model capability. Open source wins on cost, privacy, and customizability for sustained, high-volume usage.

Getting started

Mastering these three tools together requires roughly 20-30 hours of hands-on practice for someone with a technical background. The steepest learning curve is LangChain, particularly the LCEL (LangChain Expression Language) abstractions and the shift to thinking in composable chains.

Our AI Agents training covers LangChain and LangGraph in depth, with practical exercises running on Ollama. For teams that want to adopt n8n without touching code, the No-Code AI Automation training is the right entry point.

For a deeper dive into LangChain and LangGraph, our practical guide to LangChain and LangGraph covers LCEL patterns, advanced RAG, and multi-step agent construction with complete working code examples.

Frequently asked questions

Is Ollama production-ready for business use?

Yes, with proper infrastructure sizing. Ollama runs reliably in production on GPU-equipped servers (NVIDIA A10 or better for 7B-13B models). Most businesses deploy it on AWS EC2 (g4dn or g5 instances), GCP, or Azure. For low to medium volumes (under 1,000 requests/day), a dedicated server is sufficient. Beyond that, combine Ollama with a load balancer and multiple instances for horizontal scaling.

Is LangChain still relevant in 2026 compared to LlamaIndex and Haystack?

LangChain remains the most widely adopted framework (50M+ downloads/month) with the richest integration ecosystem. LlamaIndex excels at pure RAG pipelines. Haystack is preferred for enterprise semantic search with Elasticsearch backends. For most business use cases — chatbots, RAG, multi-step agents — LangChain + LangGraph is the most pragmatic choice in 2026.

Can I run RAG with Ollama without a GPU?

Yes, but performance is reduced. CPU-only setups work with smaller models like Phi-3 Mini (3.8B) or Llama 3.2 3B — expect 3-8 seconds per response. For RAG specifically, the embedding component (nomic-embed-text) is lightweight and runs well on CPU. If GPU hardware is not feasible on-premise, consider GPU cloud spot instances for batch processing workflows.

Is n8n free for enterprise use?

n8n has three tiers: self-hosted Community Edition (completely free, open source), Cloud Starter (EUR 20/month, 2,500 executions), and Enterprise (custom pricing, SLA, SSO, audit logs). For most SMBs, the self-hosted version on a EUR 10-20/month VPS covers all needs. The source code is on GitHub under the Apache 2.0 license with a Sustainable Use exception.

Train your team in AI

Our training courses are eligible for OPCO funding, with a potential out-of-pocket cost of €0.
