Three practical AI workflows you can deploy this week — no API subscriptions, no cloud vendor lock-in, no data leaving your machine. Each workflow uses a different open-source stack to cover the three most common AI requests from non-technical teams: summarizing meetings, automating FAQ responses, and querying internal documents.
These tutorials use Whisper (OpenAI’s open-source speech model) for transcription, LangChain + Ollama for the FAQ bot (Python, copy-paste ready), and ChromaDB for PDF search. They complement — rather than repeat — the n8n JSON and BART-based approaches covered in our earlier guide.
All three workflows assume Ollama is installed locally (brew install ollama on Mac, or the Linux installer). 8 GB RAM recommended; 4 GB minimum.

Workflow 1 — Meeting Transcription with Whisper + n8n
Use case: Your team records stand-up calls, client meetings, or training sessions. You want automatic transcripts and a one-paragraph summary emailed to participants — without uploading audio to Otter.ai or Zoom AI (which send data to US servers).
Stack: Whisper (local HTTP server via whisper.cpp or the Python package) + n8n for orchestration + Ollama for the summary step.
Time to deploy: 25 minutes.
Step 1 Start a local Whisper HTTP server
The fastest approach is the official whisper.cpp server, which exposes a REST endpoint compatible with the OpenAI Audio API format:
# Install whisper.cpp (Mac/Linux)
brew install whisper-cpp # Mac with Homebrew
# OR on Linux:
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp && make
# Download the 'base' model (~142 MB, fast on CPU)
./models/download-ggml-model.sh base
# Start the HTTP server on port 8080
./whisper-server -m models/ggml-base.bin --host 0.0.0.0 --port 8080
# (depending on your whisper.cpp version, the binary may be named ./server or live in build/bin/)
# Test it (upload a .wav or .mp3 file)
curl http://localhost:8080/inference \
-F file="@meeting.mp3" \
-F temperature="0" \
-F response_format="json"

Expected output: a JSON object with a text field containing the full transcript. A 30-minute meeting in clear audio takes about 90 seconds on CPU (base model).
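If you prefer scripting the test, here is the same request from Python — a minimal sketch using the requests library; it assumes the server from Step 1 is running on port 8080, and meeting.mp3 is a placeholder file name:

# test_whisper.py: quick sanity check against the local Whisper server
import requests

with open("meeting.mp3", "rb") as audio:
    resp = requests.post(
        "http://localhost:8080/inference",
        files={"file": audio},
        data={"temperature": "0", "response_format": "json"},
    )
resp.raise_for_status()
print(resp.json()["text"])  # the full transcript as one string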
Step 2 Build the n8n orchestration workflow
In n8n, create a workflow with four nodes. Import the JSON below via “Import from JSON” in the n8n canvas menu:
{
  "name": "Meeting Transcriber + Summarizer",
  "nodes": [
    {
      "name": "Watch Folder",
      "type": "n8n-nodes-base.localFileTrigger",
      "parameters": {
        "path": "/home/user/meeting-uploads",
        "events": ["add"],
        "filter": "*.mp3,*.wav,*.m4a"
      }
    },
    {
      "name": "Transcribe with Whisper",
      "type": "n8n-nodes-base.httpRequest",
      "parameters": {
        "method": "POST",
        "url": "http://localhost:8080/inference",
        "sendBody": true,
        "contentType": "multipart-form-data",
        "bodyParameters": {
          "parameters": [
            { "name": "file", "parameterType": "formBinaryData", "inputDataFieldName": "data" },
            { "name": "response_format", "value": "json" },
            { "name": "temperature", "value": "0" }
          ]
        }
      }
    },
    {
      "name": "Summarize with Ollama",
      "type": "n8n-nodes-base.httpRequest",
      "parameters": {
        "method": "POST",
        "url": "http://localhost:11434/api/generate",
        "sendBody": true,
        "specifyBody": "json",
        "jsonBody": "={\"model\": \"mistral:7b\", \"prompt\": \"Summarize this meeting transcript in 3 bullet points, then list action items with owner names:\\n\\n{{ $json.text }}\", \"stream\": false}"
      }
    },
    {
      "name": "Send Summary Email",
      "type": "n8n-nodes-base.emailSend",
      "parameters": {
        "toEmail": "team@yourcompany.com",
        "subject": "=Meeting Summary — {{ $now.format('yyyy-MM-dd') }}",
        "text": "={{ $json.response }}"
      }
    }
  ]
}

Tip: swap the localFileTrigger for an emailReadImap node to process meeting recordings attached to emails automatically. Set the IMAP node to watch for emails with subject “meeting recording” and attachment type audio/*.

Step 3 Test with a real recording
Drop any .mp3 or .wav file into the watched folder. Within 2 minutes (for a 30-minute recording), you receive an email with a 3-bullet summary and a numbered action-item list. Accuracy on clear audio: 96–98%. For accented speech (West African French, North African Arabic), switch the model to ggml-medium.bin — a ~1.5 GB download for a ~12% accuracy gain.
Workflow 2 — FAQ Bot with LangChain + Ollama (50-line Python script)
Use case: Your support team answers the same 40 questions every week. You want a chatbot that reads your policy documents and answers accurately — without hallucinating unsupported answers.
Stack: Ollama (llama3.1:8b) + LangChain LCEL + ChromaDB (in-memory for small FAQ sets). This approach differs from the n8n JSON configuration by giving you full control over the prompt template and the “I don’t know” fallback behavior.
Step 1 Install dependencies and pull the model
# Create a virtual environment
python3 -m venv faq-env && source faq-env/bin/activate
# Install LangChain + Ollama integration
pip install langchain-ollama langchain-chroma langchain-community langchain-text-splitters
# Pull the model (4.7 GB download, one-time)
ollama pull llama3.1:8b
# Verify Ollama is running
curl http://localhost:11434/api/tags

Step 2 Prepare your FAQ document
Create a plain text file faq.txt with your Q&A pairs. Format does not matter — LangChain will chunk and embed it:
Q: What is your return policy?
A: We accept returns within 30 days of purchase. Items must be unused and in original packaging. Contact support@company.com with your order number to initiate a return.
Q: Do you ship to West Africa?
A: Yes. We ship to Senegal, Côte d'Ivoire, Ghana, Nigeria, and Cameroon. Delivery takes 7–14 business days. Customs fees are the buyer's responsibility.
Q: What payment methods do you accept?
A: We accept Visa, Mastercard, Orange Money, Wave, and bank transfer (SEPA for EU customers). PayPal is not available in our region.

Step 3 The complete FAQ bot script
# faq_bot.py — copy and run: python faq_bot.py
from langchain_ollama import OllamaLLM, OllamaEmbeddings
from langchain_chroma import Chroma
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
# 1. Load and chunk the FAQ document
loader = TextLoader("faq.txt", encoding="utf-8")
docs = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)
# 2. Create in-memory vector store with Ollama embeddings
embeddings = OllamaEmbeddings(model="nomic-embed-text")
vectorstore = Chroma.from_documents(chunks, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
# 3. Define the prompt — critical: include the fallback instruction
PROMPT = ChatPromptTemplate.from_template("""
You are a helpful support assistant. Answer the customer's question
using ONLY the information below. If the answer is not in the context,
reply exactly: "I don't have information on that. Please contact support@company.com."
Context:
{context}
Question: {question}
Answer:""")
# 4. Build the LCEL chain
llm = OllamaLLM(model="llama3.1:8b", temperature=0)
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | PROMPT
    | llm
    | StrOutputParser()
)
# 5. Run an interactive loop
print("FAQ Bot ready. Type 'quit' to exit.\n")
while True:
    question = input("Customer: ").strip()
    if question.lower() == "quit":
        break
    response = chain.invoke(question)
    print(f"Bot: {response}\n")

What nomic-embed-text does: It converts your FAQ chunks into numerical vectors so the retriever can find the 3 most relevant chunks for each customer question. Pull it once: ollama pull nomic-embed-text (274 MB).

temperature=0 is essential for FAQ bots — it eliminates the model’s tendency to elaborate beyond the source document. Higher temperatures cause creative answers that sound plausible but contradict your policies.
Expected output
Customer: Do you ship to Abidjan?
Bot: Yes. We ship to Côte d'Ivoire. Delivery takes 7–14 business days.
Customs fees are the buyer's responsibility.
Customer: Can I pay with cryptocurrency?
Bot: I don't have information on that. Please contact support@company.com.

Workflow 3 — PDF Knowledge Assistant with RAG
Use case: Your team has 30+ PDFs — contracts, procedures, technical specs, HR policies. You want to ask questions in plain language and get answers with page references. No re-reading entire documents.
Stack: PyMuPDF (PDF text extraction) + LangChain + ChromaDB (persistent, survives restarts) + Ollama.
Step 1 Install and index your PDFs
pip install pymupdf langchain-ollama langchain-chroma langchain-community langchain-text-splitters
# index_pdfs.py — run ONCE to build the knowledge base
import os
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_ollama import OllamaEmbeddings
from langchain_chroma import Chroma
PDF_FOLDER = "./documents" # Put your PDFs here
DB_PATH = "./chroma_db" # Persisted vector store
# Load all PDFs in the folder
docs = []
for filename in os.listdir(PDF_FOLDER):
    if filename.endswith(".pdf"):
        loader = PyMuPDFLoader(os.path.join(PDF_FOLDER, filename))
        docs.extend(loader.load())
print(f"Loaded {len(docs)} pages from {PDF_FOLDER}")
# Chunk with overlap to preserve context across page breaks
splitter = RecursiveCharacterTextSplitter(chunk_size=600, chunk_overlap=80)
chunks = splitter.split_documents(docs)
print(f"Split into {len(chunks)} chunks")
# Embed and persist (takes 2–5 min for 30 PDFs on CPU)
embeddings = OllamaEmbeddings(model="nomic-embed-text")
vectorstore = Chroma.from_documents(
    chunks,
    embeddings,
    persist_directory=DB_PATH,
    collection_name="company_docs"
)
print(f"Indexed {len(chunks)} chunks into {DB_PATH}")

Step 2 Query the knowledge base
# query_docs.py — run this to ask questions
from langchain_ollama import OllamaLLM, OllamaEmbeddings
from langchain_chroma import Chroma
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
DB_PATH = "./chroma_db"
# Load the persisted index (no re-indexing needed)
embeddings = OllamaEmbeddings(model="nomic-embed-text")
vectorstore = Chroma(
    persist_directory=DB_PATH,
    embedding_function=embeddings,
    collection_name="company_docs"
)
# Return 4 chunks + their source file and page number
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
def format_docs(docs):
    parts = []
    for d in docs:
        source = d.metadata.get("source", "unknown")
        page = d.metadata.get("page", "?")
        parts.append(f"[Source: {source}, Page {page}]\n{d.page_content}")
    return "\n\n".join(parts)
PROMPT = ChatPromptTemplate.from_template("""
You are a precise document assistant. Use ONLY the excerpts below to answer.
Always cite the source file and page number in your answer.
If the answer is not in the excerpts, say "Not found in the indexed documents."
Excerpts:
{context}
Question: {question}
Answer (with source citations):""")
llm = OllamaLLM(model="llama3.1:8b", temperature=0)
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | PROMPT
    | llm
    | StrOutputParser()
)
print("Knowledge Assistant ready. Type 'quit' to exit.\n")
while True:
    question = input("You: ").strip()
    if question.lower() == "quit":
        break
    print(f"\nAssistant: {chain.invoke(question)}\n")

Sample output
You: What is the notice period for contract termination?
Assistant: The notice period for contract termination is 30 calendar days for
staff at Grade A–C and 60 days for Grade D and above.
[Source: hr-policy-2026.pdf, Page 14]
If either party fails to give notice, a payment in lieu of notice
equal to the base salary for the notice period is required.
[Source: hr-policy-2026.pdf, Page 15]
You: What are the IT security requirements for remote work?
Assistant: Employees working remotely must use the company VPN at all times
when accessing internal systems. Two-factor authentication is mandatory.
Personal devices require MDM enrollment before accessing company email.
[Source: it-security-policy.pdf, Page 3]

Troubleshooting: 5 Common Errors and Fixes
1. Ollama returns "connection refused"
Ollama is not running. Start it with ollama serve in a terminal (Mac/Linux) or via the Ollama system tray app on Windows. On Linux servers, enable the systemd service: sudo systemctl enable --now ollama.
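A scriptable version of the same check, using the /api/tags endpoint shown earlier (a small sketch; ollama_health.py is an illustrative name):

# ollama_health.py: prints the installed models if Ollama is reachable
import requests

try:
    r = requests.get("http://localhost:11434/api/tags", timeout=3)
    r.raise_for_status()
    print("Ollama is up. Models:", [m["name"] for m in r.json()["models"]])
except requests.exceptions.ConnectionError:
    print("Ollama is not running. Start it with 'ollama serve'.")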
2. Whisper server: "model file not found"
You started the server pointing to the wrong model path. The model must be in the whisper.cpp/models/ directory. Check with: ls whisper.cpp/models/*.bin. If empty, re-run the download script: ./models/download-ggml-model.sh base.
3. ChromaDB "Embedding dimension mismatch"
You switched embedding models after creating the index. The stored vectors don’t match the new model’s dimensions. Fix: delete the ./chroma_db folder and re-run index_pdfs.py with the new model.
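If you prefer to script the reset (a minimal sketch; DB_PATH must match the path used in index_pdfs.py):

# reset_index.py: wipe the vectors built with the old embedding model
import shutil

DB_PATH = "./chroma_db"
shutil.rmtree(DB_PATH, ignore_errors=True)
print(f"Deleted {DB_PATH}. Now re-run index_pdfs.py.")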
4. LangChain returns hallucinated answers not in your documents
The prompt template is missing the constraint instruction. Make sure your prompt explicitly says “Use ONLY the information below” and includes the fallback instruction. Also verify temperature=0 is set on the LLM — even small temperature values (0.3) cause factual drift.
5. n8n workflow stops after the Whisper node
The audio file is being sent as a URL or JSON string instead of binary data. In the HTTP Request node, set the file body parameter's type to “n8n Binary File” (formBinaryData in the JSON) and set Input Data Field Name to the binary property produced by the file trigger node (default: data). Check the n8n execution log for the exact error — it usually shows the HTTP 400 response from the Whisper server.
FAQ
Does Whisper require a GPU to transcribe meetings locally?
No. Whisper's 'base' and 'small' models run on CPU. A 30-minute meeting transcribes in about 4 minutes on a 2020-era laptop CPU. Use 'medium' or 'large-v3' only if you need high accuracy for heavily accented speech — those require 6–8 GB RAM and benefit from a GPU.
Which Ollama model works best for a FAQ bot in French or Arabic?
For French, use 'mistral:7b' or 'llama3.1:8b' — both score above 85% on French comprehension benchmarks. For Arabic, 'aya:8b' (Cohere's multilingual model) is the strongest open-source option available via Ollama. Run 'ollama pull aya:8b' to download it.
How do I handle scanned PDFs (images) in the RAG pipeline?
PyMuPDF only extracts text from digital PDFs. For scanned documents, add an OCR step: install 'pytesseract' and 'pdf2image', convert each page to an image, then run Tesseract OCR before passing text to LangChain. Alternatively, use 'marker-pdf' (open-source, GPU-optional) which handles both digital and scanned PDFs in one command.
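A sketch of that OCR pre-step (assumes Tesseract and Poppler are installed system-wide; the file names are placeholders):

# ocr_pdf.py: convert a scanned PDF to plain text before indexing
# pip install pytesseract pdf2image
import pytesseract
from pdf2image import convert_from_path

pages = convert_from_path("scanned-contract.pdf", dpi=300)  # one image per page
text = "\n\n".join(pytesseract.image_to_string(page) for page in pages)
with open("scanned-contract.txt", "w", encoding="utf-8") as f:
    f.write(text)  # index this .txt with TextLoader instead of PyMuPDFLoader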
Can I run the n8n meeting summarizer workflow on a shared team server?
Yes. Deploy n8n via Docker on any server with 2 GB RAM. All three components (n8n, Whisper HTTP server, Ollama) run in separate containers on the same machine. Use n8n's built-in credentials store to keep API keys secure. A EUR 20/month VPS handles 5–10 meeting summaries per day without performance issues.
What is the difference between this article's approach and the FAQ bot using n8n JSON configuration?
The n8n JSON approach is purely visual — no code required, configured entirely in the browser. This article's LangChain Python approach gives you more control: custom prompt templates, conversation memory, re-ranking retrieved chunks, and the ability to add filters (e.g., only answer from documents tagged for a specific department). Use n8n for simplicity; use LangChain for customization.
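A sketch of the department filter mentioned above, building on the Workflow 3 vectorstore (assumes each chunk was given a metadata field at index time, e.g. chunk.metadata["department"] = "HR"):

# filtered_retriever.py: restrict answers to one department's documents
retriever = vectorstore.as_retriever(
    search_kwargs={"k": 4, "filter": {"department": "HR"}}  # Chroma metadata filter
)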
How many PDFs can the ChromaDB knowledge assistant handle before performance degrades?
ChromaDB handles 50,000 document chunks comfortably on 8 GB RAM with a compact embedding model ('nomic-embed-text', as used here, or 'all-MiniLM-L6-v2'). A typical 200-page PDF generates roughly 400 chunks at the 600-character / 80-character-overlap setting used in index_pdfs.py, so you can index approximately 125 PDFs before needing to upgrade to a server setup. For larger corpora, switch to Qdrant with its filtering capabilities.