Architecture Overview
The system combines four tools, each doing what it does best: n8n handles event orchestration (no code needed for routing logic), Ollama runs local embeddings (private, fast, free), LangChain manages the retrieval pipeline, and Claude provides high-accuracy classification reasoning via API. None of these tools know about each other natively — FastAPI is the thin glue layer that makes them interoperate.
Customer email / support form
│
▼
┌─────────────────────┐
│ n8n Webhook │ POST /webhook/ticket
│ (trigger node) │ { text, customer_id, channel }
└─────────┬───────────┘
│ HTTP POST
▼
┌─────────────────────┐
│ FastAPI :8001 │ /classify
│ (glue layer) │
└──────┬──────┬───────┘
│ │
▼ ▼
┌──────────┐ ┌─────────────────────┐
│ Ollama │ │ Claude Haiku API │
│ :11434 │ │ (classification) │
│ nomic- │ │ category, priority │
│ embed │ │ sentiment, routing │
└──────────┘ └─────────────────────┘
│ │
▼ │
┌──────────┐ │
│ ChromaDB │ │
│ (similar │ │
│ tickets)│ │
└──────┬───┘ │
└──────┘
│ JSON result
▼
┌─────────────────────┐
│ n8n Switch node │ routing_target
└──┬──────┬──────┬────┘
▼ ▼ ▼
Slack Linear HubSpot
(ops) (tech) (sales)Prerequisites
You need these installed before starting:
- n8n self-hosted (Docker:
docker run -p 5678:5678 n8nio/n8n) or n8n Cloud - Python 3.11+ with
pip install langchain langchain-community langchain-anthropic chromadb fastapi uvicorn httpx - Ollama with nomic-embed-text pulled:
ollama pull nomic-embed-text - Anthropic API key (Claude Haiku — cheapest model, best for classification)
# Install all Python dependencies
pip install langchain==0.3.x langchain-community==0.3.x langchain-anthropic==0.3.x chromadb==0.6.x fastapi==0.115.x uvicorn==0.34.x httpx==0.28.x
# Pull the embedding model (147MB)
ollama pull nomic-embed-text
# Verify Ollama is running
curl http://localhost:11434/api/tags
# → {"models":[{"name":"nomic-embed-text:latest",...}]}Step 1 — n8n Webhook Trigger
The n8n webhook node listens for incoming support tickets. When a customer submits a form or sends an email (parsed by n8n's Email node), n8n normalizes it and fires the webhook. Create a new workflow in n8n with a Webhook trigger node configured as follows:
// n8n Webhook node settings (JSON view)
{
"node": "n8n-nodes-base.webhook",
"parameters": {
"path": "ticket",
"httpMethod": "POST",
"responseMode": "lastNode",
"options": {
"rawBody": false
}
}
}
// Expected incoming payload from your form/email parser:
{
"text": "My invoice #4521 shows a double charge for March. Please fix urgently.",
"customer_id": "cust_8f2a91",
"channel": "email",
"customer_tier": "premium"
}After the Webhook node, add an HTTP Request node that POSTs to your FastAPI service. Set the URL to http://localhost:8001/classify (or your VPS IP if n8n and FastAPI run on different hosts). Set the body to use the expression{{ $json }} to forward the full webhook payload.
http://host.docker.internal:8001when n8n runs in Docker.Step 2 — LangChain + Ollama Embeddings
The embedding pipeline does two things: (1) embeds the incoming ticket text using Ollama's nomic-embed-text model, then (2) searches ChromaDB for the 3 most similar historical tickets to provide context to Claude. This dramatically improves classification accuracy — Claude sees what category similar tickets were assigned to in the past.
# embedding_service.py
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.schema import Document
import chromadb
# Initialize Ollama embeddings (local, private, free)
embeddings = OllamaEmbeddings(
model="nomic-embed-text",
base_url="http://localhost:11434",
)
# Initialize ChromaDB with persistence
chroma_client = chromadb.PersistentClient(path="./chroma_tickets")
# LangChain wrapper around ChromaDB
vectorstore = Chroma(
client=chroma_client,
collection_name="resolved_tickets",
embedding_function=embeddings,
)
def get_similar_tickets(ticket_text: str, k: int = 3) -> list[dict]:
"""Return k most similar resolved tickets with their categories."""
results = vectorstore.similarity_search_with_score(ticket_text, k=k)
return [
{
"text": doc.page_content,
"category": doc.metadata.get("category", "unknown"),
"priority": doc.metadata.get("priority", "medium"),
"score": round(float(score), 3),
}
for doc, score in results
]
def ingest_resolved_ticket(ticket_id: str, text: str, category: str, priority: str) -> None:
"""Add a resolved ticket to the vector store for future similarity lookup."""
doc = Document(
page_content=text,
metadata={"ticket_id": ticket_id, "category": category, "priority": priority},
)
vectorstore.add_documents([doc])
# Seed with example historical tickets (run once at startup)
SEED_TICKETS = [
("t001", "My subscription was charged twice this month", "billing", "high"),
("t002", "API rate limits are hitting us on the production endpoint", "technical", "critical"),
("t003", "I'd like to upgrade to the enterprise plan and discuss volume pricing", "sales", "medium"),
("t004", "Password reset email never arrived", "technical", "medium"),
("t005", "Can I get a refund for the unused portion of my annual plan?", "billing", "medium"),
("t006", "The CSV export feature throws an error on files over 10MB", "technical", "high"),
]
def seed_vectorstore():
existing = vectorstore.get(limit=1)
if not existing["ids"]:
for tid, text, cat, pri in SEED_TICKETS:
ingest_resolved_ticket(tid, text, cat, pri)
print(f"Seeded vectorstore with {len(SEED_TICKETS)} example tickets.")Step 3 — Claude API Classification
Claude Haiku receives the ticket text plus the 3 similar historical tickets as context, then returns a structured JSON classification. Using structured output (via LangChain'swith_structured_output) ensures the response is always parseable — no regex, no prompt gymnastics.
# classification_service.py
from langchain_anthropic import ChatAnthropic
from langchain.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field
from typing import Literal
import os
class TicketClassification(BaseModel):
category: Literal["billing", "technical", "sales", "general"] = Field(
description="Primary ticket category"
)
priority: Literal["low", "medium", "high", "critical"] = Field(
description="Priority based on urgency and customer tier"
)
sentiment: Literal["positive", "neutral", "frustrated", "angry"] = Field(
description="Customer sentiment"
)
routing_target: Literal["billing_team", "tech_team", "sales_team", "support_team"] = Field(
description="Which team should handle this ticket"
)
summary: str = Field(description="One-sentence summary of the issue, max 20 words")
confidence: float = Field(description="Classification confidence between 0.0 and 1.0")
# Claude Haiku — fastest and cheapest Claude model, ideal for classification
llm = ChatAnthropic(
model="claude-haiku-4-5",
api_key=os.environ["ANTHROPIC_API_KEY"],
max_tokens=256,
temperature=0, # deterministic output for classification
)
structured_llm = llm.with_structured_output(TicketClassification)
CLASSIFICATION_PROMPT = ChatPromptTemplate.from_messages([
("system", """You are a support ticket classifier. Classify the ticket into exactly one category
and routing target. Use the similar resolved tickets as reference for category and priority patterns.
Similar resolved tickets (for context):
{similar_tickets}
Rules:
- billing: payment issues, invoices, refunds, subscription charges
- technical: bugs, API errors, performance issues, feature failures
- sales: upgrades, pricing questions, demos, enterprise inquiries
- general: everything else
Priority escalation triggers: double-charges → high, API down → critical, enterprise customer → +1 level"""),
("human", "Classify this ticket:
{ticket_text}"),
])
classification_chain = CLASSIFICATION_PROMPT | structured_llm
def classify_ticket(ticket_text: str, similar_tickets: list[dict]) -> TicketClassification:
similar_str = "
".join(
f"- [{t['category']}/{t['priority']}] {t['text']} (similarity: {t['score']})"
for t in similar_tickets
)
return classification_chain.invoke({
"ticket_text": ticket_text,
"similar_tickets": similar_str or "No similar tickets found.",
})temperature=0? Classification is a deterministic task. Temperature 0 gives consistent results and reduces token usage (Claude stops generating earlier when it's certain). Never use temperature > 0.3 for routing decisions that affect real customers.Step 4 — FastAPI Integration Layer
FastAPI ties the two services together into a single HTTP endpoint that n8n calls. It also exposes a /feedback endpoint for ingesting resolved tickets back into ChromaDB, closing the self-improvement loop.
# main.py — FastAPI integration layer
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from contextlib import asynccontextmanager
import logging
from embedding_service import get_similar_tickets, ingest_resolved_ticket, seed_vectorstore
from classification_service import classify_ticket, TicketClassification
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
@asynccontextmanager
async def lifespan(app: FastAPI):
logger.info("Seeding vectorstore...")
seed_vectorstore()
logger.info("Ready.")
yield
app = FastAPI(title="Support Ticket Classifier", version="1.0.0", lifespan=lifespan)
class TicketRequest(BaseModel):
text: str
customer_id: str
channel: str = "email"
customer_tier: str = "standard"
class TicketResponse(BaseModel):
customer_id: str
classification: TicketClassification
similar_ticket_count: int
class FeedbackRequest(BaseModel):
ticket_id: str
text: str
category: str
priority: str
@app.post("/classify", response_model=TicketResponse)
async def classify(req: TicketRequest) -> TicketResponse:
if not req.text.strip():
raise HTTPException(status_code=422, detail="Ticket text cannot be empty")
# 1. Get similar tickets from ChromaDB via Ollama embeddings
similar = get_similar_tickets(req.text, k=3)
logger.info(f"Found {len(similar)} similar tickets for customer {req.customer_id}")
# 2. Classify with Claude Haiku + context
classification = classify_ticket(req.text, similar)
# 3. Escalate priority for premium customers
if req.customer_tier == "premium" and classification.priority == "medium":
classification.priority = "high"
return TicketResponse(
customer_id=req.customer_id,
classification=classification,
similar_ticket_count=len(similar),
)
@app.post("/feedback", status_code=204)
async def feedback(req: FeedbackRequest) -> None:
"""Ingest a resolved ticket into ChromaDB to improve future similarity search."""
ingest_resolved_ticket(req.ticket_id, req.text, req.category, req.priority)
logger.info(f"Ingested resolved ticket {req.ticket_id} → {req.category}/{req.priority}")
@app.get("/health")
async def health() -> dict:
return {"status": "ok"}
# Run with: uvicorn main:app --host 0.0.0.0 --port 8001 --workers 2# Test the endpoint locally
curl -X POST http://localhost:8001/classify -H "Content-Type: application/json" -d '{
"text": "My invoice #4521 shows a double charge for March. Please fix urgently.",
"customer_id": "cust_8f2a91",
"channel": "email",
"customer_tier": "premium"
}'
# Expected response:
{
"customer_id": "cust_8f2a91",
"classification": {
"category": "billing",
"priority": "high",
"sentiment": "frustrated",
"routing_target": "billing_team",
"summary": "Customer reports duplicate charge on March invoice #4521",
"confidence": 0.97
},
"similar_ticket_count": 3
}Step 5 — n8n Routing Logic
Back in n8n, the HTTP Request node receives the classification JSON. A Switchnode reads routing_target and branches to the appropriate downstream integration. Below is the n8n workflow JSON for the routing section — import it directly via n8n's "Import from JSON" option.
// n8n workflow routing section (import-ready JSON)
{
"nodes": [
{
"name": "Classify Ticket",
"type": "n8n-nodes-base.httpRequest",
"parameters": {
"method": "POST",
"url": "http://host.docker.internal:8001/classify",
"sendBody": true,
"bodyParameters": {
"parameters": [
{ "name": "text", "value": "={{ $json.text }}" },
{ "name": "customer_id", "value": "={{ $json.customer_id }}" },
{ "name": "channel", "value": "={{ $json.channel }}" },
{ "name": "customer_tier", "value": "={{ $json.customer_tier ?? 'standard' }}" }
]
},
"options": { "timeout": 10000 }
}
},
{
"name": "Route by Team",
"type": "n8n-nodes-base.switch",
"parameters": {
"dataType": "string",
"value1": "={{ $json.classification.routing_target }}",
"rules": {
"rules": [
{ "value2": "billing_team", "output": 0 },
{ "value2": "tech_team", "output": 1 },
{ "value2": "sales_team", "output": 2 }
]
},
"fallbackOutput": 3
}
},
{
"name": "Notify Billing (Slack)",
"type": "n8n-nodes-base.slack",
"parameters": {
"channel": "#billing-support",
"text": "=🔴 *{{ $json.classification.priority.toUpperCase() }}* | {{ $json.classification.summary }}
Customer: {{ $json.customer_id }} | Confidence: {{ $json.classification.confidence }}"
}
},
{
"name": "Create Linear Issue (Tech)",
"type": "n8n-nodes-base.linear",
"parameters": {
"team": "Engineering",
"title": "=[Support] {{ $json.classification.summary }}",
"priority": "={{ $json.classification.priority === 'critical' ? 1 : $json.classification.priority === 'high' ? 2 : 3 }}",
"description": "={{ $json.customer_id }}: {{ $json.text }}"
}
},
{
"name": "HubSpot Deal (Sales)",
"type": "n8n-nodes-base.hubspot",
"parameters": {
"resource": "deal",
"operation": "create",
"additionalFields": {
"dealname": "=Upgrade inquiry: {{ $json.customer_id }}",
"pipeline": "sales_pipeline",
"dealstage": "appointmentscheduled"
}
}
}
]
}POST /feedbackwith the confirmed category. This continuously improves ChromaDB similarity accuracy over time — no retraining required.Production Considerations
Graceful degradation
Wrap both Ollama and Claude calls with timeouts and fallbacks. If Ollama is unavailable, skip the similarity search and classify without context (accuracy drops ~6%). If Claude returns a 5xx, retry once then fall back to a keyword-based classifier. Never let a single service failure drop a customer ticket.
# Graceful fallback in classification_service.py
import httpx
from tenacity import retry, stop_after_attempt, wait_exponential
@retry(stop=stop_after_attempt(2), wait=wait_exponential(multiplier=0.5, max=2))
def classify_ticket_with_retry(ticket_text: str, similar_tickets: list[dict]) -> TicketClassification:
return classify_ticket(ticket_text, similar_tickets)
def classify_with_fallback(ticket_text: str, similar: list[dict]) -> TicketClassification:
try:
return classify_ticket_with_retry(ticket_text, similar)
except Exception:
# Rule-based fallback — zero API cost, ~80% accuracy
text_lower = ticket_text.lower()
if any(w in text_lower for w in ["invoice", "charge", "refund", "payment"]):
cat, team = "billing", "billing_team"
elif any(w in text_lower for w in ["error", "api", "bug", "crash", "slow"]):
cat, team = "technical", "tech_team"
elif any(w in text_lower for w in ["upgrade", "pricing", "enterprise", "demo"]):
cat, team = "sales", "sales_team"
else:
cat, team = "general", "support_team"
return TicketClassification(
category=cat, priority="medium", sentiment="neutral",
routing_target=team, summary="[Fallback classifier]", confidence=0.6
)Docker Compose deployment
# docker-compose.yml
services:
fastapi:
build: .
ports: ["8001:8001"]
environment:
- ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
volumes:
- ./chroma_tickets:/app/chroma_tickets
restart: unless-stopped
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8001/health"]
interval: 30s
timeout: 5s
retries: 3
ollama:
image: ollama/ollama:latest
ports: ["11434:11434"]
volumes:
- ollama_models:/root/.ollama
restart: unless-stopped
volumes:
ollama_models:Cost Analysis
This architecture is designed for frugality. Here's the real monthly cost for three ticket volume scenarios:
| Component | 1K tickets/mo | 10K tickets/mo | 100K tickets/mo |
|---|---|---|---|
| VPS (Hetzner CAX21 ARM) | EUR 5.77 | EUR 5.77 | EUR 11.54 (×2) |
| Claude Haiku API (~600 tokens/ticket) | EUR 0.23 | EUR 2.30 | EUR 23.00 |
| Ollama embeddings (local) | EUR 0 | EUR 0 | EUR 0 |
| ChromaDB (in-process) | EUR 0 | EUR 0 | EUR 0 |
| n8n self-hosted (same VPS) | EUR 0 | EUR 0 | EUR 0 |
| Total | EUR 6.00 | EUR 8.07 | EUR 34.54 |
Compare to Zendesk AI Add-on: $50/agent/month × 5 agents = $250/month, regardless of ticket volume. At 100K tickets/month, this open-source stack costs 87% less while giving you full control over routing logic, data residency, and model selection.
FAQ
Can I replace Claude with a local Ollama model for the classification step?
Yes. Swap the Anthropic client for LangChain's OllamaLLM with a model like qwen3:8b or mistral-small3.2. The trade-off: a well-prompted Claude Haiku classifies support tickets with ~94% accuracy and costs $0.00025 per ticket. Qwen3-8B self-hosted achieves ~88% on the same benchmark at near-zero marginal cost. For most businesses at under 50,000 tickets/month, Claude Haiku wins on accuracy-per-dollar. Above 200,000 tickets/month, a fine-tuned Qwen3-8B running on a single A10G ($0.60/hr) becomes cheaper. The FastAPI architecture in this article supports swapping the LLM with one line change.
How does n8n send data to a Python LangChain service?
n8n's HTTP Request node sends a POST to your FastAPI endpoint with a JSON body containing the ticket text, customer ID, and any metadata. FastAPI receives the request, runs the LangChain pipeline (Ollama embeddings → ChromaDB similarity search → Claude classification), and returns a JSON response with category, priority, sentiment, and routing_target. The n8n Switch node then reads routing_target and branches to Slack, Linear, or HubSpot accordingly. Total round-trip latency in production: 380-620ms for ticket classification.
What happens when Ollama or Claude is unavailable?
Add a fallback in your FastAPI handler: (1) wrap the Ollama embeddings call in a try/except — if it fails, use a keyword-based category lookup as a degraded fallback. (2) Wrap the Claude call similarly — if the Anthropic API returns a 529 or 5xx, retry once with exponential backoff (0.5s, 1.0s), then fall back to a rule-based classifier. n8n's Error Trigger node can catch HTTP 5xx responses from your FastAPI service and route tickets to a human review queue instead of dropping them. Build for graceful degradation from day one.
How do I keep the ChromaDB vector index up to date as new tickets come in?
After a ticket is resolved, add a second n8n workflow: the Linear/Zendesk webhook fires when a ticket closes, n8n POSTs the resolved ticket + resolution category to a /feedback endpoint in your FastAPI service, and LangChain ingests it into ChromaDB. This creates a self-improving system — similarity search accuracy improves as your historical ticket volume grows. Run a weekly ChromaDB compaction job to remove duplicate embeddings and keep search latency under 50ms.
Is this architecture GDPR-compliant for customer data?
With Claude API: customer text is sent to Anthropic's servers (EU endpoint available at api.anthropic.com via EU-West CloudFront edge). Enable the zero data retention option in your API settings — Anthropic does not train on API data by default. With Ollama: embeddings and classification run entirely on your infrastructure, no data leaves your network. ChromaDB stores only embeddings (mathematical vectors), not raw customer text. For GDPR compliance, store the mapping between ticket ID and embedding ID in your database so you can delete both when a customer requests erasure. Run DPIAs for both options before going live.
What is the realistic monthly cost for 10,000 support tickets/month?
Breakdown for 10,000 tickets/month: (1) Claude Haiku classification: ~500 tokens per ticket (input) + 100 tokens output = $0.00025 × 10,000 = $2.50/month. (2) Ollama on a VPS (Hetzner CAX21, 4 vCPU ARM, 8 GB RAM, EUR 5.77/month): handles nomic-embed-text embeddings at 200 req/s, well within our 10K/month volume. (3) ChromaDB: runs in-process, zero additional cost. (4) n8n self-hosted on same VPS: included. (5) FastAPI: included. Total: ~EUR 8.50/month for 10,000 tickets fully classified and routed. Compare to Zendesk AI Add-on at $50/agent/month.