🦙
Ollama: Local LLMs in Production
Intensive technical training for developers and ops teams who want to deploy open-source LLMs in production without depending on proprietary APIs. Master Ollama, quantization, multi-GPU Docker deployment, and integration with your existing stack. Real-world case: a startup cut its costs from EUR 4,200/month to EUR 109/month (-97%).
Duration
2 days
Level
Intermediate
Price
9.99 EUR/month (all courses included)
Max group
12 participants
What you will learn
- Install and configure Ollama on different platforms (macOS, Linux, Docker)
- Choose the right model for your constraints (latency, quality, VRAM)
- Understand quantization (Q2, Q4, Q8) and optimize the performance/quality trade-off
- Deploy with Docker Compose, multi-GPU load balancing, and Open WebUI
- Integrate via the OpenAI-compatible API (2-line code migration)
- Implement monitoring (Prometheus, Grafana), rate limiting, and backups
- Calculate ROI and compare API costs vs self-hosting
Course program
Module 1: Ollama Fundamentals and Model Selection (3h30)
- Installing Ollama: first steps
- Understanding quantization: Q2, Q4, Q8, FP16
- Model selection: Llama, Mistral, CodeLlama, DeepSeek
- Performance benchmarks: latency, throughput, quality
- Use cases: which model for which task?
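To give a feel for why quantization matters for model selection, here is a back-of-the-envelope VRAM estimate. The bits-per-weight values and the 20% overhead factor are rough illustrative assumptions (real quantization formats carry per-group scale metadata, and KV-cache size depends on context length), not Ollama internals:

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: float,
                     overhead: float = 1.2) -> float:
    """Rough VRAM needed to load the weights, with ~20% headroom
    for KV cache and activations (a deliberate simplification)."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return round(weight_bytes * overhead / 1e9, 1)

# An 8B-parameter model at different quantization levels
# (effective bits include a small allowance for scale metadata):
for name, bits in [("Q2", 2.5), ("Q4", 4.5), ("Q8", 8.5), ("FP16", 16)]:
    print(f"{name}: ~{estimate_vram_gb(8, bits)} GB")
```

The point of the exercise: the same 8B model that needs a data-center GPU at FP16 fits comfortably on a consumer card at Q4, which is what makes self-hosting economical.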
Module 2: Docker Deployment and Production Setup (3h30)
- Docker Compose: Ollama + Open WebUI
- Multi-GPU load balancing with NGINX
- Model caching and latency optimization
- Workshop: complete production architecture
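The NGINX configuration itself is covered in the workshop; the underlying load-balancing idea can be sketched client-side in a few lines. The instance hostnames below are hypothetical placeholders for one Ollama container per GPU:

```python
import itertools

# Hypothetical Ollama backends, one container per GPU (placeholder URLs).
INSTANCES = [
    "http://ollama-gpu0:11434",
    "http://ollama-gpu1:11434",
]

# Simple round-robin rotation -- the same default policy an NGINX
# `upstream` block applies when no other method is specified.
_cycle = itertools.cycle(INSTANCES)

def next_backend() -> str:
    """Return the next backend URL in round-robin order."""
    return next(_cycle)
```

In production you would let NGINX (or another reverse proxy) do this, gaining health checks and connection handling for free; the sketch only shows the distribution policy.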
Module 3: API Integration and OpenAI Compatibility (3h30)
- OpenAI-compatible API: 2-line migration
- Streaming: token-by-token responses
- LangChain integration: RAG and agents
- Workshop: migrate an OpenAI app to Ollama
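Ollama serves an OpenAI-compatible endpoint under `/v1`, so the "2-line migration" with the official SDK is just pointing `base_url` at `http://localhost:11434/v1` (the API key can be any string for a local server). The sketch below uses only the standard library to show the same wire format; the request is built but not sent, since sending it requires a running Ollama server:

```python
import json
import urllib.request

# With the OpenAI SDK the whole migration is:
#   client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
# Same endpoint and payload shape, built here with the stdlib only:

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        "http://localhost:11434/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer ollama",  # any token works locally
        },
    )

req = build_chat_request("llama3", "Say hello in one word.")
```

Because path, payload, and headers match the OpenAI API, existing client code, LangChain included, works against Ollama unchanged apart from the base URL.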
Module 4: Production Patterns and Monitoring (3h30)
- Monitoring with Prometheus and Grafana
- Rate limiting with Redis and Celery
- Automated backup and disaster recovery
- Real case: a startup cutting costs by 97%
- ROI calculation: API vs self-hosted
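The ROI arithmetic behind the case study is simple enough to show directly, using the monthly figures quoted at the top of this page (hardware amortization and ops time, covered in the module, are left out of this minimal sketch):

```python
def monthly_savings(api_cost: float, self_hosted_cost: float) -> tuple[float, float]:
    """Return (absolute EUR savings, percent savings) per month."""
    saved = api_cost - self_hosted_cost
    return saved, round(saved / api_cost * 100, 1)

# Figures from the case study above (EUR/month).
saved, pct = monthly_savings(4200, 109)
print(f"EUR {saved}/month saved ({pct}%)")  # → EUR 4091/month saved (97.4%)
```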
Ready to get started?
9.99 EUR/month — All courses included, cancel anytime