“The API costs us €X a month, whereas a local GPU is free.” That sentence triggers more bad infrastructure decisions than any other. This guide is not pro-local advocacy: it is an honest comparison of total cost of ownership (TCO), with the hidden costs on both sides, and a decision criterion that is not “which is cheaper.”
The seductive — and misleading — math
The classic reasoning: look at the monthly API bill, compare it to the price of a GPU, and local wins “because after purchase it is free.” The problem: API and self-hosting do not have the same cost structure.
- API: fixed cost ≈ 0, marginal cost per request. You pay exactly what you consume.
- Self-hosting: high fixed cost (hardware + operations), marginal cost ≈ 0. You pay for capacity, whether it is used or not.
Comparing a marginal cost to a fixed cost without accounting for volume is like comparing rent to buying a home by looking only at the monthly payment.
The hidden costs of self-hosting
Hardware and amortization
The GPU is a capex to amortize, not a zero cost. A serious inference node runs into the thousands of euros, spread over 2 to 4 years — and the resale value of an AI GPU drops fast.
Electricity and cooling
An always-on node draws power 24/7, load or no load. Over a year, electricity (and cooling) becomes a real line item, especially where the kWh is expensive.
Maintenance and updates
Drivers, inference runtime, quantization formats, new models: the stack moves constantly. An update that breaks production is engineer time — and sometimes downtime.
Availability and latency
With self-hosting, there is no SLA: an outage is your problem, at 3 a.m. if needed. Latency and load handling are not guaranteed by a third party — getting them is on you.
Human time — the most underestimated cost
This is the line item that flips most calculations. The operational skill needed to run an LLM reliably in production is not free. A quantization format that leaks memory means several days of debugging before you find the right setting. If that skill is not already in-house, the “free” of local becomes very expensive.
The hidden costs of the API
The marginal cost that runs away
The API’s advantage (pay-as-you-go) becomes a flaw at high volume: at several million requests, cost per token ends up dominating every other consideration.
Confidentiality and sovereignty
Every request sends your data to a third party, often outside your jurisdiction. For sensitive or regulated data, this is not a price question but a control question.
Lock-in and price changes
Your cost depends on a pricing grid you do not control, and switching vendors has a cost. You also inherit the vendor’s rate limits and quotas.
The tipping point: volume
It all comes down to the crossover between a fixed cost (self-hosting) and a usage-growing cost (API):
Well-run migrations show dramatic drops on machine cost alone (see our migration case study and our LLM cost benchmark). But those numbers never include the engineer-hours — which are precisely what make the operation profitable… or not.
When the API wins
- Low or erratic volume: nothing to amortize, you just pay for usage.
- No ops team: you buy a third party’s reliability rather than building it.
- Need for the latest frontier models without managing hardware.
- Fast product iteration: test without provisioning infrastructure.
- Compliance you would rather delegate to the vendor.
When self-hosting wins
- High and stable volume: enough to amortize, steady enough not to pay for idle hardware.
- Confidentiality / sovereignty: the data must not leave.
- Control over latency and the end-to-end stack.
- You already have the ops skill in-house.
- Predictable load, no dependence on an external pricing grid.
Our take
In practice, we run hybrid: self-hosting absorbs the repetitive volume (drafts, classification, internal tasks) via a local router, and the API/Claude steps in for supervision and rare, high-quality tasks. It is not “local or cloud,” it is “the right tool per workload type.” See also our AI cost optimization guide and local LLM in production.
And we own it: local is only rational if you have the operational skill to keep it running. Without it, the API is not an admission of weakness — it is the rational choice.
Decision table
| Your situation | Rational choice |
|---|---|
| Low or unpredictable volume | API |
| High, stable volume + ops team | Self-hosting |
| Sensitive data / sovereignty required | Self-hosting (or private cloud) |
| Need for recent frontier models | API |
| No in-house ops skill | API |
| Mixed volume (repetitive + spikes) | Hybrid |
Conclusion
The right criterion is not “cheaper.” It is the combination of four factors: control (over the stack and the data), volume (high and stable enough to amortize), confidentiality (must the data stay with you?), and operational skill (can you actually run it?). If you have all four, self-hosting becomes rational. If one is missing, the API — or hybrid — is probably the better decision. Unit price is just one variable among several, and rarely the most important.