Moving your LLM calls to a managed service like AWS Bedrock — often for data residency — looks trivial: “same kind of API, just change the endpoint.” In practice, three differences cost us production incidents. They are less bugs than implicit contracts you only discover under real load. Here they are, to spare you.
Context (deliberately generic): a service that called an LLM via a direct API, migrated to a managed model on Bedrock, in a primary region (call it region A) with an inference profile that can route to secondary regions (regions B).
Gotcha 1 — Model availability is per-region
A model identifier that works in one region can return “not_found” in another. Bedrock does not expose the same catalog everywhere. The service worked in development (region A) and failed in production (another region) — with a “model not found” error.
The real trap is the type of error: “not found,” not “access denied.” You then go debug IAM for an hour, when the problem is the model choice for the region. Lesson: verify model availability in your deployment region before writing a single line, and pick a model actually served where you run.
Gotcha 2 — Cross-region inference needs multi-region permissions
For capacity and resilience, Bedrock offers inference profiles that route a request across multiple regions. Great for availability — a trap for IAM: your policy must allow invoking the model in each of those regions, not just the primary.
If you only grant the home region, you get intermittent access-denied: it appears only when the profile routes to an unauthorized region. So it is impossible to reproduce reliably and a nightmare to diagnose (“it works 9 times out of 10”). Lesson: grant invocation across all regions of the inference profile from the start, even those you think you will never use.
Gotcha 3 — The Converse API enforces strict message ordering
Direct “messages” APIs are often lenient about order. Converse is not: messages must alternate user / assistant and start with the user. Two consecutive same-role turns, or a history that does not start on the user side → server error in production.
The sneaky part: it is invisible in unit tests if your fixtures are already clean, and it explodes as soon as a real conversation history arrives (resumes, misplaced system messages, client-merged turns). Lesson: normalize order before the call — merge consecutive same-role turns and guarantee a first user message.
Recap
| Implicit contract | Production symptom | Countermeasure |
|---|---|---|
| Per-region catalog | “not_found” (false IAM lead) | Verify model availability in the target region |
| Multi-region IAM | Intermittent access-denied | Grant invocation across all profile regions |
| Strict ordering (Converse) | Server error on real history | Normalize message order before the call |
Takeaway
Migrating to a managed LLM is not changing a URL. It is accepting three different contracts: a per-region catalog, multi-region IAM, and a strict conversation format. All three surface in production if you do not anticipate them — so test them on a bench that replays a real history, in your real region.
Going further on model and serving trade-offs: self-hosted vs API TCO, LLM API comparison and which serving format in production.