Is migrating LLM calls to Bedrock just changing the endpoint?

No — that assumption is what causes incidents. The API looks like a standard LLM API, but three implicit contracts differ: the model catalog is per-region, cross-region inference requires multi-region IAM permissions, and the Converse API enforces strict message ordering. None of the three is visible until you have real production traffic.

Why does a model return “not_found” when it exists?

Because model availability is per-region. An identifier that is valid in one region can be missing in another. The trap: the error is “model not found,” not “access denied” — which sends you debugging IAM when the real problem is the model choice for your deployment region.

What is the Converse message-ordering problem?

The Converse API requires messages to alternate user/assistant and start with the user. Direct “messages” APIs are often lenient about this; Converse is not. Two consecutive same-role turns, or a history that does not start with the user, return a server error in production — invisible in unit tests if your fixtures are already clean.

AWS Bedrock Migration: 3 Production Gotchas | Talki Academy

Q: What is a cross-region inference profile and why does it break IAM?

For capacity and resilience, Bedrock can route a request across multiple regions via an inference profile. Your IAM policy must then allow invoking the model in each of those regions, not just the primary one. Otherwise you get intermittent access-denied — only when routing lands on an unauthorized region, which is very hard to reproduce.

Q: How do you avoid all three?

Test on a bench that replays a real conversation history, in your real deployment region. Verify the model is available in that region, grant invocation across all regions of the inference profile from the start, and normalize message order before the call (merge consecutive same-role turns, guarantee a first user message).

Moving your LLM calls to a managed service like AWS Bedrock — often for data residency — looks trivial: “same kind of API, just change the endpoint.” In practice, three differences cost us production incidents. They are less bugs than implicit contracts you only discover under real load. Here they are, to spare you.

Context (deliberately generic): a service that called an LLM via a direct API, migrated to a managed model on Bedrock, in a primary region (call it region A) with an inference profile that can route to secondary regions (regions B).

Gotcha 1 — Model availability is per-region

A model identifier that works in one region can return “not_found” in another. Bedrock does not expose the same catalog everywhere. The service worked in development (region A) and failed in production (another region) — with a “model not found” error.

The real trap is the type of error: “not found,” not “access denied.” You then go debug IAM for an hour, when the problem is the model choice for the region. Lesson: verify model availability in your deployment region before writing a single line, and pick a model actually served where you run.

Gotcha 2 — Cross-region inference needs multi-region permissions

For capacity and resilience, Bedrock offers inference profiles that route a request across multiple regions. Great for availability — a trap for IAM: your policy must allow invoking the model in each of those regions, not just the primary.

If you only grant the home region, you get intermittent access-denied: it appears only when the profile routes to an unauthorized region. So it is impossible to reproduce reliably and a nightmare to diagnose (“it works 9 times out of 10”). Lesson: grant invocation across all regions of the inference profile from the start, even those you think you will never use.

Gotcha 3 — The Converse API enforces strict message ordering

Direct “messages” APIs are often lenient about order. Converse is not: messages must alternate user / assistant and start with the user. Two consecutive same-role turns, or a history that does not start on the user side → server error in production.

Rejected by Converse:
  [assistant] "Hi!"             <- does not start with 'user'
  [user]      "..."
  [user]      "..."             <- two consecutive 'user'

Accepted:
  [user]      "..."
  [assistant] "..."
  [user]      "..."             <- strict alternation, starts with 'user'

The sneaky part: it is invisible in unit tests if your fixtures are already clean, and it explodes as soon as a real conversation history arrives (resumes, misplaced system messages, client-merged turns). Lesson: normalize order before the call — merge consecutive same-role turns and guarantee a first user message.

Recap

Implicit contract	Production symptom	Countermeasure
Per-region catalog	“not_found” (false IAM lead)	Verify model availability in the target region
Multi-region IAM	Intermittent access-denied	Grant invocation across all profile regions
Strict ordering (Converse)	Server error on real history	Normalize message order before the call

Takeaway

Migrating to a managed LLM is not changing a URL. It is accepting three different contracts: a per-region catalog, multi-region IAM, and a strict conversation format. All three surface in production if you do not anticipate them — so test them on a bench that replays a real history, in your real region.

Going further on model and serving trade-offs: self-hosted vs API TCO, LLM API comparison and which serving format in production.

Migrating to a Managed LLM (Bedrock): 3 Implicit Contracts That Break in Prod