Question 1

What's the difference between rolling update, blue-green, and canary?

Accepted Answer

All three are ways to replace v1 pods with v2 pods. **Rolling update** does it gradually within ONE Deployment. Kubernetes scales the new ReplicaSet up while scaling the old one down, so v1 and v2 pods overlap for the duration of the rollout. Two settings bound that overlap: `maxSurge` (how many pods may exist above the desired replica count) and `maxUnavailable` (how many pods may be unavailable below the desired count while replacements become Ready). **Blue-green** uses TWO Deployments running in parallel. A Service points at one of them via its label selector, and you cut over by editing the Service's selector. The switch is atomic, but it requires double the capacity during cutover. **Canary** also uses TWO Deployments behind one Service, but here the selector matches both, and the traffic split comes from the *replica counts* (e.g., 9 v1 pods + 1 v2 pod ≈ 10% canary, assuming requests spread evenly across Ready pods). The right choice depends on how much partial-failure exposure you can tolerate during rollout. Rolling exposes a growing fraction of traffic to v2, canary exposes an explicit small fraction, and blue-green is atomic but expensive.
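The arithmetic behind `maxSurge`, `maxUnavailable`, and replica-count canaries can be sketched in a few lines of Python. This is a minimal model, not Kubernetes source: the function names are hypothetical. It assumes the Deployment defaults of 25% for both settings and the usual rounding rules (a percentage `maxSurge` rounds up, a percentage `maxUnavailable` rounds down, and if both resolve to 0 the controller falls back to `maxUnavailable = 1`). The canary fraction assumes requests spread evenly across Ready pods.

```python
def resolve_fencepost(value, replicas, round_up):
    """Turn an int or an 'NN%' string into an absolute pod count."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = replicas * int(value[:-1])
        # Integer ceil/floor division avoids float rounding surprises.
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rolling_update_bounds(replicas, max_surge="25%", max_unavailable="25%"):
    """Return (max total pods, min available pods) during a rolling update."""
    surge = resolve_fencepost(max_surge, replicas, round_up=True)
    unavailable = resolve_fencepost(max_unavailable, replicas, round_up=False)
    if surge == 0 and unavailable == 0:
        # Assumed to mirror the controller: the rollout must be able to progress.
        unavailable = 1
    return replicas + surge, replicas - unavailable


def canary_fraction(stable_replicas, canary_replicas):
    """Approximate share of traffic hitting the canary, assuming even spread."""
    return canary_replicas / (stable_replicas + canary_replicas)


print(rolling_update_bounds(10))  # 25% of 10 = 2.5 -> surge 3, unavailable 2: (13, 8)
print(canary_fraction(9, 1))      # 0.1
```

With 10 replicas and the defaults, the fleet can briefly reach 13 pods but never drops below 8 available, which is the "overlap" a rolling update trades for not needing a second full fleet.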

Question 2

Why does my Deployment hand traffic to pods that aren't ready?

Accepted Answer

Because you didn't set a `readinessProbe`. Without one, the kubelet reports the container as ready as soon as it starts running, *not* when the application inside is actually serving. For an inference server that takes 30 seconds to load model weights, that 30-second window is when Kubernetes happily routes requests to a pod that's still inside `from_pretrained()` and returning errors. The fix: every Deployment's pod template needs a `readinessProbe` that genuinely tests the application. Options include an `httpGet` against a `/health` or `/v1/models/` endpoint, an `exec` command checking that a model file exists, or a `tcpSocket` check against the inference port (the weakest of the three, since an open port says nothing about whether the model is loaded, but still better than nothing). A Service routes only to pods whose `Ready` condition is `True`; with a real probe, that condition reflects an actual application-level check. A rolling update also waits for new pods to become Ready before continuing, so the same probe gates the rollout.
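On the application side, a readiness endpoint only has to refuse success until the model is loaded. Below is a minimal sketch using Python's standard-library HTTP server. The `/health` path, port 8080, and the `model_ready` flag are assumptions for illustration; a real inference server (e.g., one exposing `/v1/models/`) would wire the same idea into its own framework. Kubernetes `httpGet` probes treat any status from 200 to 399 as success, so returning 503 while loading keeps the pod out of the Service's endpoints.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical flag: set it once the weights are loaded and the model can serve.
model_ready = threading.Event()


def readiness_status():
    """Return (HTTP status, body) for the readiness check."""
    if model_ready.is_set():
        return 200, b"ok"
    # A status outside 200-399 fails the httpGet probe, so the pod stays NotReady.
    return 503, b"loading"


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        status, body = readiness_status()
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # Silence per-request logging; probes fire periodically.


def load_model_weights():
    # Placeholder for the slow step, e.g. a from_pretrained() call.
    model_ready.set()


def serve(port=8080):
    # Load in the background so the server can answer "loading" immediately.
    threading.Thread(target=load_model_weights, daemon=True).start()
    ThreadingHTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Calling `serve()` starts the server; the pod's `readinessProbe` would then point `httpGet` at `path: /health` and `port: 8080`, both values assumed by this sketch.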

Question 3

When should I use blue-green over rolling update?

Accepted Answer

Blue-green when you need an **atomic cut** with no overlap window in which v1 and v2 both serve new requests. Common use cases: schema-incompatible model changes (e.g., a new tokenizer that produces different inputs to the model), a database migration that must happen in lockstep with the deployment, and regulated environments where mixed-version operation is forbidden. The trade-off is capacity: during the cutover you're running `2N` pods (the full v1 fleet plus the full v2 fleet), while a rolling update needs at most `N + maxSurge`. For an N=20 inference fleet on expensive GPUs, blue-green doubles your GPU bill for the rollout window, which is significant. Most production teams reach for rolling update by default and escalate to blue-green only when explicitly required.
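The capacity gap is easy to quantify by computing peak pod counts. This Python sketch assumes a percentage `maxSurge` (25%, the Deployment default, rounded up) and a hypothetical `gpus_per_pod` figure. It only models the peak, not how long each strategy holds it.

```python
def peak_pods(strategy, replicas, max_surge_pct=25):
    """Peak pod count while rolling out a fleet of `replicas` pods."""
    if strategy == "blue-green":
        return 2 * replicas  # full v1 fleet + full v2 fleet
    if strategy == "rolling":
        # A percentage maxSurge rounds up (integer ceiling division).
        return replicas + -(-replicas * max_surge_pct // 100)
    raise ValueError(f"unknown strategy: {strategy!r}")


def extra_gpus(strategy, replicas, gpus_per_pod, max_surge_pct=25):
    """GPUs needed above the steady-state fleet at the rollout's peak."""
    return (peak_pods(strategy, replicas, max_surge_pct) - replicas) * gpus_per_pod


for strategy in ("rolling", "blue-green"):
    print(strategy, peak_pods(strategy, 20), extra_gpus(strategy, 20, gpus_per_pod=1))
# rolling 25 5
# blue-green 40 20
```

For the N=20 fleet, a default rolling update needs 5 extra pods at its peak, while blue-green needs 20.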

Question 4

How do I roll back a bad deployment?

Accepted Answer

`kubectl rollout undo deployment/` reverts to the previous revision. The Deployment controller retains up to `revisionHistoryLimit` old ReplicaSets (default 10); they're the "saved games" you can roll back to. `kubectl rollout history deployment/` lists them, and `kubectl rollout undo --to-revision=N` reverts to a specific one. The rollback is itself a rolling update: Kubernetes scales the previous ReplicaSet back up while scaling the current one down, under the same `maxSurge`/`maxUnavailable` and readiness rules. This is why `revisionHistoryLimit` matters: with 0, old ReplicaSets are cleaned up and you have no rollback target; with the default 10, you can rewind through the last ten releases. For production inference, the right pattern is: rollout → smoke test (5-30 seconds of traffic) → if metrics look bad, `kubectl rollout undo` immediately. When you've practiced it, the loop from decision to recovery can take under a minute, although the restored pods still have to pass their readiness probes (for example, after reloading model weights) before they take traffic.
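The rollout → smoke test → undo loop is straightforward to script. Below is a minimal Python sketch that shells out to `kubectl rollout status` and `kubectl rollout undo`. The deployment name, the 1% error-rate and 500 ms p99 thresholds, and the `smoke_test` callable are hypothetical placeholders for your own metrics source. The `run` parameter lets a fake runner stand in for `kubectl` in tests.

```python
import subprocess
from typing import Callable, Optional, Sequence, Tuple


def kubectl(args: Sequence[str]) -> None:
    # check=True raises CalledProcessError if kubectl exits non-zero,
    # e.g. when `rollout status` hits its timeout.
    subprocess.run(["kubectl", *args], check=True)


def should_roll_back(error_rate: float, p99_ms: float,
                     max_error_rate: float = 0.01, max_p99_ms: float = 500.0) -> bool:
    # Hypothetical thresholds; tune them to your own SLOs.
    return error_rate > max_error_rate or p99_ms > max_p99_ms


def rollout_with_guard(deployment: str,
                       smoke_test: Callable[[], Tuple[float, float]],
                       run: Callable[[Sequence[str]], None] = kubectl,
                       to_revision: Optional[int] = None) -> bool:
    """Wait for the rollout, smoke-test it, and undo on bad metrics.

    Returns True if a rollback was triggered.
    """
    target = f"deployment/{deployment}"
    # Blocks until the new ReplicaSet is fully rolled out (or times out).
    run(["rollout", "status", target, "--timeout=10m"])
    error_rate, p99_ms = smoke_test()  # e.g. 5-30 seconds of real traffic
    if not should_roll_back(error_rate, p99_ms):
        return False
    undo = ["rollout", "undo", target]
    if to_revision is not None:
        undo.append(f"--to-revision={to_revision}")
    run(undo)
    return True
```

Passing `to_revision` maps to `--to-revision=N` for rewinding past the immediately previous release; leaving it unset gives the plain `rollout undo` behavior.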

Rolling Updates, Rollback & Blue-Green for AI Inference
