Rolling Updates, Rollback & Blue-Green for AI Inference
Ship a new model version without dropping requests. Master Kubernetes' three deployment strategies — rolling update with readiness probes, rollback after a bad release, and blue-green via Service-selector swap — all on a stand-in inference Deployment.
What you'll learn
- 1. Three deployment strategies and the readiness-probe contract
  You already know how to author a Deployment (last lab). Now you're going to *update* one (replace v1 with v2) without dropping inference traffic. Kubernetes gives you three native strategies for this. Picking the right one depends on what your workload tolerates during the cutover window.
- 2. Rolling update: v1 → v2 with a readiness probe
  You're going to author an inference Deployment with 3 replicas and a readiness probe that simulates "model is loading", then update its VERSION from v1 to v2 and watch the rolling update happen pod-by-pod.
- 3. Bad rollout: abort and roll back with `kubectl rollout undo`
  You're about to ship inference v3. It has a regression: the container's startup script forgets to touch /tmp/ready, so its readiness probe will never pass. In production, that would correspond to a model load that hangs forever, or a from_pretrained() that throws and never recovers.
- 4. Blue-green: atomic cutover via Service selector swap
  Two inference Deployments are already running side-by-side: inference-blue (v1) and inference-green (v2). Both have 2 replicas and identical pod-template shapes; the only difference is the color label and the VERSION env var (v1-blue vs v2-green).
- 5. Triage Day: three rollout-related failures
  The lab platform has just deployed three broken rollout scenarios, one per strategy you covered in earlier steps. Each is a real production failure shape.
Prerequisites
- Completed `nca-aiio-workload-controllers` (or comfortable with Deployment + Service)
- Familiar with `kubectl describe`, reading pod events, and the rolling-update concept
This is an intermediate-level AI/ML lab.
What you'll learn
Shipping a new model version is a more dangerous action than most engineers realize. Without the right deployment strategy, a single bad release can drop all your inference traffic for the duration of the rollout, route requests to pods that are still loading their model, or — worst — leave the cluster in a half-rolled-out state with no clear recovery path. This lab teaches the three Kubernetes-native answers: rolling update (the default — replaces pods one at a time with overlap controlled by maxSurge/maxUnavailable), rollback (reverting a bad release with one command via kubectl rollout undo), and blue-green (running v1 and v2 side by side, swapping the Service's selector for an atomic cut).
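To make the maxSurge/maxUnavailable overlap concrete, the sketch below computes the pod-count envelope a rolling update stays within. It assumes the documented Kubernetes behavior that both fields default to 25%, that a percentage maxSurge rounds up, and that a percentage maxUnavailable rounds down; the function names `resolve` and `rolling_bounds` are hypothetical helpers for illustration, not part of any Kubernetes client.

```python
def resolve(value, replicas, round_up):
    """Resolve an absolute count or a percentage string such as '25%'."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = replicas * int(value[:-1])
        # Integer arithmetic avoids float rounding surprises.
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rolling_bounds(replicas, max_surge="25%", max_unavailable="25%"):
    """Return the most pods that may exist and the fewest that stay available."""
    surge = resolve(max_surge, replicas, round_up=True)              # rounds up
    unavailable = resolve(max_unavailable, replicas, round_up=False)  # rounds down
    return {
        "max_total_pods": replicas + surge,
        "min_available_pods": replicas - unavailable,
    }


print(rolling_bounds(3))   # the 3-replica Deployment used in this lab
print(rolling_bounds(20))  # a larger fleet
```

With 3 replicas and the defaults, the update may add one extra pod but may not take any old pod away before its replacement is Ready, which is why the rollout proceeds pod-by-pod.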
Each strategy has the same building blocks — Deployments, ReplicaSets, Services, readiness probes — but combines them in different orders. By the end you'll have updated an inference Deployment from v1 to v2, watched a bad v3 rollout get stuck and reverted with one command, and run a v1/v2 blue-green cut by editing a Service's selector. You finish with a Triage Day where each strategy is broken in a strategy-specific way, and you fix each by reading the cluster's state.
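As a rough illustration of how those building blocks fit together, the sketch below generates a stand-in inference Deployment with an exec readiness probe, plus the Service patch used for a blue-green cut. Several details are assumptions made for the example rather than the lab's actual manifests: the `busybox:1.36` image, the `app: inference` label, the `color` label key, the 10-second simulated load, and a Service named `inference`. The manifest is emitted as JSON, which `kubectl apply -f` accepts alongside YAML.

```python
import json


def inference_deployment(name, version, color, replicas=2):
    """Build a stand-in inference Deployment as a plain dict."""
    labels = {"app": "inference", "color": color}  # label keys are assumptions
    container = {
        "name": "inference",
        "image": "busybox:1.36",  # placeholder image, not a real model server
        # Simulate a slow model load, then signal readiness via /tmp/ready.
        "command": ["sh", "-c", "sleep 10; touch /tmp/ready; sleep 3600"],
        "env": [{"name": "VERSION", "value": version}],
        # The pod only becomes Ready once /tmp/ready exists.
        "readinessProbe": {
            "exec": {"command": ["cat", "/tmp/ready"]},
            "periodSeconds": 2,
        },
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


def selector_patch(color):
    """Build the patch that points the Service at one color."""
    return {"spec": {"selector": {"color": color}}}


green = inference_deployment("inference-green", "v2-green", "green")
print(json.dumps(green, indent=2))
print("kubectl patch service inference -p '%s'" % json.dumps(selector_patch("green")))
```

If the startup command never touched /tmp/ready, the probe would never pass and the pod would stay out of the Service's endpoints, which is the failure shape the bad-rollout step explores.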
Frequently asked questions
What's the difference between rolling update, blue-green, and canary?
Rolling update works within a single Deployment, replacing pods incrementally with the overlap controlled by maxSurge (how many extra pods may exist) and maxUnavailable (how many old pods may be torn down before replacements are Ready). Blue-green uses TWO Deployments running in parallel; a Service points at one of them via label selector, and you cut over by editing the Service's selector. The cut is atomic, but it requires double the capacity during cutover. Canary also uses TWO Deployments behind one Service, but the traffic split comes from the replica counts (e.g., 9 v1 pods + 1 v2 pod = 10% canary). The right choice depends on your tolerance for partial-failure exposure during rollout (rolling = always exposing some fraction, canary = explicit small fraction, blue-green = atomic but expensive).
Why does my Deployment hand traffic to pods that aren't ready?
Almost always because the pod template has no readinessProbe. Without one, the kubelet considers a pod "Ready" as soon as the container starts, not when the application inside is actually serving. For an inference server that takes 30 seconds to load model weights, that 30-second window is when Kubernetes is happily routing requests to a pod that's still doing from_pretrained() and returning 500s. The fix: every Deployment template needs a readinessProbe that genuinely tests the application, such as httpGet on a /health or /v1/models/ endpoint, an exec checking that a model file exists, or a tcpSocket test against the inference port (the weakest of the three but still better than nothing). The Service routes only to pods with Ready=True; with a real probe, that's an actual application-level check.
When should I use blue-green over rolling update?
Use blue-green when you need an atomic cutover and can afford the capacity: it needs 2N pods (the full v1 + the full v2), whereas rolling update only needs N + maxSurge pods. For an N=20 inference fleet on expensive GPUs, blue-green doubles your GPU bill for the rollout window, which is significant. Most production teams reach for rolling update by default and only escalate to blue-green when explicitly required.
How do I roll back a bad deployment?
`kubectl rollout undo deployment/<name>` reverts to the previous revision. The Deployment controller keeps the last revisionHistoryLimit ReplicaSets (default 10); they're the "saved games" you can roll back to. `kubectl rollout history deployment/<name>` lists them; `kubectl rollout undo deployment/<name> --to-revision=N` reverts to a specific one. The rollback is itself a rolling update: Kubernetes scales the previous ReplicaSet back up while scaling the current one down. This is why revisionHistoryLimit matters: with 0, you have no rollback target; with the default 10, you can rewind through the last ten releases. For production inference, the right pattern is: rollout → smoke test (5-30 seconds of traffic) → if metrics look bad, `kubectl rollout undo` immediately. The decision-to-recovery loop is under a minute when you've practiced it.
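The rollout → smoke test → undo loop can be sketched as a small decision helper. This is a minimal illustration under stated assumptions: the 5% error-rate threshold, the choice to treat "no requests served" as a failure, and the helper names `should_roll_back` and `rollout_undo_cmd` are all hypothetical. The script only prints the `kubectl` command it would run, so it is safe to try without a cluster.

```python
import shlex


def should_roll_back(errors, requests, max_error_rate=0.05):
    """Decide whether smoke-test results justify an immediate rollback."""
    if requests == 0:
        # Assumption: a release that served nothing during the smoke test is suspect.
        return True
    return errors / requests > max_error_rate


def rollout_undo_cmd(deployment, to_revision=None):
    """Build the kubectl argv for reverting a Deployment."""
    cmd = ["kubectl", "rollout", "undo", f"deployment/{deployment}"]
    if to_revision is not None:
        cmd.append(f"--to-revision={to_revision}")
    return cmd


# Hypothetical smoke-test numbers: 12 failed requests out of 100.
if should_roll_back(errors=12, requests=100):
    print(shlex.join(rollout_undo_cmd("inference")))
```

Passing `to_revision` mirrors the `--to-revision=N` form described above; in a real pipeline the printed command would be handed to `subprocess.run` once the threshold has been tuned against actual traffic.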