Rolling Updates, Rollback & Blue-Green for AI Inference

Ship a new model version without dropping requests. Master Kubernetes' three deployment strategies — rolling update with readiness probes, rollback after a bad release, and blue-green via Service-selector swap — all on a stand-in inference Deployment.

35 min · 5 steps · 2 domains · Intermediate · nca-aiio · ncp-aii · ncp-aio

What you'll learn

  1. Three deployment strategies and the readiness-probe contract
     You already know how to author a Deployment (last lab). Now you're going to *update* one, replacing v1 with v2, without dropping inference traffic. Kubernetes gives you three native strategies for this. Picking the right one depends on what your workload tolerates during the cutover window.
  2. Rolling update: v1 → v2 with a readiness probe
     You're going to author an inference Deployment with 3 replicas and a readiness probe that simulates "model is loading", then update its VERSION from v1 to v2 and watch the rolling update happen pod-by-pod.
  3. Bad rollout: abort and rollback with `kubectl rollout undo`
     You're about to ship inference v3. It has a regression: the container's startup script forgets to touch /tmp/ready, so its readiness probe will never pass. In production, that would correspond to a model load that hangs forever, or a from_pretrained() that throws and never recovers.
  4. Blue-green: atomic cutover via Service selector swap
     Two inference Deployments are already running side-by-side: inference-blue (v1) and inference-green (v2). Both have 2 replicas and identical pod-template shapes; the only difference is the color label and the VERSION env var (v1-blue vs v2-green).
  5. Triage Day: three rollout-related failures
     The lab platform has just deployed three broken rollout scenarios, one per strategy you covered in earlier steps. Each is a real production failure shape:

Prerequisites

  • Completed `nca-aiio-workload-controllers` (or comfortable with Deployment + Service)
  • Familiar with `kubectl describe`, reading pod events, and the rolling-update concept

Exam domains covered

  • AI Infrastructure & Operations
  • Workload Management

Skills & technologies you'll practice

This intermediate-level AI/ML lab gives you real-world reps across:

Rolling Update · Blue-Green · Canary · Rollback · Readiness Probe · Deployment Strategies · NCA-AIIO · Kubernetes · Triage

What you'll learn

Shipping a new model version is riskier than most engineers realize. Without the right deployment strategy, a single bad release can drop all your inference traffic for the duration of the rollout, route requests to pods that are still loading their model, or, worst of all, leave the cluster in a half-rolled-out state with no clear recovery path. This lab teaches the three Kubernetes-native answers: rolling update (the default, which replaces pods incrementally with overlap controlled by maxSurge/maxUnavailable), rollback (reverting a bad release with one command via kubectl rollout undo), and blue-green (running v1 and v2 side by side, then swapping the Service's selector for an atomic cut).

Each strategy has the same building blocks — Deployments, ReplicaSets, Services, readiness probes — but combines them in different orders. By the end you'll have updated an inference Deployment from v1 to v2, watched a bad v3 rollout get stuck and reverted with one command, and run a v1/v2 blue-green cut by editing a Service's selector. You finish with a Triage Day where each strategy is broken in a strategy-specific way, and you fix each by reading the cluster's state.

Frequently asked questions

What's the difference between rolling update, blue-green, and canary?

All three are ways to replace v1 pods with v2 pods. Rolling update does it gradually within ONE Deployment: Kubernetes brings up new pods and tears down old ones in overlapping steps, bounded by maxSurge (how many pods may exist above the desired replica count) and maxUnavailable (how many pods may be unavailable at once, counted against the desired replica count). Blue-green uses TWO Deployments running in parallel; a Service points at one of them via label selector, and you cut over by editing the Service's selector. The cut is atomic, but it requires double the capacity during cutover. Canary also uses TWO Deployments behind one Service, but the traffic split comes from the replica counts (e.g., 9 v1 pods + 1 v2 pod ≈ 10% canary). The right choice depends on your tolerance for partial-failure exposure during rollout (rolling = always exposing some fraction, canary = explicit small fraction, blue-green = atomic but expensive).
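
The arithmetic behind those knobs can be sketched in a few lines of Python. This is an illustrative calculation, not part of the lab: the function names are hypothetical, the 25% defaults are the Deployment defaults, and percentages are resolved the way Kubernetes documents it (maxSurge rounds up, maxUnavailable rounds down). The canary share assumes the Service spreads requests roughly evenly across Ready pods.

```python
def _resolve(value, replicas, round_up):
    """Turn an absolute count or a percentage string like '25%' into a pod count."""
    if isinstance(value, str) and value.endswith("%"):
        pct = int(value[:-1])
        if round_up:
            return -(-replicas * pct // 100)  # integer ceiling
        return replicas * pct // 100          # integer floor
    return int(value)


def rolling_bounds(replicas, max_surge="25%", max_unavailable="25%"):
    """Pod-count envelope a rolling update stays inside (hypothetical helper)."""
    surge = _resolve(max_surge, replicas, round_up=True)
    unavailable = _resolve(max_unavailable, replicas, round_up=False)
    return {
        "max_total_pods": replicas + surge,
        "min_available_pods": replicas - unavailable,
    }


def canary_share(stable_replicas, canary_replicas):
    """Approximate fraction of traffic reaching the canary pods."""
    return canary_replicas / (stable_replicas + canary_replicas)


print(rolling_bounds(3))    # {'max_total_pods': 4, 'min_available_pods': 3}
print(rolling_bounds(20))   # {'max_total_pods': 25, 'min_available_pods': 15}
print(canary_share(9, 1))   # 0.1
```

With 3 replicas and the defaults, the update can never dip below 3 available pods, which is why a release whose pods never become Ready stalls instead of taking the service down.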

Why does my Deployment hand traffic to pods that aren't ready?

Because you didn't set a readinessProbe. Without one, the kubelet considers a pod "Ready" as soon as the container starts — not when the application inside is actually serving. For an inference server that takes 30 seconds to load model weights, that 30-second window is when Kubernetes is happily routing requests to a pod that's still doing from_pretrained() and returning 500s. The fix: every Deployment template needs a readinessProbe that genuinely tests the application — httpGet on a /health or /v1/models/ endpoint, an exec checking that a model file exists, or a tcpSocket test against the inference port (the weakest of the three but still better than nothing). The Service routes only to pods with Ready=True; with a real probe, that's an actual application-level check.
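
As a rough illustration, here is a minimal sketch that builds a stand-in inference Deployment with an exec readiness probe and prints it as JSON, which kubectl accepts in place of YAML. The /tmp/ready file, the VERSION variable, and the 3 replicas come from the lab description; the function name, the busybox image, the 10-second sleep, the label key, and the probe timings are assumptions chosen for the example.

```python
import json


def inference_deployment(name="inference", version="v1", replicas=3):
    """Build a Deployment manifest as a plain dict (hypothetical helper)."""
    labels = {"app": name}  # assumed label key
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "server",
                        "image": "busybox:1.36",  # assumed stand-in image
                        # Simulate a slow model load, then signal readiness.
                        "command": ["sh", "-c",
                                    "sleep 10; touch /tmp/ready; sleep 3600"],
                        "env": [{"name": "VERSION", "value": version}],
                        # The pod is Ready only once /tmp/ready exists.
                        "readinessProbe": {
                            "exec": {"command": ["cat", "/tmp/ready"]},
                            "periodSeconds": 2,
                            "failureThreshold": 3,
                        },
                    }],
                },
            },
        },
    }


print(json.dumps(inference_deployment(), indent=2))
```

Piping the output to `kubectl apply -f -` would create the Deployment; for a real inference server, an httpGet probe against the health endpoint would replace the exec check.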

When should I use blue-green over rolling update?

Use blue-green when you need an atomic cut with no overlap window where v1 and v2 are simultaneously live. Common use cases: schema-incompatible model changes (a new tokenizer that produces different inputs to the model), database migrations in lockstep with the deployment, and regulated environments where mixed-version operation is forbidden. The trade-off is capacity: during the cutover you're running 2N pods (the full v1 + the full v2), whereas rolling update only needs N + maxSurge pods. For an N=20 inference fleet on expensive GPUs, blue-green doubles your GPU bill for the rollout window, which is significant. Most production teams reach for rolling update by default and only escalate to blue-green when explicitly required.
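
To make the capacity trade-off and the cutover concrete, here is a small sketch under stated assumptions. The color label is the one the lab's blue and green Deployments differ on; the Service name `inference`, the `app` label, and both function names are hypothetical, and maxSurge is passed as an absolute pod count (5 corresponds to the default 25% of 20 replicas).

```python
import json


def peak_pods(replicas, max_surge):
    """Peak pod count during a release for each strategy (hypothetical helper)."""
    return {
        "rolling_update": replicas + max_surge,
        "blue_green": 2 * replicas,
    }


def selector_patch(color, app="inference"):
    """Patch body that repoints a Service at one color (assumed label keys)."""
    return {"spec": {"selector": {"app": app, "color": color}}}


print(peak_pods(20, 5))  # {'rolling_update': 25, 'blue_green': 40}

# Print (not run) the command that would perform the blue-to-green cut.
patch = json.dumps(selector_patch("green"))
print(f"kubectl patch service inference -p '{patch}'")
```

Cutting back to blue is the same patch with the other color, which is what makes blue-green rollback nearly instantaneous as long as the blue Deployment is still running.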

How do I roll back a bad deployment?

kubectl rollout undo deployment/<name> reverts to the previous revision. The Deployment controller keeps the last revisionHistoryLimit old ReplicaSets (default 10); they're the "saved games" you can roll back to. kubectl rollout history deployment/<name> lists them; kubectl rollout undo deployment/<name> --to-revision=N reverts to a specific one. The rollback is itself a rolling update: Kubernetes scales the previous ReplicaSet back up while scaling the current one down. This is why revisionHistoryLimit matters: with 0, you have no rollback target; with the default 10, you can rewind through the last ten releases. For production inference, the right pattern is: rollout → smoke test (5-30 seconds of traffic) → if metrics look bad, kubectl rollout undo immediately. The decision-to-recovery loop is under a minute when you've practiced it.
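
One way to automate that loop is to wait on the rollout with a deadline and undo it if the deadline passes. The sketch below is an assumption-laden illustration rather than the lab's tooling: the function name, the 60-second timeout, and the injectable `run` parameter (there so the logic can be exercised without a cluster) are all hypothetical. It relies on kubectl rollout status exiting non-zero when --timeout expires, and it stands in for the smoke test with the readiness gate alone.

```python
import subprocess


def rollout_or_undo(deployment, timeout="60s", run=subprocess.run):
    """Wait for a rollout to finish; roll back if it does not (hypothetical helper)."""
    target = f"deployment/{deployment}"
    status_cmd = ["kubectl", "rollout", "status", target, f"--timeout={timeout}"]

    # Non-zero exit means the new pods did not become Ready in time.
    if run(status_cmd).returncode == 0:
        return "rolled-out"

    # Revert to the previous revision, then wait for the rollback to settle.
    run(["kubectl", "rollout", "undo", target], check=True)
    run(status_cmd, check=True)
    return "rolled-back"
```

Calling `rollout_or_undo("inference")` right after applying a new version would return "rolled-out" on success or "rolled-back" after reverting; a production version would also check request metrics before declaring success.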