GPU Container Lifecycle: Build, Test, Ship, Rollback
Walk through the full lifecycle of a production GPU container — multi-stage Dockerfile, self-hosted GPU CI, a fail-fast smoke test, and a Kubernetes Deployment with readiness probes gated on real GPU compute. The pipeline that stops bad images before users see a 500.
What you'll learn
1. The production Dockerfile
2. The CI workflow
3. The GPU smoke test
4. Kubernetes rollout + rollback
Prerequisites
- Comfortable with Docker multi-stage builds and Dockerfile directives
- Basic Kubernetes (Deployments, probes, nodeSelector)
- Familiarity with CI/CD concepts (GitHub Actions or equivalent)
Skills & technologies you'll practice
This intermediate-level GPU lab gives you real-world reps across Docker multi-stage builds, GPU-aware CI/CD, smoke testing, and Kubernetes rollouts and rollbacks.
What you'll build in this GPU container lifecycle lab
Shipping a GPU container to production is where most AI teams first discover that docker build + docker push is not a deploy pipeline. This lab gives you the full pattern — multi-stage Dockerfile, self-hosted GPU CI, a GPU-aware smoke test, a Kubernetes Deployment with probes gated on real compute, and a rollout/rollback playbook — so the next time a driver mismatch, cuDNN drift, or a silent VRAM leak tries to reach users, it hits a wall instead.
You'll leave with a runnable Dockerfile on nvidia/cuda:*-runtime-ubuntu22.04, a GitHub Actions workflow targeting a runner labeled self-hosted, gpu, a three-check smoke test with distinct exit codes, a Deployment YAML with maxUnavailable: 0 / maxSurge: 1 and both probes, and a concrete mental model of why four layers of essentially the same test is the feature, not the bug. ~40 minutes on a real NVIDIA GPU pod we hand you; no local Docker, no kubectl context juggling.
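To make that concrete, here is a minimal sketch of a multi-stage Dockerfile with the shape the lab expects. The CUDA tag, package set, and file names (requirements.txt, app.py) are illustrative assumptions, not values the lab prescribes; what matters is the structure: at least two FROM stages, a non-root USER, and a GPU-aware HEALTHCHECK.

```dockerfile
# --- Build stage: install Python dependencies into a virtualenv ---
FROM nvidia/cuda:12.1.1-runtime-ubuntu22.04 AS builder
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-venv && \
    rm -rf /var/lib/apt/lists/*
RUN python3 -m venv /opt/venv
COPY requirements.txt /tmp/requirements.txt
RUN /opt/venv/bin/pip install --no-cache-dir -r /tmp/requirements.txt

# --- Runtime stage: only the venv, the app code, and a non-root user ---
FROM nvidia/cuda:12.1.1-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y --no-install-recommends python3 && \
    rm -rf /var/lib/apt/lists/* && \
    useradd --create-home appuser
COPY --from=builder /opt/venv /opt/venv
COPY smoke_test.py app.py /app/
ENV PATH="/opt/venv/bin:$PATH"
USER appuser
# GPU-aware health check: fails when the driver, toolkit, or card is broken.
# The `|| exit 1` normalizes the smoke test's distinct exit codes to Docker's
# expected 0 (healthy) / 1 (unhealthy).
HEALTHCHECK --interval=30s --timeout=15s --start-period=60s --retries=3 \
    CMD python3 /app/smoke_test.py || exit 1
CMD ["python3", "/app/app.py"]
```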
The technical backbone is defense in depth on the same question — can this container actually use a GPU right now? — asked from four places. CI runs the smoke test on a self-hosted GPU runner (GitHub-hosted runners have no GPUs, so docker run --gpus all silently no-ops and your pipeline lies to you). Dockerfile HEALTHCHECK invokes python3 /app/smoke_test.py — NVML init, torch.cuda.is_available(), tensor round-trip — not curl /healthz, because HTTP 200 proves Python is running, not that CUDA is. The readiness probe reuses the same script to gate traffic on a pod that booted but can't allocate VRAM. The liveness probe reruns it continuously to catch the slow-motion failures — Xid fatal errors, thermal throttling, memory fragmentation, cuDNN version drift, kernel hangs — that CI on a cold GPU in a short run will never see. maxUnavailable: 0 + maxSurge: 1 gives you zero-downtime rolling updates on GPU-constrained clusters without deadlocking.
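A minimal sketch of such a smoke test, assuming pynvml (nvidia-ml-py) and PyTorch are installed in the image; the specific exit-code values (2, 3, 4) are a convention chosen here for illustration, not mandated by the lab:

```python
#!/usr/bin/env python3
"""GPU smoke test: three checks, distinct exit codes, reused by CI, HEALTHCHECK, and both probes."""
import sys


def main() -> int:
    # Check 1: the NVIDIA driver is reachable via NVML
    try:
        import pynvml
        pynvml.nvmlInit()
        pynvml.nvmlShutdown()
    except Exception as exc:
        print(f"NVML init failed: {exc}", file=sys.stderr)
        return 2

    # Check 2: PyTorch can see at least one CUDA device
    # (returns non-zero when CUDA_VISIBLE_DEVICES='' hides the card)
    import torch
    if not torch.cuda.is_available():
        print("torch.cuda.is_available() returned False", file=sys.stderr)
        return 3

    # Check 3: a small tensor survives a round-trip through GPU memory
    try:
        x = torch.randn(1024, 1024, device="cuda")
        y = (x @ x).cpu()
        if not bool(torch.isfinite(y).all()):
            raise RuntimeError("non-finite values after GPU matmul")
    except Exception as exc:
        print(f"GPU tensor round-trip failed: {exc}", file=sys.stderr)
        return 4

    print("GPU smoke test passed")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```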
Prereqs: Docker multi-stage builds, basic Kubernetes (Deployments, probes, nodeSelector, resources.limits), and CI/CD concepts (GitHub Actions or equivalent). Preinstalled on the lab pod: Docker, NVIDIA Container Toolkit, kubectl, PyTorch, and CUDA. Grading checks the artifacts the way a reviewer would: the Dockerfile must have ≥2 FROM stages, a non-root USER, and a GPU-aware HEALTHCHECK; the workflow must declare build + test jobs on a GPU runner with a push stage gated on main; the smoke test must exit 0 on a healthy GPU and non-zero when CUDA_VISIBLE_DEVICES=''; the Deployment must declare both probes, an nvidia.com/gpu resource limit, and the rolling-update knobs. The reflection step asks you to instrument past the four layers — DCGM counters such as DCGM_FI_DEV_XID_ERRORS and SM_ACTIVE, plus a model-shaped forward-pass probe — which is how you graduate from 'ships clean' to 'stays clean at 3am'.
Frequently asked questions
Why target runs-on: [self-hosted, gpu] instead of a GitHub-hosted runner?
Because GitHub-hosted runners have no GPUs, a docker run --gpus all test step would either skip or silently pass. You either register a self-hosted runner on a GPU host, use a managed GPU CI provider (BuildJet, Actuated, Namespace), or rely on GitHub's large-runner GPU tier where available. The lab shows the self-hosted, gpu pattern because it's the most portable and because it teaches you to separate the 'where does my image build' question from the 'where does my image test' question — in production those often run on very different hardware.
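As a rough sketch, a workflow with that shape could look like the following. The image name, registry, and step commands are placeholders, the build/test/push jobs are assumed to share a runner (and therefore a Docker daemon), and registry login is omitted.

```yaml
name: gpu-container-ci
on:
  push:
    branches: [main]
  pull_request:

jobs:
  build:
    runs-on: [self-hosted, gpu]   # GPU host with Docker + NVIDIA Container Toolkit
    steps:
      - uses: actions/checkout@v4
      - name: Build image
        run: docker build -t registry.example.com/myapp:${{ github.sha }} .

  test:
    needs: build
    runs-on: [self-hosted, gpu]
    steps:
      - name: GPU smoke test inside the freshly built image
        run: >
          docker run --rm --gpus all
          registry.example.com/myapp:${{ github.sha }}
          python3 /app/smoke_test.py

  push:
    needs: test
    if: github.ref == 'refs/heads/main'   # push stage gated on main
    runs-on: [self-hosted, gpu]
    steps:
      - name: Push image
        run: docker push registry.example.com/myapp:${{ github.sha }}
```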
Should the HEALTHCHECK in the Dockerfile call curl http://localhost:8000/healthz?
No. An HTTP 200 proves the server process is up, not that CUDA is. The HEALTHCHECK should invoke a GPU-aware probe — python3 /app/smoke_test.py that calls torch.cuda.is_available(), initializes NVML, and allocates a small tensor — so Docker marks the container unhealthy when the driver, the toolkit, or the card itself fail. Your Kubernetes readiness probe reuses the exact same script.
Why distinguish readiness from liveness when both call the same smoke test?
Because they trigger different actions: a failing readiness probe pulls the pod out of the Service endpoints so it stops receiving traffic, while a failing liveness probe makes the kubelet restart the container. Both can run the same script, but give the liveness probe more forgiving settings (a longer period and a higher failureThreshold) so transient Xid errors or a briefly-stuck kernel don't thrash your pods into a CrashLoopBackOff.
What can go wrong at runtime that CI and HEALTHCHECK won't catch?
The slow-motion failures that only show up on a warm GPU under sustained load: Xid fatal errors, thermal throttling, memory fragmentation, cuDNN version drift, and kernel hangs. A short CI run on a cold GPU never sees these, which is exactly what the continuously rerunning liveness probe is there to catch.
Why use maxUnavailable: 0, maxSurge: 1 for the rolling update?
maxUnavailable: 0 guarantees the old pod keeps serving traffic until the new one passes readiness, so you never drop below full capacity. maxSurge: 1 says you're willing to temporarily run one extra replica during the rollout — critical because without surge, with maxUnavailable: 0, the rollout would deadlock waiting for a pod that can't start until another pod dies. The combination gives you zero-downtime deploys on constrained GPU nodes.
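A sketch of a Deployment that carries those knobs, with illustrative names, image tag, node label, and probe timings:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-inference
spec:
  replicas: 2
  selector:
    matchLabels:
      app: gpu-inference
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # never drop below full capacity
      maxSurge: 1         # one extra pod during rollout avoids the deadlock
  template:
    metadata:
      labels:
        app: gpu-inference
    spec:
      nodeSelector:
        nvidia.com/gpu.present: "true"   # adjust to your cluster's GPU node labels
      containers:
        - name: app
          image: registry.example.com/myapp:sha-abc123
          resources:
            limits:
              nvidia.com/gpu: 1
          readinessProbe:
            exec:
              command: ["python3", "/app/smoke_test.py"]
            periodSeconds: 15
            failureThreshold: 1    # pull traffic quickly on failure
          livenessProbe:
            exec:
              command: ["python3", "/app/smoke_test.py"]
            initialDelaySeconds: 60
            periodSeconds: 60
            failureThreshold: 3    # more forgiving before a restart
```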
How does grading work for the Kubernetes step?
The grader parses your deployment_yaml string and checks for the required fields: apiVersion, kind: Deployment, strategy.rollingUpdate with both maxUnavailable and maxSurge, nvidia.com/gpu in resource limits, and both readinessProbe and livenessProbe. It also inspects your rollback_commands list for kubectl rollout history and undo, and verifies your image_promotion_flow defines at least three stages (dev → staging → prod) each with env, tag_pattern, and gate keys. Nothing is applied to a live cluster — the lab grades the artifacts, not a running rollout.
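For the rollback side, the usual kubectl sequence looks roughly like this; the deployment name is illustrative:

```bash
# See which revisions exist and inspect a specific one
kubectl rollout history deployment/gpu-inference
kubectl rollout history deployment/gpu-inference --revision=3

# Roll back to the previous revision, or pin a specific one
kubectl rollout undo deployment/gpu-inference
kubectl rollout undo deployment/gpu-inference --to-revision=3

# Watch the rollback complete before calling the incident closed
kubectl rollout status deployment/gpu-inference --timeout=5m
```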