Workload Controllers — Deployment, StatefulSet, DaemonSet for AI
Three controller types, three workload shapes, three different production failure modes. Learn when to use a Deployment for inference, a StatefulSet for distributed training, and a DaemonSet for per-node GPU infrastructure — and how to spot when someone picked the wrong one.
What you'll learn
- 1. The controller pattern: why bare pods aren't production
  A bare Pod has no controller behind it. The kubelet runs it, and with the default restartPolicy: Always it restarts a crashed container in place on the same node; with restartPolicy: Never the pod simply ends in Succeeded (exit 0) or Failed (non-zero). Either way, the *pod* doesn't come back if it gets evicted, deleted, or its node disappears. Nothing is reconciling "I want one of these running", so the bare Pod is a one-shot.
- 2. Deployment: the inference shape
  A Deployment is the right controller for stateless, horizontally scalable workloads. The canonical AI use case is inference: each replica is interchangeable, the Service load-balances requests across them, and a request can land on any pod. There's no notion of "request must go to replica 0", which would defeat the point of horizontal scaling.
- 3. StatefulSet: the distributed training shape
  A StatefulSet is a controller for workloads where each replica has an *identity*: a stable name, a stable network address, and (typically) its own persistent volume. The canonical AI use case is distributed training: PyTorch DDP, NCCL all-reduce, Horovod, parameter-server architectures. Each rank needs to find its peers by hostname; rank 0 is the coordinator and starts first; per-rank checkpoints don't share storage.
- 4. DaemonSet: the per-node infrastructure shape
  A DaemonSet runs exactly one pod per matching node. There's no replicas field; the count comes from the cluster's nodes that satisfy the selector. The classic AI use cases are all node-scoped infrastructure.
- 5. Triage Day: three controllers, three controller-specific failures
  The lab platform has just deployed three workload controllers, one of each kind, that are *all* broken in different ways. Each is broken at a layer specific to its controller type.
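As a minimal sketch of the Deployment shape from item 2, the manifest below pairs three interchangeable replicas with a Service that load-balances across them. The name inference, the image, the ports, and the single-GPU limit are illustrative assumptions, not values taken from the lab.

```yaml
# Hypothetical stateless inference service; all names and values are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inference
spec:
  replicas: 3                    # interchangeable replicas; scale by changing this number
  selector:
    matchLabels:
      app: inference
  template:
    metadata:
      labels:
        app: inference           # must match spec.selector.matchLabels
    spec:
      containers:
        - name: server
          image: registry.example.com/inference-server:1.0   # placeholder image
          ports:
            - containerPort: 8000
          resources:
            limits:
              nvidia.com/gpu: 1  # assumes the NVIDIA device plugin advertises this resource
---
apiVersion: v1
kind: Service
metadata:
  name: inference
spec:
  selector:
    app: inference               # a request can land on any Ready replica
  ports:
    - port: 80
      targetPort: 8000
```

If a replica is deleted or its node disappears, the Deployment's ReplicaSet creates a replacement, which is the reconciliation a bare Pod lacks.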
Prerequisites
- Completed `nca-aiio-resource-requests-limits` and `nca-aiio-priority-preemption` (or comfortable with pod requests, QoS, PriorityClass)
- Familiar with `kubectl get`, `describe`, and reading owner-references / ReplicaSet relationships
Skills & technologies you'll practice
This intermediate-level AI/ML lab gives you real-world reps across:
What you'll learn
Three of Kubernetes' built-in workload controllers account for roughly 95% of production AI workloads: Deployment for stateless inference, StatefulSet for distributed training pods that need stable identity and ordered start, and DaemonSet for per-node infrastructure (DCGM exporters, GPU monitors, node-local model caches). Each one is a different shape with a different reconciliation contract, and the workload's shape decides which controller fits, not the engineer's preference. Picking the wrong one is one of the most common production mistakes; symptoms range from "my distributed trainer can't find its peers" to "my DaemonSet is missing nodes."
This lab teaches the three controllers from the angle that matters most for an NCA-AIIO platform engineer: which AI workload shape does each one fit? You'll author a Deployment for a stateless inference service, a StatefulSet for a 3-replica distributed-training pool with stable hostnames and per-pod PVCs, and a DaemonSet for a GPU-node monitoring agent. You finish with a Triage Day where each of the three controllers is broken in a controller-specific way you'd actually see in production, and you fix each by reading the cluster's response.
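The 3-replica training pool described above might look roughly like the sketch below. The names trainer, trainer-svc, and data follow the examples used elsewhere on this page; the image, the port, the MASTER_ADDR variable, the mount path, and the storage size are assumptions for illustration only.

```yaml
# Headless Service: gives each StatefulSet pod its own DNS record.
apiVersion: v1
kind: Service
metadata:
  name: trainer-svc
spec:
  clusterIP: None                # headless, so DNS resolves per-pod hostnames
  selector:
    app: trainer
  ports:
    - name: rendezvous
      port: 29500                # assumed rendezvous port
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: trainer
spec:
  serviceName: trainer-svc       # ties pod hostnames to the headless Service
  replicas: 3                    # pods are named trainer-0, trainer-1, trainer-2
  selector:
    matchLabels:
      app: trainer
  template:
    metadata:
      labels:
        app: trainer
    spec:
      containers:
        - name: worker
          image: registry.example.com/trainer:1.0   # placeholder image
          env:
            - name: MASTER_ADDR  # assumed variable; points every rank at rank 0
              value: trainer-0.trainer-svc.default.svc.cluster.local
          volumeMounts:
            - name: data
              mountPath: /checkpoints               # assumed path
  volumeClaimTemplates:
    - metadata:
        name: data               # yields PVCs data-trainer-0, data-trainer-1, data-trainer-2
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi        # assumed size
```

With the default pod management policy, trainer-0 must be Running and Ready before trainer-1 is created, which is what lets the other ranks assume the coordinator is reachable.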
Frequently asked questions
Why use a StatefulSet instead of a Deployment for distributed training?
A StatefulSet gives you three guarantees. (1) Stable network identity: each pod gets a predictable name and DNS record such as trainer-0.trainer-svc.default.svc.cluster.local (paired with a headless Service). Distributed training frameworks (PyTorch DDP, NCCL all-reduce, Horovod) need every worker to discover every other worker by hostname; random Deployment pod names make that brittle. (2) Ordered creation and termination: trainer-0 becomes Ready before trainer-1 starts. The rank-0 worker is conventionally the coordinator; ordered start ensures it's up before the others try to reach it. (3) Per-pod PVCs via volumeClaimTemplates: each pod gets its own PVC automatically (e.g., data-trainer-0, data-trainer-1), so per-rank checkpoints don't collide. None of these guarantees come with a Deployment.
When should I NOT use a StatefulSet?
What workloads should be DaemonSets?
Node-scoped infrastructure that has to run once on every matching node: nvidia-dcgm-exporter (GPU metrics, one per GPU node), nvidia-device-plugin (advertises nvidia.com/gpu capacity, one per GPU node), gpu-feature-discovery (writes node labels, one per GPU node), node-local model caches that pre-pull weights, and log/trace collectors. DaemonSets target nodes via nodeSelector or nodeAffinity and automatically schedule a new pod when matching nodes join the cluster. In current Kubernetes versions the DaemonSet controller creates one pod per eligible node and pins it there with a node-affinity term; the default kube-scheduler still performs the binding, so taints and resource requests are evaluated just as they are for any other pod.
Why is my DaemonSet missing pods on some nodes?
There are three common causes. (1) The nodeSelector doesn't match: your spec targets nvidia.com/gpu.product=NVIDIA-A100 but those nodes have Tesla-K80. Run kubectl get nodes --show-labels | grep nvidia to see what's actually labeled. (2) Node taints with no matching toleration: control-plane nodes typically have node-role.kubernetes.io/control-plane:NoSchedule, and your DaemonSet needs a toleration to run there. The nvidia.com/gpu=present:NoSchedule taint is similarly common in production GPU pools. (3) Pod resource requests exceed the node's allocatable: DaemonSet pods compete for resources the same way as any other pod, so if your pod requests 8Gi memory and the node has 7Gi allocatable, the pod stays Pending. Diagnose with kubectl describe ds <name> (the status shows desired, current, and ready pod counts) and kubectl describe pod <ds-pod> (events explain the per-node failure).
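A DaemonSet that sidesteps all three causes could be sketched as follows. The name gpu-monitor, the image, and the request sizes are hypothetical; the label and taint are the ones quoted in the answer above and would need to match whatever your nodes actually carry.

```yaml
# Hypothetical GPU-node monitoring agent; adjust label, taint, and requests to your cluster.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: gpu-monitor
spec:
  selector:
    matchLabels:
      app: gpu-monitor
  template:
    metadata:
      labels:
        app: gpu-monitor
    spec:
      nodeSelector:
        # Cause 1: must equal the label value reported by
        # `kubectl get nodes --show-labels`, character for character.
        nvidia.com/gpu.product: NVIDIA-A100
      tolerations:
        # Cause 2: tolerate the GPU-pool taint so the pod may land on tainted nodes.
        - key: nvidia.com/gpu
          operator: Equal
          value: present
          effect: NoSchedule
      containers:
        - name: agent
          image: registry.example.com/gpu-monitor:1.0   # placeholder image
          resources:
            requests:
              # Cause 3: keep requests well under each node's allocatable.
              cpu: 100m
              memory: 128Mi
```

Note there is no replicas field: the desired count reported by kubectl describe ds is simply the number of nodes that pass the selector and taint checks.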