Question 1

Why use a StatefulSet instead of a Deployment for distributed training?

Accepted Answer

Three reasons unique to StatefulSet. (1) **Stable network identity**: paired with a *headless* Service (`clusterIP: None`), each pod gets a predictable DNS name such as `trainer-0.trainer-svc.default.svc.cluster.local`. Distributed training frameworks (PyTorch DDP, NCCL all-reduce, Horovod) need workers to discover each other, or at least the rank-0 rendezvous host, by hostname. Deployment pod names carry a random suffix that changes on every restart, which makes that brittle. (2) **Ordered creation and termination**: under the default `OrderedReady` pod management policy, `trainer-1` isn't created until `trainer-0` is Running and Ready, and scale-down removes pods in reverse ordinal order. The rank-0 worker is conventionally the coordinator, so ordered start ensures it's up before the others try to reach it. (3) **Per-pod PVCs via `volumeClaimTemplates`**: each pod automatically gets its own PVC, named `<template>-<pod>` (e.g., `data-trainer-0`, `data-trainer-1`), and a rescheduled pod reattaches to the same claim, so per-rank checkpoints neither collide nor get lost. A Deployment provides none of these guarantees.
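A minimal sketch of all three pieces together, assuming the `default` namespace and a PyTorch-style launcher that reads `MASTER_ADDR`/`MASTER_PORT` from the environment. The image, replica count, and storage size are hypothetical placeholders; port 29500 is the default PyTorch's launchers use, but treat it as an assumption for your framework.

```yaml
# Headless Service: clusterIP: None gives each StatefulSet pod its own DNS record,
# e.g. trainer-0.trainer-svc.default.svc.cluster.local
apiVersion: v1
kind: Service
metadata:
  name: trainer-svc
spec:
  clusterIP: None
  selector:
    app: trainer
  ports:
    - name: rendezvous
      port: 29500                      # assumption: PyTorch's default rendezvous port
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: trainer
spec:
  serviceName: trainer-svc             # must name the headless Service above
  replicas: 4                          # hypothetical world size
  podManagementPolicy: OrderedReady    # the default: trainer-0 Ready before trainer-1 is created
  selector:
    matchLabels:
      app: trainer
  template:
    metadata:
      labels:
        app: trainer
    spec:
      containers:
        - name: trainer
          image: example.com/my-trainer:latest   # hypothetical image
          env:
            # Every rank points at rank 0 through its stable DNS name.
            - name: MASTER_ADDR
              value: trainer-0.trainer-svc.default.svc.cluster.local
            - name: MASTER_PORT
              value: "29500"
          ports:
            - containerPort: 29500
          volumeMounts:
            - name: data
              mountPath: /checkpoints
  volumeClaimTemplates:
    - metadata:
        name: data                     # yields PVCs data-trainer-0, data-trainer-1, ...
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 100Gi             # hypothetical size
```

No `readinessProbe` is set, so each pod counts as Ready as soon as its container starts. That matters under `OrderedReady`: a probe on `trainer-0` that only passed once every rank had joined would deadlock, because `trainer-1` is never created until `trainer-0` is Ready. Each worker's rank can be parsed from the ordinal suffix of its hostname (`trainer-2` is rank 2), though how ranks are actually assigned depends on your launcher.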

Question 2

When should I NOT use a StatefulSet?

Accepted Answer

For stateless replicas (model inference servers, API gateways, function workers), Deployment is correct. With the default `OrderedReady` policy, StatefulSet scale-up is *slower* (each pod must be Ready before the next is created), and pod identity is overhead you don't need. The rule of thumb: do your replicas have any requirement beyond "be one of N identical workers"? If yes (stable hostname, ordered start, per-pod storage), use a StatefulSet. If no, use a Deployment. The most common "wrong choice" is using a StatefulSet for inference because someone wanted "stable pod names", which they almost certainly don't need: clients reach inference replicas through a Service, not by pod name.
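For contrast, a sketch of the Deployment-plus-Service shape that suits stateless inference. Clients address the Service name, which load-balances across whichever replicas are Ready, so individual pod names never matter, and the ReplicaSet creates new replicas without waiting for earlier ones to become Ready. The image, container port, and `/healthz` path are hypothetical.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inference
spec:
  replicas: 3                          # interchangeable replicas; names are random and irrelevant
  selector:
    matchLabels:
      app: inference
  template:
    metadata:
      labels:
        app: inference
    spec:
      containers:
        - name: server
          image: example.com/my-model-server:latest   # hypothetical image
          ports:
            - containerPort: 8080                     # hypothetical port
          readinessProbe:
            # Only gates Service traffic; it does not delay creation of other replicas.
            httpGet:
              path: /healthz                          # hypothetical endpoint
              port: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: inference-svc                  # clients call this name, never a pod name
spec:
  selector:
    app: inference
  ports:
    - port: 80
      targetPort: 8080
```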

Question 3

What workloads should be DaemonSets?

Accepted Answer

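Anything that should run **exactly one pod per matching node** for the lifetime of that node. The classic AI examples: `nvidia-dcgm-exporter` (GPU metrics, one per GPU node), `nvidia-device-plugin` (advertises `nvidia.com/gpu` capacity, one per GPU node), `gpu-feature-discovery` (writes node labels, one per GPU node), node-local model caches that pre-pull weights, and log/trace collectors. DaemonSets target nodes via `nodeSelector` or `nodeAffinity` in the pod template and *automatically create a new pod* when a matching node joins the cluster. Scheduling itself is not a special path: the DaemonSet controller pins each pod to its target node by adding a `nodeAffinity` term that matches that node's name, and the default `kube-scheduler` then binds it like any other pod. What does differ is that the controller adds tolerations automatically, for example for `node.kubernetes.io/unschedulable`, so DaemonSet pods still land on cordoned nodes, and for `node.kubernetes.io/not-ready` and `node.kubernetes.io/unreachable` (`NoExecute`), so they aren't evicted when a node goes unhealthy.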
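A sketch of the node-local model cache case, assuming your GPU nodes carry the `nvidia.com/gpu.present=true` label (the NVIDIA GPU Operator typically applies it; check your own nodes, since labels vary by setup). The DaemonSet name, image, and host path are hypothetical. Because the DaemonSet selects nodes by label, any new node that joins with that label gets a pod automatically, and removing the label from a node removes its pod.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: model-cache                    # hypothetical node-local weight cache
spec:
  selector:
    matchLabels:
      app: model-cache
  template:
    metadata:
      labels:
        app: model-cache
    spec:
      # Only nodes with this label get a pod (assumption: label exists in your cluster).
      nodeSelector:
        nvidia.com/gpu.present: "true"
      tolerations:
        # Only needed if your GPU pool is tainted nvidia.com/gpu=present:NoSchedule.
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
      containers:
        - name: prefetch
          image: example.com/weight-prefetcher:latest   # hypothetical image
          volumeMounts:
            - name: cache
              mountPath: /cache
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
      volumes:
        - name: cache
          hostPath:
            path: /var/lib/model-cache   # hypothetical host dir shared with inference pods
            type: DirectoryOrCreate
```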

Question 4

Why is my DaemonSet missing pods on some nodes?

Accepted Answer

Three common reasons. (1) **`nodeSelector` doesn't match**: label values must match exactly, so a spec that targets `nvidia.com/gpu.product=NVIDIA-A100` skips nodes labeled `Tesla-K80`. Run `kubectl get nodes --show-labels | grep nvidia` to see what's actually labeled. (2) **Node taints with no matching toleration**: control-plane nodes typically carry `node-role.kubernetes.io/control-plane:NoSchedule`, so your DaemonSet needs a matching toleration to run there. The `nvidia.com/gpu=present:NoSchedule` taint is similarly common in production GPU pools. (3) **Pod resource requests don't fit**: DaemonSet pods compete for resources like any other pod, so if your pod requests `8Gi` of memory and the node has only `7Gi` of allocatable left after existing requests, the pod stays Pending. To diagnose, run `kubectl describe ds <daemonset-name>`; its status compares the desired, current, and ready counts (`desiredNumberScheduled`, `currentNumberScheduled`, `numberReady`). In cases (1) and (2) the controller never creates a pod on the excluded node, so that node simply isn't counted as desired. In case (3) a pod does exist, and `kubectl describe pod <pod-name>` shows a `FailedScheduling` event explaining the per-node failure.
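A fragment of a DaemonSet pod template (`spec.template.spec`, not a complete manifest) showing one fix per failure mode above; keep only the pieces that apply to your cluster. The label value and image are hypothetical, and the resource figures are illustrative.

```yaml
# Fragment of a DaemonSet's spec.template.spec
nodeSelector:
  # (1) Copy the value exactly as shown by `kubectl get nodes --show-labels`.
  nvidia.com/gpu.product: NVIDIA-A100      # hypothetical value
tolerations:
  # (2) Allow scheduling onto control-plane nodes.
  - key: node-role.kubernetes.io/control-plane
    operator: Exists
    effect: NoSchedule
  # (2) Allow scheduling onto GPU pools tainted nvidia.com/gpu=present:NoSchedule.
  - key: nvidia.com/gpu
    operator: Equal
    value: present
    effect: NoSchedule
containers:
  - name: agent
    image: example.com/gpu-agent:latest    # hypothetical image
    resources:
      # (3) Keep requests small: they must fit in what is left of each node's allocatable.
      requests:
        cpu: 50m
        memory: 128Mi
```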

Workload Controllers — Deployment, StatefulSet, DaemonSet for AI