PriorityClass & Preemption — Who Survives the GPU Squeeze
Hosted
Beta

When GPU capacity is full and a critical training job lands, who wins? This lab builds the mental model behind PriorityClass and preemption, the only mechanism Kubernetes gives you for resolving GPU contention with intent rather than first-come-first-served. It also covers `preemptionPolicy: Never`, the escape hatch most teams misuse.

35 min · 5 steps · 2 domains · Intermediate · nca-aiio · ncp-aii · ncp-aio

What you'll learn

  1. The contention problem: what happens when GPUs run out
    You're sharing a 4-GPU cluster with three teams. By 2pm, all 4 GPUs are running long batch jobs. A critical inference deployment ships at 3pm and needs 1 GPU. *What happens?*
  2. Reading PriorityClasses: value, preemptionPolicy, globalDefault
    A PriorityClass is a tiny cluster-scoped object, and three fields do all the work. Understanding what each one controls is the difference between writing a correct policy and writing one that breaks the wrong way.
  3. Preemption in action: watch the scheduler evict a low-pri pod
    The cluster is already full. The lab platform pre-deployed 4 low-priority batch pods (`batch-1` through `batch-4`), each holding 1 GPU. The node's 4 GPUs are 100% allocated. Run `kubectl get pods` to see them.
  4. preemptionPolicy: Never: high priority, polite waiting
    Sometimes a workload is genuinely high-priority but you *don't* want it to evict running pods. Common cases:
  5. Triage Day: three pods broken in priority-related ways
    The cluster is full again: 4 low-priority `batch-*` pods are using all 4 GPUs. Three high-priority pods have just been deployed, but each has a different priority misconfiguration, so all three are Pending instead of Running.

Prerequisites

  • Completed `nca-aiio-resource-requests-limits` (or comfortable with QoS classes, ResourceQuota, and pod admission)
  • Familiar with `kubectl get pods -w`, `kubectl describe pod`, and reading scheduler events

Exam domains covered

  • AI Infrastructure & Operations
  • GPU Acceleration & Distributed Training

Skills & technologies you'll practice

This intermediate-level AI/ML lab gives you real-world reps across:

PriorityClass · Preemption · Scheduler · GPU Scheduling · Multi-tenant · NCA-AIIO · Kubernetes · Triage

What you'll learn

PriorityClass and preemption are how production teams resolve GPU contention at the Kubernetes scheduler level. Without them, the only outcome of a full cluster is a Pending pod queue ordered by submission time, with no notion of "this training job matters more than that throwaway notebook." This lab teaches the complete model: how Kubernetes uses an integer priority value to rank pods, how the scheduler decides which low-priority pods to evict to make room for a high-priority pod, why `preemptionPolicy: Never` is what you use when you want priority ranking without disruption, and what goes wrong when a pod references a PriorityClass that doesn't exist.

By the end you'll have authored three PriorityClasses (low, medium, high), watched the scheduler preempt a low-priority pod when GPU capacity is exhausted, and diagnosed three pods broken in priority-related ways. This is the daily work of a platform engineer running a multi-tenant GPU cluster, and it's a major topic in the NCA-AIIO exam's AI Infrastructure & Operations domain.
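
As a rough sketch of what three such classes might look like: the names (`gpu-low`, `gpu-medium`, `gpu-high`) and the numeric values below are illustrative assumptions, not the lab's actual manifests. Only the relative ordering of the values matters.

```yaml
# Hypothetical PriorityClasses; names and values are assumptions for illustration.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: gpu-low              # assumed name
value: 1000                  # lowest of the three: first in line to be preempted
globalDefault: false
description: "Throwaway notebooks and best-effort batch work."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: gpu-medium           # assumed name
value: 10000
globalDefault: false
description: "Regular training jobs."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: gpu-high             # assumed name
value: 100000                # highest of the three: may preempt the two above
globalDefault: false
# preemptionPolicy is omitted, so it defaults to PreemptLowerPriority.
description: "Critical inference and deadline-bound training."
```

Leaving wide gaps between the values makes it easier to slot in a new tier later without renumbering the existing ones.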

Frequently asked questions

What's the difference between priority and preemption?

Priority is a ranking: every pod has an integer priority value (0 by default, unless a PriorityClass is marked `globalDefault`) that the scheduler uses to order pending pods in its scheduling queue. Preemption is the action that priority enables: when the scheduler can't fit a high-priority pod on any node because lower-priority pods are using the resources, it can evict (preempt) one or more of those lower-priority pods to make room. Priority always works (it just changes queue order); preemption only fires when the scheduler determines that evicting specific lower-priority pods would let the new pod schedule.
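
A pod picks up its priority by naming a PriorityClass in its spec. The following is a minimal sketch: the pod name, image, and the `gpu-high` class are hypothetical, and the `nvidia.com/gpu` resource name assumes the NVIDIA device plugin is installed on the cluster.

```yaml
# Minimal sketch of a GPU pod that opts into a priority tier.
apiVersion: v1
kind: Pod
metadata:
  name: critical-inference          # hypothetical name
spec:
  priorityClassName: gpu-high       # assumed to exist; the class must be created first
  containers:
    - name: inference
      image: registry.example.com/inference:latest   # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1         # assumes the NVIDIA device plugin exposes this resource
```

At admission time the integer value from the named class is copied into the pod's `spec.priority` field, which is what the scheduler compares.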

What is preemptionPolicy: Never and when should I use it?

It's a per-PriorityClass setting that turns off the eviction part of preemption while keeping the ranking part. A pod with `preemptionPolicy: Never` and a high priority value is placed ahead of lower-priority pods in the scheduling queue, but it will not evict any running lower-priority pod to make room. If the cluster is full, it just waits. It can also still be preempted itself by a pod with an even higher priority. Use this for high-priority workloads that are still tolerant of waiting (long batch training that can run overnight) when disruption to lower-priority work would be worse than the wait. It is easy to misuse: teams reach for it to get disruption-free ordering, then are surprised that the pod can no longer evict anything and stays Pending on a full cluster until capacity frees up on its own.
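
A non-preempting class differs from an ordinary one by a single field. In this sketch the name and value are assumptions chosen for illustration.

```yaml
# Hypothetical high-priority class that queues ahead of others but never evicts.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: gpu-high-nonpreempting   # assumed name
value: 100000                    # assumed value; ranks the pod high in the scheduling queue
preemptionPolicy: Never          # default is PreemptLowerPriority
globalDefault: false
description: "High priority for queue ordering only; waits for free GPUs instead of evicting."
```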

How does the scheduler pick which lower-priority pod to evict?

The scheduler picks victim pods to minimize disruption while still making room. Only pods with a priority lower than the incoming pod's are candidates, and the lowest-priority ones go first. It then tries to avoid evictions that would violate a PodDisruptionBudget (PDBs are honored on a best-effort basis during preemption, so they are not a guarantee) and to evict as few pods as possible (sometimes just one is enough). Pods running with the system PriorityClasses system-cluster-critical (2000000000) and system-node-critical (2000001000) are essentially never preempted: user-defined PriorityClass values are capped at 1000000000, so those two always rank higher and protect things like control-plane components, kube-proxy, and cluster DNS.
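
For the PodDisruptionBudget consideration, a budget over the batch pods might look like the sketch below. The `app: batch` label, the budget name, and the `minAvailable` figure are assumptions; the lab's pods may be labelled differently.

```yaml
# Hypothetical PDB: asks that at least 3 of the matching batch pods stay up.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: batch-pdb            # assumed name
spec:
  minAvailable: 3            # assumed threshold
  selector:
    matchLabels:
      app: batch             # assumed label on the batch pods
```

With a budget like this in place, the scheduler prefers victims whose eviction keeps the budget intact when it has a choice.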

Why would my high-priority GPU pod stay Pending even with preemption enabled?

Three common reasons: (1) The pods occupying GPU capacity are at equal or higher priority, so the scheduler won't preempt them. (2) Your pod's request can't be satisfied even after evicting every evictable lower-priority pod (e.g., asking for 4 GPUs on a node where 2 GPUs are held by lower-priority pods and 2 by pods at your priority or above: evicting the 2 lower-priority pods would still leave you 2 short, so the scheduler evicts nothing and the pod stays Pending). (3) `preemptionPolicy: Never` is set on your PriorityClass, so you got the priority ordering but explicitly turned off eviction. Diagnose with `kubectl describe pod` and look at the scheduler events for the explanation.