Question 1

What's the difference between priority and preemption?

Accepted Answer

Priority is a *ranking*. Every pod has an integer priority value (0 by default, unless a `globalDefault` PriorityClass exists), and the scheduler uses it to order its queue of pending pods. Preemption is the *action* that priority enables. When the scheduler can't fit a high-priority pod on any node because lower-priority pods are using the resources, it can evict (preempt) one or more of those lower-priority pods to make room. Priority always applies, because it changes queue order. Preemption only fires when the scheduler determines that evicting specific lower-priority pods would let the new pod schedule.
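A toy Python model (not scheduler code) makes the split visible. The pending queue below orders pods by priority and breaks ties by arrival order, which is roughly what the default `PrioritySort` queue-sort plugin does. Popping a pod never touches running pods. Preemption would be a separate step, run only when the popped pod fits on no node. The `Pod` and `PendingQueue` names and the pod names are hypothetical.

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Pod:
    # Hypothetical stand-in for a pod: just a name and its resolved priority.
    name: str
    priority: int = 0  # 0 when no PriorityClass (and no globalDefault class) applies


class PendingQueue:
    """Toy scheduling queue: highest priority first, then first come, first served."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, Pod]] = []
        self._arrival = itertools.count()  # increasing arrival counter breaks ties

    def push(self, pod: Pod) -> None:
        # heapq is a min-heap, so negate the priority to pop the highest value first.
        heapq.heappush(self._heap, (-pod.priority, next(self._arrival), pod))

    def pop(self) -> Pod:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


queue = PendingQueue()
for pod in [Pod("batch-a"), Pod("train-gpu", 100000), Pod("batch-b"), Pod("infer", 500000)]:
    queue.push(pod)

order = [queue.pop().name for _ in range(len(queue))]
print(order)  # ['infer', 'train-gpu', 'batch-a', 'batch-b']
```

Nothing in this model evicts anything. Reordering is all that priority does on its own.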

Question 2

What is `preemptionPolicy: Never` and when should I use it?

Accepted Answer

It's a per-PriorityClass setting that turns off the *eviction* part of preemption while keeping the *ranking* part. A pod whose PriorityClass has a high value and `preemptionPolicy: Never` is placed ahead of lower-priority pods in the pending queue, but it will *not* evict any running lower-priority pod to make room. If the cluster is full, it just waits. Use this for high-priority workloads that can tolerate waiting (say, a long batch training job that can run overnight) when disrupting lower-priority work would be worse than the wait. A common mistake is to set it on a class that actually needs to displace running work, expecting only the queue ordering to change. `Never` removes eviction entirely for every pod in that class. It also does not protect those pods: a pod with `preemptionPolicy: Never` can still be preempted by higher-priority pods.
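As a sketch, the two PriorityClass objects below differ only in `preemptionPolicy`. The dicts use the `scheduling.k8s.io/v1` PriorityClass fields (`value`, `preemptionPolicy`, `globalDefault`, `description`), and the range check reflects the documented cap of 1000000000 for user-defined classes. The class names, the values, and the `on_unschedulable` helper are hypothetical. The helper is a toy model of the decision, not scheduler code.

```python
import json

MAX_USER_PRIORITY = 1_000_000_000  # larger values are reserved for system classes


def priority_class(name: str, value: int, preemption_policy: str = "PreemptLowerPriority",
                   description: str = "") -> dict:
    """Build a scheduling.k8s.io/v1 PriorityClass manifest as a plain dict."""
    if not -2**31 <= value <= MAX_USER_PRIORITY:
        raise ValueError(f"{name}: value {value} outside the user-allowed range")
    if preemption_policy not in ("PreemptLowerPriority", "Never"):
        raise ValueError(f"{name}: unknown preemptionPolicy {preemption_policy!r}")
    return {
        "apiVersion": "scheduling.k8s.io/v1",
        "kind": "PriorityClass",
        "metadata": {"name": name},
        "value": value,
        "preemptionPolicy": preemption_policy,
        "globalDefault": False,
        "description": description,
    }


def on_unschedulable(pc: dict) -> str:
    """Hypothetical toy model: what a pending pod of this class does when no node has room."""
    if pc["preemptionPolicy"] == "Never":
        return "wait"  # keeps its place in the queue, evicts nothing
    return "try-preemption"  # may evict strictly lower-priority pods


# Hypothetical class names and values.
urgent = priority_class("gpu-urgent", 100000, description="May evict lower-priority GPU work")
overnight = priority_class("gpu-overnight", 100000, "Never",
                           description="Jump the queue, but never evict running pods")

print(json.dumps(overnight, indent=2))
print(on_unschedulable(urgent), on_unschedulable(overnight))  # try-preemption wait
```

Because both classes share the same value, their pods sort identically in the queue. Only the behaviour on a full cluster differs. In practice you would write the same fields as YAML and `kubectl apply` them.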

Question 3

How does the scheduler pick which lower-priority pod to evict?

Accepted Answer

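The scheduler picks victims that make room with as little disruption as possible. Only pods with strictly lower priority than the incoming pod are candidates. Pods at equal or higher priority are never preempted. On each node, the scheduler first assumes all candidates are removed and checks whether the incoming pod would fit. If it would, it adds candidates back one at a time (PDB-protected pods before the others, highest priority first within each group), keeping each one as long as the incoming pod still fits. The pods that can't be added back become the victims. PodDisruptionBudgets are honored on a best-effort basis: the scheduler avoids violating them but can still do so if there is no other way to make room. Across candidate nodes, it prefers the node with the fewest PDB violations, then the one whose highest-priority victim has the lowest priority, then the smallest sum of victim priorities, then the fewest victims. Pods running with the system PriorityClasses `system-cluster-critical` (2000000000) and `system-node-critical` (2000001000) are effectively safe from user workloads. User-defined PriorityClasses are capped at 1000000000, so no user pod outranks them. These classes protect components such as control-plane pods, node agents like kube-proxy, and cluster DNS.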
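The per-node part of that process can be sketched in Python as a simplified model, not the scheduler's actual code. It removes every lower-priority pod, then reprieves as many as possible, handling PDB-protected pods first and taking the highest priority first within each group. It then ranks nodes with the first four tie-breakers. Several assumptions apply: it tracks a single resource (GPUs), ignores affinity, taints, and the start-time tie-breaker, and the `Victim` type, pod names, and numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Victim:
    # Hypothetical running pod: single-resource (GPU) model.
    name: str
    priority: int
    gpus: int
    pdb_protected: bool = False  # evicting it would violate a PodDisruptionBudget


def select_victims(capacity: int, running: list[Victim], incoming_priority: int,
                   request: int) -> Optional[list[Victim]]:
    """Return the pods to evict on one node, or None if preemption can't help there."""
    # Only strictly lower-priority pods are candidates; the rest always stay.
    keep = [p for p in running if p.priority >= incoming_priority]
    candidates = [p for p in running if p.priority < incoming_priority]
    used = sum(p.gpus for p in keep)
    if capacity - used < request:
        return None  # even evicting every candidate would not free enough
    victims = []
    # Reprieve PDB-protected pods first, then the rest; highest priority first in each group.
    for pod in sorted(candidates, key=lambda p: (not p.pdb_protected, -p.priority)):
        if capacity - (used + pod.gpus) >= request:
            used += pod.gpus  # it can stay and the incoming pod still fits
        else:
            victims.append(pod)
    return victims


def rank_key(victims: list[Victim]) -> tuple:
    """Lower is better: fewest PDB violations, lowest top victim priority,
    smallest priority sum, fewest victims."""
    return (
        sum(v.pdb_protected for v in victims),
        max((v.priority for v in victims), default=-(2**31)),
        sum(v.priority for v in victims),
        len(victims),
    )


# Hypothetical cluster: two 4-GPU nodes, incoming pod needs 2 GPUs at priority 100000.
nodes = {
    "gpu-node-1": (4, [Victim("batch-1", 0, 2), Victim("batch-2", 0, 2, pdb_protected=True)]),
    "gpu-node-2": (4, [Victim("dev-1", 1000, 2), Victim("dev-2", 1000, 2)]),
}
results = {}
for node, (cap, pods) in nodes.items():
    victims = select_victims(cap, pods, incoming_priority=100000, request=2)
    if victims is not None:
        results[node] = victims

best = min(results, key=lambda n: rank_key(results[n]))
print(best, [v.name for v in results[best]])  # gpu-node-1 ['batch-1']
```

On `gpu-node-1`, the PDB-protected `batch-2` is reprieved first, so only `batch-1` is evicted. That node also wins the ranking because its top victim has priority 0 rather than 1000.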

Question 4

Why would my high-priority GPU pod stay Pending even with preemption enabled?

Accepted Answer

Three common reasons: (1) The pods occupying GPU capacity are at *equal or higher* priority, and the scheduler never preempts those. (2) Preemption works node by node, and no single node can free enough GPUs by evicting its lower-priority pods. For example, suppose your pod asks for 4 GPUs and every 4-GPU node runs 2 GPUs of lower-priority work plus 2 GPUs of equal- or higher-priority work. Evicting the lower-priority pods frees only 2 GPUs per node, and the scheduler does not combine freed capacity across nodes. (3) `preemptionPolicy: Never` is set on your PriorityClass, so you got the priority *ordering* but explicitly turned off eviction. Diagnose with `kubectl describe pod` and read the `FailedScheduling` events. They usually explain why no node fit and whether preemption could help.
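One way to reason through the three cases is a per-node check: can evicting only strictly lower-priority pods on a single node free the whole GPU request? The sketch below is a hypothetical Python helper (`why_pending`, `RunningPod`, and the input shape are made up for illustration), not a scheduler API. It assumes the pod already fits on no node as the cluster stands. The authoritative explanation always comes from the pod's events.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunningPod:
    # Hypothetical running pod holding some GPUs.
    name: str
    priority: int
    gpus: int


def why_pending(request: int, priority: int, preemption_policy: str,
                nodes: dict[str, tuple[int, list[RunningPod]]]) -> str:
    """Toy diagnosis for a GPU pod that currently fits on no node."""
    if preemption_policy == "Never":  # reason (3)
        return "preemptionPolicy is Never: the pod waits instead of evicting anything"
    if not any(p.priority < priority for _, pods in nodes.values() for p in pods):
        return "every running pod has equal or higher priority: nothing is preemptible"  # (1)
    for name, (capacity, pods) in nodes.items():
        # Only pods with strictly lower priority can be evicted; the rest keep their GPUs.
        protected = sum(p.gpus for p in pods if p.priority >= priority)
        if capacity - protected >= request:
            return f"preemption should help on {name}"
    return ("no single node can free enough GPUs by evicting lower-priority pods "
            "(capacity is never combined across nodes)")  # reason (2)


# The reason (2) example: each 4-GPU node has 2 GPUs of lower-priority work and
# 2 GPUs of equal-priority work; the pod asks for 4 GPUs at priority 100000.
nodes = {
    f"gpu-node-{i}": (4, [RunningPod(f"batch-{i}", 0, 2), RunningPod(f"prod-{i}", 100000, 2)])
    for i in (1, 2)
}
print(why_pending(4, 100000, "PreemptLowerPriority", nodes))
```

Across the cluster there are 4 GPUs of lower-priority work in total, but they sit on different nodes, so no eviction plan on any one node satisfies the request.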

PriorityClass & Preemption — Who Survives the GPU Squeeze
