Question 1

What's the difference between a resource request and a resource limit?

Accepted Answer

A request is what the *scheduler* uses: it is a reservation. The scheduler sums the requests of the pods already assigned to each node and compares the total against that node's allocatable resources to decide where a new pod can be placed. Requests also count against the namespace ResourceQuota, but that check happens earlier, at API admission, and decides whether the pod is admitted at all. A limit is what the *kubelet and Linux kernel* enforce at runtime via cgroups. Exceeding a CPU limit causes throttling (the process is paused until the next scheduling period); exceeding a memory limit causes an OOM kill (the process is terminated). Requests determine placement; limits determine in-the-moment behavior.
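The placement arithmetic can be sketched in a few lines. This is a simplified model, not the scheduler's actual code: the `Resources` type and `fits` function are hypothetical names, and only CPU and memory are considered.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Resources:
    cpu_millicores: int  # 500 means 500m, i.e. half a core
    memory_mib: int


def fits(allocatable: Resources, placed: list[Resources], new_pod: Resources) -> bool:
    """Return True if new_pod's requests fit into what is left on the node.

    Only requests are summed; actual usage and limits play no part here.
    """
    used_cpu = sum(r.cpu_millicores for r in placed)
    used_mem = sum(r.memory_mib for r in placed)
    return (
        used_cpu + new_pod.cpu_millicores <= allocatable.cpu_millicores
        and used_mem + new_pod.memory_mib <= allocatable.memory_mib
    )


# Hypothetical node with 4 cores / 8 GiB allocatable and two pods already placed.
node = Resources(cpu_millicores=4000, memory_mib=8192)
placed = [Resources(1500, 2048), Resources(2000, 4096)]

print(fits(node, placed, Resources(500, 1024)))   # True: exactly fills the CPU
print(fits(node, placed, Resources(1000, 1024)))  # False: 4500m > 4000m
```

Note that the second pod is rejected even if the pods already on the node are idle, because the check is against reserved requests rather than measured usage.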

Question 2

Should I always set CPU limits on my pods?

Accepted Answer

No, and this is one of the most contested topics in production Kubernetes operation. CPU limits are enforced via Linux CFS bandwidth control, which can throttle a process *even when the node has idle CPU available*. For latency-sensitive workloads (HTTP APIs, gateways, real-time services), production teams routinely set CPU requests but *omit* CPU limits — the request gives the scheduler what it needs, while letting the pod use any spare CPU when available. Memory limits, on the other hand, should always be set, because the alternative is uncontrolled memory growth that triggers node-wide eviction. The lab demonstrates the throttling mechanism so you can make the call yourself.
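To see why throttling can occur on an otherwise idle node, here is a rough model of CFS bandwidth control. It assumes the default 100 ms period and threads that stay runnable the whole time on separate cores; the function name is hypothetical and real workloads are burstier than this.

```python
CFS_PERIOD_MS = 100.0  # default CFS period (cpu.cfs_period_us = 100000)


def throttled_ms_per_period(cpu_limit_cores: float, busy_threads: int) -> float:
    """Wall-clock milliseconds per period that the container spends throttled.

    Simplified model: every thread runs flat out on its own core until the
    container's quota for the period is used up.
    """
    quota_ms = cpu_limit_cores * CFS_PERIOD_MS     # CPU time allowed per period
    wall_ms_to_exhaust = quota_ms / busy_threads   # threads burn quota in parallel
    return max(0.0, CFS_PERIOD_MS - wall_ms_to_exhaust)


print(throttled_ms_per_period(1.0, 4))  # 75.0: quota gone after 25 ms of wall time
print(throttled_ms_per_period(0.5, 1))  # 50.0
print(throttled_ms_per_period(2.0, 1))  # 0.0: one thread cannot reach a 2-core quota
```

In the first case a request that arrives just after the quota runs out waits up to 75 ms before any thread can run again, regardless of how much CPU the node has free.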

Question 3

Why is nvidia.com/gpu integer-only when CPU and memory are fractional?

Accepted Answer

GPUs are an extended resource in Kubernetes' resource model, and extended resources must be whole numbers; the API server rejects fractional values. The reason isn't laziness: a CUDA context can't be safely split at the K8s API layer. If you want fractional GPU sharing, you need an explicit operator-level mechanism, such as MIG (hardware partitioning), time-slicing (software multiplexing without isolation), or MPS (NVIDIA's Multi-Process Service). Each is configured at the GPU operator before pods are scheduled, and the resulting partitions or replicas are then advertised to the scheduler as whole-number resources. We'll cover each in dedicated later labs.
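The integer rule can be illustrated with a small check. This is a sketch of the idea only, not the API server's validation code: `parse_quantity` and `is_valid_extended_resource` are hypothetical helpers that handle just plain decimals and the `m` (milli) suffix.

```python
from fractions import Fraction


def parse_quantity(q: str) -> Fraction:
    """Parse a small subset of quantity strings: decimals and the 'm' suffix."""
    if q.endswith("m"):
        return Fraction(q[:-1]) / 1000
    return Fraction(q)


def is_valid_extended_resource(q: str) -> bool:
    """Extended resources such as nvidia.com/gpu must be non-negative whole numbers."""
    value = parse_quantity(q)
    return value >= 0 and value.denominator == 1


print(is_valid_extended_resource("1"))     # True
print(is_valid_extended_resource("0.5"))   # False: fractional GPU
print(is_valid_extended_resource("500m"))  # False: same value as 0.5
```

The same `500m` string is a perfectly valid CPU request, which is the asymmetry the question is asking about.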

Question 4

What QoS class should production workloads aim for?

Accepted Answer

Guaranteed for almost anything important. Under node pressure the kubelet ranks pods for eviction by whether their usage exceeds their requests, then by pod priority, then by how far usage exceeds requests. In practice BestEffort pods and Burstable pods running above their requests go first (the noisiest neighbors leading), while Guaranteed pods and Burstable pods within their requests go last. Guaranteed pods almost never get evicted; it typically takes system daemons outgrowing their reserved resources with no lower-ranked pods left to evict. Achieving Guaranteed requires every container in the pod to have CPU and memory requests and limits set, with each request equal to its corresponding limit. For GPU jobs and stateful workloads this is non-negotiable: the cost of being evicted mid-training is much worse than the cost of CPU throttling.
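The classification rule can be written out as a short function. This is a minimal sketch under two assumptions: containers are represented as plain dicts (a hypothetical shape, not the Kubernetes client's types), and quantities are compared as literal strings, so `"500m"` and `"0.5"` would count as different here even though Kubernetes treats them as equal.

```python
from __future__ import annotations


def qos_class(containers: list[dict]) -> str:
    """Classify a pod from its containers' CPU and memory requests and limits."""
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # An omitted request defaults to the limit when a limit is set.
            request = requests.get(resource, limit)
            if request is not None or limit is not None:
                any_set = True
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


pinned = {"requests": {"cpu": "500m", "memory": "256Mi"},
          "limits": {"cpu": "500m", "memory": "256Mi"}}
no_cpu_limit = {"requests": {"cpu": "500m", "memory": "256Mi"},
                "limits": {"memory": "256Mi"}}

print(qos_class([pinned]))        # Guaranteed
print(qos_class([no_cpu_limit]))  # Burstable
print(qos_class([{}]))            # BestEffort
```

The second case shows the trade-off with omitting CPU limits: a pod without a CPU limit cannot be Guaranteed, so it lands in Burstable.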

Kubernetes Resource Requests & Limits — Who Gets What, and Who Survives
