Question 1

When should I use nodeSelector vs nodeAffinity?

Accepted Answer

`nodeSelector` is the simplest case: one or more exact `key=value` label matches, all of which must hold (AND). If your only condition is "node has label `nvidia.com/gpu.product=NVIDIA-A100`", nodeSelector is one line and reads cleanly. `nodeAffinity` is the richer form: it supports the operators `In`, `NotIn`, `Exists`, `DoesNotExist`, `Gt`, and `Lt`, lets you OR multiple terms together, and crucially has both a `requiredDuringSchedulingIgnoredDuringExecution` mode (a hard match, like nodeSelector) AND a `preferredDuringSchedulingIgnoredDuringExecution` mode (a soft preference, with weights). Use nodeAffinity when you need OR semantics ("A100 OR H100"), set membership, or the soft-preference fallback ("prefer A100, but L40S is acceptable"). Stick with nodeSelector for the trivial hard match.
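As a rough sketch of the two hard-match forms, the manifests below place one pod with `nodeSelector` and another with a required `nodeAffinity` rule that accepts either of two GPU products. The pod names, the container image, and the `NVIDIA-H100` label value are placeholders assumed for illustration; the exact `nvidia.com/gpu.product` values depend on what your nodes actually report.

```yaml
# Hard match with nodeSelector: every listed label must be present on the node.
apiVersion: v1
kind: Pod
metadata:
  name: train-a100                  # hypothetical name
spec:
  nodeSelector:
    nvidia.com/gpu.product: NVIDIA-A100
  containers:
    - name: trainer
      image: registry.example.com/trainer:latest   # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1
---
# Hard match with nodeAffinity: the In operator gives "A100 OR H100".
apiVersion: v1
kind: Pod
metadata:
  name: train-a100-or-h100          # hypothetical name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:          # multiple terms are ORed
          - matchExpressions:       # expressions inside one term are ANDed
              - key: nvidia.com/gpu.product
                operator: In
                values:
                  - NVIDIA-A100
                  - NVIDIA-H100     # assumed label value
  containers:
    - name: trainer
      image: registry.example.com/trainer:latest   # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1
```

If both `nodeSelector` and `nodeAffinity` are set on the same pod, a node has to satisfy both to be eligible.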

Question 2

What's the difference between taints and nodeSelector — they both block scheduling?

Accepted Answer

They run in opposite directions. `nodeSelector` is **on the pod** and says "this pod requires nodes with these labels"; the pod restricts itself to specific nodes. `taints` are **on the node** and say "this node repels pods that don't have a matching toleration"; pods opt in by tolerating, but the *default* is to be excluded. Note that a toleration only *permits* a pod to land on a tainted node; it does not steer the pod there. The asymmetry matters in production: if your A100 nodes are expensive, you want the default to be "no random pod lands here", and that's taints. If you need workloads to discriminate but the hardware is shared freely, that's nodeSelector. Most production fleets use BOTH: taint the A100 nodes (so only A100-tolerating pods can land there) AND set `nodeSelector` on the A100 workloads (so those pods land only on A100 nodes and not on some other node they also happen to fit).
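The combined pattern might look like the sketch below: the taint is applied to the node out of band, and the pod carries both a toleration (permission to land) and a `nodeSelector` (the requirement to land there). The taint `nvidia.com/gpu=present:NoSchedule` is the one this FAQ mentions for GPU pools; the pod name, image, and `<node-name>` are placeholders.

```yaml
# Node side (run once per GPU node, shown here as a comment):
#   kubectl taint nodes <node-name> nvidia.com/gpu=present:NoSchedule
#
# Pod side: the toleration lets the pod past the taint,
# and the nodeSelector makes sure it goes nowhere else.
apiVersion: v1
kind: Pod
metadata:
  name: a100-only-job               # hypothetical name
spec:
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
  nodeSelector:
    nvidia.com/gpu.product: NVIDIA-A100
  containers:
    - name: worker
      image: registry.example.com/worker:latest    # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1
```

Dropping the `nodeSelector` would leave the pod free to schedule onto any untainted node it fits on; dropping the toleration would leave it Pending, because the only nodes it selects are tainted.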

Question 3

What does `requiredDuringSchedulingIgnoredDuringExecution` actually do?

Accepted Answer

It's a hard match enforced at scheduling time only. The "IgnoredDuringExecution" half means once the pod has been placed, label changes on the node DON'T cause the pod to be evicted, so if you re-label a node mid-experiment, your already-running pod stays. The hard match: if no node satisfies the affinity, the pod stays Pending (with `FailedScheduling` events) until a matching node appears or the pod is deleted. The mirror is `preferredDuringSchedulingIgnoredDuringExecution`, which is *soft*: the scheduler scores matching nodes higher, but will fall back to any other node the pod fits on if none match. The required version is what most NCA-AIIO content tests; the preferred version is the production-friendly "prefer this hardware but don't block on it." Note: there is no `RequiredDuringExecution` variant (it would mean evicting a running pod when a label change breaks the match); Kubernetes doesn't ship that.
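To make the soft mode concrete, here is a minimal sketch of a pod that uses only preferences. The weights (1 to 100) are added to a node's score for each preference it matches, so an A100 node outranks an H100 node here, and a node matching neither is still eligible. The pod name, image, weights, and the `NVIDIA-H100` label value are assumptions chosen for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: prefer-a100                 # hypothetical name
spec:
  affinity:
    nodeAffinity:
      # Soft preferences only: nothing here can leave the pod Pending.
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 80                # strongest preference
          preference:
            matchExpressions:
              - key: nvidia.com/gpu.product
                operator: In
                values:
                  - NVIDIA-A100
        - weight: 20                # weaker fallback preference
          preference:
            matchExpressions:
              - key: nvidia.com/gpu.product
                operator: In
                values:
                  - NVIDIA-H100     # assumed label value
  containers:
    - name: trainer
      image: registry.example.com/trainer:latest   # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1
```

Node affinity weights are only one input to the scheduler's scoring, so a preferred node is likely but not guaranteed to win.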

Question 4

Why does my pod stay Pending with `node(s) had untolerated taint`?

Accepted Answer

The scheduler considered the node, but the node has a taint your pod doesn't tolerate. The fix: read the taint (`kubectl get node <name> -o jsonpath='{.spec.taints}'`), then add a matching toleration to your pod spec. Tolerations are compared against the taint's `key`, `value`, and `effect`. There are two standard operators: `Equal` (the default; key, value, and effect must all match) and `Exists` (matches any value for that key; leave `effect` out as well to match every effect for the key). One special case: `operator: Exists` with no key tolerates ALL taints (use sparingly). Production GPU node pools commonly carry the taint `nvidia.com/gpu=present:NoSchedule` to repel non-GPU workloads; only pods with `tolerations: [{key: nvidia.com/gpu, operator: Exists, effect: NoSchedule}]` land there.
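The toleration forms can be sketched side by side as a pod-spec fragment. It goes under `spec:` of a pod, and in practice you would normally keep just one of the entries; they are listed together here only for comparison, using the `nvidia.com/gpu=present:NoSchedule` taint from the answer above.

```yaml
tolerations:
  # Equal (the default operator): key, value, and effect must all match.
  - key: nvidia.com/gpu
    operator: Equal
    value: present
    effect: NoSchedule

  # Exists with a key: any value for that key is tolerated.
  # Omitting "effect" would tolerate every effect for this key.
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule

  # Exists with no key: tolerates every taint on every node. Use sparingly.
  - operator: Exists
```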

Multi-GPU-Type Targeting — nodeSelector, nodeAffinity & Tolerations


