Question 1

What's the difference between PV, PVC, and StorageClass?

Accepted Answer

A `PersistentVolume` (PV) is a piece of storage in the cluster (a disk, a network mount, a cloud volume) represented as an API object. A `PersistentVolumeClaim` (PVC) is a *request* for storage written by a pod author: "I need 5Gi, ReadWriteOnce, on this StorageClass." A `StorageClass` describes how to dynamically provision PVs that satisfy claims: its `provisioner` field names the volume plugin (in current clusters, usually a CSI driver), and `parameters` are passed through to that driver. The flow: the pod author writes a PVC → the cluster matches an existing PV (or asks the StorageClass's provisioner to create one) → the PVC binds to the PV → the pod references the claim in `spec.volumes[].persistentVolumeClaim.claimName`, and a container's `volumeMounts` entry mounts the volume into that container's filesystem. Most production clusters use dynamic provisioning (PVCs trigger PV creation); static PVs are rare and usually reserved for legacy or shared infrastructure.
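To make the flow concrete, here is a minimal sketch that builds the three objects as plain Python dictionaries and serializes them to JSON, which `kubectl apply -f` accepts alongside YAML. Every name in it (`fast-ssd`, `checkpoints`, `trainer`, the `example.com/csi-driver` provisioner, and the `type` parameter) is a placeholder chosen for illustration, not something a real cluster is guaranteed to have.

```python
import json

# Hypothetical names, used only for illustration.
STORAGE_CLASS = "fast-ssd"
CLAIM_NAME = "checkpoints"

storage_class = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": STORAGE_CLASS},
    "provisioner": "example.com/csi-driver",  # placeholder, not a real driver
    "parameters": {"type": "ssd"},            # driver-specific, assumed here
    "volumeBindingMode": "WaitForFirstConsumer",
}

pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": CLAIM_NAME},
    "spec": {
        "storageClassName": STORAGE_CLASS,
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": "5Gi"}},
    },
}

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "trainer"},
    "spec": {
        "containers": [{
            "name": "trainer",
            "image": "busybox",
            "command": ["sleep", "3600"],
            # The container mounts the volume declared under spec.volumes.
            "volumeMounts": [{"name": "data", "mountPath": "/checkpoints"}],
        }],
        # The pod references the claim by name; the claim resolves to a PV.
        "volumes": [{
            "name": "data",
            "persistentVolumeClaim": {"claimName": CLAIM_NAME},
        }],
    },
}


def as_list(*objects):
    """Wrap objects in a v1 List so one JSON document holds all of them."""
    return {"apiVersion": "v1", "kind": "List", "items": list(objects)}


manifest_json = json.dumps(as_list(storage_class, pvc, pod), indent=2)

if __name__ == "__main__":
    print(manifest_json)
```

Note that no PV appears anywhere in the sketch: with dynamic provisioning the PV is created by the provisioner once the claim needs one.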

Question 2

What does ReadWriteOnce actually mean?

Accepted Answer

`ReadWriteOnce` (RWO) means the volume can be mounted read-write by **one node at a time**, *not one pod*. This is the access mode that bites engineers most often. On a single-node cluster (or when your scheduler co-locates pods), two pods can share an RWO volume just fine; the volume is attached to the node once and both pods write through the same filesystem. What RWO blocks is pods on *different* nodes attaching the same volume simultaneously. If you actually need single-pod-only access, use `ReadWriteOncePod` (introduced as alpha in Kubernetes 1.22, stable since 1.29, CSI volumes only), which the scheduler enforces by refusing to schedule a second pod that references the same PVC. For multi-node concurrent access, use `ReadWriteMany` (RWX), which requires a StorageClass backed by a shared filesystem (NFS, CephFS, EFS, FSx, GlusterFS); block-device storage classes (most cloud disks, local-path) generally cannot satisfy RWX.
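The node-versus-pod distinction can be expressed as a small decision function. This is a toy model of the rules described above, not the logic Kubernetes itself runs; the `Mount` type and `can_mount` function are hypothetical names, and `ReadOnlyMany` is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mount:
    """A pod that has (or wants) the volume mounted, and the node it runs on."""
    pod: str
    node: str


def can_mount(access_mode: str, existing: list[Mount], new: Mount) -> bool:
    """Return True if `new` may mount read-write alongside `existing` mounts."""
    if access_mode == "ReadWriteMany":
        # Any number of pods on any number of nodes.
        return True
    if access_mode == "ReadWriteOnce":
        # One node at a time: extra pods are fine if they share that node.
        return all(m.node == new.node for m in existing)
    if access_mode == "ReadWriteOncePod":
        # One pod at a time, regardless of node.
        return not existing
    raise ValueError(f"unhandled access mode: {access_mode}")


if __name__ == "__main__":
    first = Mount("trainer-0", "node-a")
    same_node = Mount("trainer-1", "node-a")
    other_node = Mount("trainer-2", "node-b")
    print(can_mount("ReadWriteOnce", [first], same_node))     # co-located pod
    print(can_mount("ReadWriteOnce", [first], other_node))    # different node
    print(can_mount("ReadWriteOncePod", [first], same_node))  # second pod
```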

Question 3

Why is my PVC stuck Pending?

Accepted Answer

Three common reasons, and how to tell them apart. (1) `volumeBindingMode: WaitForFirstConsumer`: the PVC won't bind until a pod that references it is scheduled. Look at `kubectl describe pvc` events for "WaitForFirstConsumer"; that's normal and resolves once the pod is created. (2) The StorageClass's provisioner can't satisfy the claim: wrong access mode (asking for RWX from a block-device class), an oversized request, or the provisioner's CSI controller is down. Look at events for "ProvisioningFailed". (3) The named `storageClassName` doesn't exist in the cluster. Run `kubectl get storageclass` to confirm what's available; the default class is whichever carries the annotation `storageclass.kubernetes.io/is-default-class: "true"`. In every case the PVC's events are the diagnostic: `kubectl describe pvc <name>` shows whether the cluster has tried to bind, why it failed, and whether it's still waiting.
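The triage above can be sketched as a function over a PVC's events. The function name `diagnose_pending_pvc` is hypothetical, and the input is assumed to be a list of dictionaries with `reason` and `message` keys, which is the shape of the `items` returned by `kubectl get events -o json`. Matching on the message text to spot a missing StorageClass is a heuristic assumption; exact wording can vary between Kubernetes versions and provisioners.

```python
def diagnose_pending_pvc(events: list[dict]) -> str:
    """Map a Pending PVC's events to the most likely of the three causes."""
    # Failures take priority over the benign "waiting" state.
    for event in events:
        if event.get("reason") != "ProvisioningFailed":
            continue
        message = event.get("message", "").lower()
        # Heuristic (assumed wording): a missing class is typically reported
        # as a storageclass "... not found" message.
        if "storageclass" in message and "not found" in message:
            return "storage class does not exist"
        return "provisioner could not satisfy the claim"

    for event in events:
        if event.get("reason") == "WaitForFirstConsumer":
            return "waiting for a pod to reference the claim (normal)"

    return "no recognised event; inspect the PVC manually"


if __name__ == "__main__":
    sample = [{"reason": "WaitForFirstConsumer",
               "message": "waiting for first consumer to be created before binding"}]
    print(diagnose_pending_pvc(sample))
```

Checking failures before the waiting state is a deliberate choice: a claim can carry both kinds of event, and the failure is the one that needs action.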

Question 4

What happens to my data when I delete the PVC?

Accepted Answer

The reclaim policy decides. It is set on the StorageClass as `reclaimPolicy` and copied onto each dynamically provisioned PV as `persistentVolumeReclaimPolicy`. `Delete` (the default when a StorageClass doesn't specify one) removes the PV object and the backing storage along with its data; your training checkpoints are gone. `Retain` leaves the PV in `Released` state with the data intact, but the PV can't be bound to a new PVC without manual intervention (you must clear its `spec.claimRef` field, for example with `kubectl edit`). For production, the safe default is to author StorageClasses with `reclaimPolicy: Retain` for any data you can't afford to lose, and to put a backup mechanism on top (snapshots, `velero`, application-level checkpointing to object storage). For ephemeral caches and scratch space, `Delete` is correct: you want the disk cleaned up when the experiment ends.
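The two outcomes, and the manual step that makes a retained PV bindable again, can be summarised in a short sketch. Both function names are hypothetical. The command is built as an argument list rather than executed, and it uses `kubectl patch` with a merge patch as one alternative to editing the object by hand; the deprecated `Recycle` policy is not modelled.

```python
import json


def after_pvc_delete(reclaim_policy: str) -> dict:
    """Describe what is left once the bound PVC has been deleted."""
    if reclaim_policy == "Delete":
        # PV object and backing storage are both removed.
        return {"pv_phase": None, "data_kept": False}
    if reclaim_policy == "Retain":
        # PV stays, marked Released, still pointing at the old claim.
        return {"pv_phase": "Released", "data_kept": True}
    raise ValueError(f"unhandled reclaim policy: {reclaim_policy}")


def clear_claim_ref_command(pv_name: str) -> list[str]:
    """Build a kubectl command that nulls spec.claimRef on a Released PV."""
    patch = json.dumps({"spec": {"claimRef": None}})  # None serializes to null
    return ["kubectl", "patch", "pv", pv_name, "--type", "merge", "-p", patch]


if __name__ == "__main__":
    print(after_pvc_delete("Retain"))
    # "pv-example" is a placeholder PV name.
    print(" ".join(clear_claim_ref_command("pv-example")))
```

Clearing `claimRef` only makes the PV available for binding again; it does not touch the data on the volume, so whatever the previous claim wrote is visible to the next one.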

Persistent Storage for AI Workloads — PVCs, StorageClass & the Checkpoint Pattern

What you'll learn

Prerequisites

Exam domains covered

Skills & technologies you'll practice
