Reproducible Training: The Flags, The Cost, The Artifacts
GPU sandbox · jupyter
Beta

Measure the non-determinism noise floor in default PyTorch, flip every determinism flag until same-seed runs match bit-for-bit, quantify the perf cost, and capture a content-addressable training config that makes a run reproducible forever.

40 min · 4 steps · 2 domains · Intermediate · NCP-GENL · NCP-AIO · NCP-ADS · NCA-GENL · NCA-GENM

What you'll learn

  1. The noise floor you're fighting
  2. Turn determinism on and re-verify
  3. Measure the cost, catalog what breaks
  4. Content-addressable training runs

Prerequisites

  • Comfortable with PyTorch training loops
  • Basic familiarity with CUDA / cuDNN configuration
  • Understanding of hashing and content addressing

Exam domains covered

MLOps & Training Workflows · GPU Acceleration & Distributed Training

Skills & technologies you'll practice

This intermediate-level GPU lab gives you real-world reps across:

Reproducibility · Determinism · PyTorch · cuDNN · cuBLAS · Seeding · MLOps · Config Hashing

What you'll build in this reproducibility lab

Every ML engineer has been here: same seed, same code, two runs, two different final losses — and the security team wants to know why. Bit-for-bit reproducibility is a stack problem, not a seed problem, and this lab walks you through the whole stack. You'll walk away with a measured non-determinism noise floor on default PyTorch (the baseline everyone pretends doesn't exist), a flags-flipped configuration that produces byte-identical same-seed runs, a quantified determinism_cost_pct for your workload, a catalog of real non-det PyTorch ops (scatter_add, index_add, bincount, grid_sample), and a content-addressable training_config hash plus a multi-category reproducibility checklist you can hand to CI. About 40 minutes on a live NVIDIA GPU pod — PyTorch, a writable CUBLAS_WORKSPACE_CONFIG, and a clean environment are ready.

The substance is every knob you need and why each one matters. torch.use_deterministic_algorithms(True) is the umbrella switch — it'll raise if you haven't also set CUBLAS_WORKSPACE_CONFIG=:4096:8 because cuBLAS can't honor the guarantee without a fixed-size deterministic workspace. torch.backends.cudnn.deterministic = True stops cuDNN from hunting among kernel variants run-to-run; cudnn.benchmark = False stops the autotuner from making different choices based on warmup timings. Python's PYTHONHASHSEED matters because dict iteration order leaks into reduction order in some code paths. The deeper insight: same-seed runs drift by default because atomicAdd float reductions are order-dependent (FP addition isn't associative) and cuDNN picks among multiple kernel implementations by autotuning — seed controls your RNG, not the GPU's reduction order.
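
Put together, those knobs fit in one setup function. This is a sketch, assuming CUDA ≥ 10.2 (where the CUBLAS_WORKSPACE_CONFIG requirement kicks in) and a recent PyTorch; the lab's own step may structure it differently:

```python
import os

# cuBLAS reads this at first initialization, so set it before any CUDA work
# (safest: before importing torch at all).
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

import random

import numpy as np
import torch


def seed_everything(seed: int = 42) -> None:
    """Seed every RNG and pin the backends to deterministic kernels."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)                    # CPU and all CUDA devices
    torch.backends.cudnn.deterministic = True  # no run-to-run kernel variants
    torch.backends.cudnn.benchmark = False     # no timing-based autotuning
    torch.use_deterministic_algorithms(True)   # raise on any non-det op
```

One caveat: PYTHONHASHSEED only takes effect if it is in the environment before the interpreter starts (e.g. `PYTHONHASHSEED=0 python train.py`); setting it from inside a running process does nothing for dict hashing.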

The cost story is more interesting than 'determinism is slow.' On small models with common ops, flipping the flags often costs single-digit percent or is slightly faster (because cudnn.benchmark was spending time autotuning and you just skipped that). On convnets with heavy cuDNN coverage, 10-30% is typical. On ops without a deterministic GPU path — scatter_add, index_put(accumulate=True), ctc_loss — it can be multiples, or PyTorch will raise and force you to redesign. The bigger catch: even with every flag on, a CUDA driver bump, a cuDNN kernel ship, or an A100→H100 migration will change your numbers without touching your code. That's why the content-addressable config hash is not enough on its own — you have to pin the container image digest, cuDNN version, and dataset checksum alongside the seed. Turn determinism on for debugging, CI, and security-sensitive models; turn it off for production training and lean on pinned environments + detailed logging.
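
Putting a number on that cost is a matter of timing the same step under both configurations. A minimal harness sketch (the lab's own grader and its determinism_cost_pct artifact may be structured differently):

```python
import time

import torch
import torch.nn as nn


def mean_step_ms(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                 iters: int = 30) -> float:
    """Average wall-clock milliseconds per forward+backward+optimizer step."""
    loss_fn = nn.MSELoss()
    opt = torch.optim.SGD(model.parameters(), lr=1e-2)
    for _ in range(5):  # warmup: lets cudnn.benchmark autotune if it's enabled
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # GPU timings are meaningless without a sync
    t0 = time.perf_counter()
    for _ in range(iters):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - t0) * 1000 / iters

# Measure t_default with the flags off and t_det with them on, then:
#   determinism_cost_pct = 100 * (t_det - t_default) / t_default
```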

Prereqs: PyTorch training-loop comfort, basic cuDNN/cuBLAS configuration, content-addressable hashing familiarity. Preinstalled: PyTorch, writable CUBLAS_WORKSPACE_CONFIG, JupyterLab. Grading is quantitative: the noise-floor step asserts the loss gap and weight L2 both exceed 1e-6, the determinism step requires same-seed runs match to < 1e-6 and weights to < 1e-5 with every flag set in determinism_settings, the cost step bounds determinism_cost_pct to [-50%, +300%] and cross-checks it against your measured timings, and the config step hashes the same config twice (must match), hashes a perturbed version (must differ), and requires a checklist spanning at least three categories (code, data, env, random state, hardware).
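
The hashing side of that grader can be as simple as canonical JSON through SHA-256. A sketch — the field names below (seed, dataset_sha256, image_digest) are illustrative, not the lab's schema:

```python
import hashlib
import json


def config_hash(config: dict) -> str:
    """Content-address a training config: canonical JSON -> SHA-256 hex digest."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


cfg = {
    "seed": 42,
    "lr": 3e-4,
    "dataset_sha256": "0f343b0931126a20f133d67c2b018a3b",  # illustrative value
    "image_digest": "sha256:deadbeef",                     # pin the container too
    "cudnn_version": 8902,
}

assert config_hash(cfg) == config_hash(dict(cfg))            # same content, same hash
assert config_hash({**cfg, "seed": 43}) != config_hash(cfg)  # any perturbation moves it
```

Sorting the keys and fixing the separators matter: without a canonical serialization, two semantically identical configs can hash differently.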

Frequently asked questions

Why do two same-seed runs drift in default PyTorch?

Because 'same seed' only controls the pseudo-random sources you hand it — the RNG for weight initialization, dropout masks, data shuffling. It doesn't control the GPU reduction order. Ops like scatter_add and atomicAdd-based kernels sum floats in whichever order the warps finish, and floating-point addition isn't associative, so the final value differs run to run. cuDNN also picks among multiple kernel implementations by autotuning, so the same matmul may use a different algorithm on run B.
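
You can see the root cause without a GPU: floating-point addition isn't associative, so any reduction whose order varies run to run varies its result. A pure-Python illustration:

```python
import random

# Grouping changes the rounded result even for three numbers:
assert (0.1 + 0.2) + 0.3 != 0.1 + (0.2 + 0.3)  # 0.6000000000000001 vs 0.6

# At GPU scale: the same 100k floats summed in two orders.
rng = random.Random(0)
vals = [rng.uniform(-1.0, 1.0) * 10.0 ** rng.randint(-8, 8) for _ in range(100_000)]
shuffled = vals[:]
rng.shuffle(shuffled)
print(sum(vals), sum(shuffled))  # same multiset of numbers, usually different sums
```

An atomicAdd-based GPU kernel is effectively the shuffled case, with the shuffle decided by warp scheduling.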

What does CUBLAS_WORKSPACE_CONFIG actually do?

It tells cuBLAS to allocate fixed-size workspace buffers instead of whatever happens to be available, which forces it to pick deterministic kernels for GEMM. Set it to :4096:8 (eight 4 MiB workspaces, roughly 24 MiB of extra GPU memory) or :16:8 (eight 16 KiB workspaces — smaller footprint, but it may limit performance) depending on your memory budget. PyTorch's torch.use_deterministic_algorithms(True) will raise if you enable determinism without setting this env var, because without it cuBLAS can't honor the guarantee.

How much does determinism usually cost in wall time?

Depends heavily on the workload. On a small model with simple ops it's often in the single-digit percent range or even slightly faster (because cudnn.benchmark was spending time autotuning and you just disabled that). On convnets with big cuDNN coverage it can be 10-30% slower. On ops where the deterministic path is fundamentally worse — scatter reductions, sparse ops, some interpolations — it can be multiples. The lab records determinism_cost_pct from your own measurement rather than guessing.

Which PyTorch ops have non-deterministic default paths?

The reliable list includes scatter_add, index_add, index_put with accumulate=True, bincount, grid_sample, interpolate (some modes), torch.Tensor.put_ with accumulate=True, torch.nn.functional.ctc_loss, and any op whose GPU implementation reduces with atomicAdd. torch.use_deterministic_algorithms(True) will either pick a slower deterministic path or raise on these — which one depends on the op and PyTorch version.
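
If you'd rather inventory the offenders than crash on the first one, recent PyTorch (1.11+) has a warn-only mode: each non-deterministic op emits a UserWarning instead of raising, which you can collect. A sketch:

```python
import warnings

import torch

# Warn instead of raising, so one representative step surfaces every offender.
torch.use_deterministic_algorithms(True, warn_only=True)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # ... run a representative forward/backward pass here ...
    pass

offenders = {str(w.message) for w in caught}  # dedup the op names for your catalog
```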

Why isn't a content-addressable config hash enough on its own?

Because the hash only covers what you put in it. Things it can't catch: the CUDA driver version shipped in your container, the cuDNN release that updated a kernel yesterday, a silent dataset mirror redirect, a pip-resolved dependency whose version moved because your lockfile was loose, and hardware SKU swaps (an H100 TensorCore path isn't bit-identical to an A100's). The reflection step asks you to name this failure mode and argue for pinning container image digests, dataset checksums, and a full lockfile alongside the seed.

Should I run with determinism on for all production training?

Usually no. Determinism is expensive, sometimes unavailable for ops you need, and unnecessary once you're at the scale where a single run takes days and you're going to log everything anyway. Turn it on for the debugging and CI paths: reproducing a specific gradient explosion, auditing a security-sensitive model, regression-testing a framework upgrade. Turn it off for production training and lean on pinned environments + detailed logging for post-hoc traceability.