Learning path · Intermediate · 35–50 hours

Become an AI Infrastructure Engineer with real Kubernetes and real GPUs

Run GPU clusters that don't melt down. Kubernetes, GPU Operator, MIG, MPS, observability.

6 modules · 16 hands-on GPU labs · 2 cert checkpoints

About this path

AI Infrastructure Engineers run the GPU clusters that everyone else's models depend on. The role sits at the intersection of Kubernetes platform engineering and GPU-specific operations, and few online courses teach it on real clusters. This path does. Every module is built around labs on a live, isolated Kubernetes environment with real GPU scheduling and real NVIDIA GPU Operator components. The triage scenarios fail the way production clusters fail.

Skills you'll put on a resume

  • Schedule GPU workloads on Kubernetes with the right resource requests, limits, and priority classes
  • Install and troubleshoot the NVIDIA GPU Operator end to end, and explain the role of each of its components
  • Share single GPUs across workloads with CUDA streams, MPS, and MIG, and know when each fits
  • Implement the four-stage cost-audit pipeline (measure → classify → price → recommend) for GPU fleets
  • Profile PyTorch training jobs with Nsight Systems and PyTorch's built-in profiler to find real bottlenecks
  • Pass the NVIDIA-Certified Associate AI Infrastructure & Operations exam
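The first skill in this list is easiest to picture as a manifest. What follows is an illustrative sketch, not material taken from the labs. It is a small Python helper that builds a Pod spec requesting GPUs through the `nvidia.com/gpu` extended resource, which NVIDIA's device plugin advertises, and it can optionally set `priorityClassName`. The helper assumes the GPU Operator (or the standalone device plugin) is already installed on the cluster. The function name `gpu_pod_manifest`, the `gpu-training` PriorityClass, and the container image tag are hypothetical placeholders.

```python
import json
from typing import Optional


def gpu_pod_manifest(
    name: str,
    image: str,
    gpus: int = 1,
    priority_class: Optional[str] = None,
    gpu_resource: str = "nvidia.com/gpu",
) -> dict:
    """Build a minimal Pod manifest that asks the scheduler for GPUs.

    gpu_resource defaults to the whole-GPU resource advertised by the NVIDIA
    device plugin; with MIG in "mixed" strategy, pass a slice name such as
    "nvidia.com/mig-1g.5gb" instead.
    """
    if gpus < 1:
        raise ValueError("gpus must be at least 1")

    # GPUs are an extended resource and cannot be overcommitted: if both
    # requests and limits are given they must be equal, so set them together.
    quantity = str(gpus)
    resources = {
        "requests": {gpu_resource: quantity},
        "limits": {gpu_resource: quantity},
    }

    spec = {
        "restartPolicy": "Never",
        "containers": [
            {
                "name": name,
                "image": image,
                # Smoke test: print the GPU(s) the container was given.
                "command": ["nvidia-smi"],
                "resources": resources,
            }
        ],
    }
    if priority_class is not None:
        # Must match a PriorityClass object that already exists in the cluster.
        spec["priorityClassName"] = priority_class

    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": spec,
    }


if __name__ == "__main__":
    manifest = gpu_pod_manifest(
        name="gpu-smoke-test",
        image="nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04",  # example tag; substitute your own
        priority_class="gpu-training",  # hypothetical PriorityClass name
    )
    # Pipe the output into `kubectl apply -f -`; kubectl accepts JSON as well as YAML.
    print(json.dumps(manifest, indent=2))
```

The helper sets requests and limits to the same value because Kubernetes requires GPU requests to equal limits when both are specified. Swapping `gpu_resource` for a MIG slice name shows how the same Pod shape can target a partitioned GPU instead of a whole one.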

Who this path is for

Platform engineers, SREs, and DevOps engineers responsible for running GPU workloads in production. Comfortable with Linux and Kubernetes basics; new to GPU-specific operations.

Prerequisites

  • Comfortable with Linux command line and basic shell scripting
  • Kubernetes basics (Pods, Deployments, Services), roughly the level of CKA preparation
  • Familiarity with containers (Docker / containerd)


Ready to start?

Pro gives you all 16 labs in this path, every other lab on Preporato, and every practice test. $29.99/mo, cancel anytime.