Labs
AI / ML labs
Beta

Hands-on labs for LLMs, RAG & agents. Real GPUs.

Fine-tune LLMs with LoRA, ship RAG pipelines on NVIDIA NIM, build agentic systems, and profile CUDA — on real GPU sandboxes and hosted environments. Auto-graded against live output. No simulators.

46 labs · 33 on real GPUs · 13 hosted · 9 NVIDIA certs covered · Auto-graded against real output

Start here

ncp-aai · react-agent-nim
★ Staff pick · Hosted

Build a ReAct Agent with NVIDIA NIM

Build a complete reasoning + acting agent from scratch using LangChain, LangGraph, and NeMo Agent Toolkit — the three pillars of the NCP-AAI exam.

# react-agent-nim · graded run (hosted)
POST /api/agent/invoke
tool_calls ....... 3
response ......... 200 OK
grade ............ pass
This week
Trending
+48%
more lab starts this week for Model Context Protocol labs

Model Context Protocol (MCP): Build a Tool Server

Build a Model Context Protocol server that exposes your company's tools and data — then connect a LangChain agent to it. Learn how MCP decouples tools from agents, when to use MCP vs Anthropic Skills vs native @tool, and why MCP is the emerging standard for AI tool interop.

Most completed
Popular
7 steps
Learners who finish this lab build confidence fast

Build a RAG Pipeline with NVIDIA NIM

Build a complete Retrieval Augmented Generation pipeline — from document chunking to vector search to an agent that answers questions from your knowledge base.

46 labs · grouped by topic

Agentic AI

LangGraph, NeMo Agent Toolkit, MCP, A2A, RAG agents
13 labs
HOSTED Pro
ncp-aai · a2a-communication · Advanced

Agent-to-Agent (A2A) Communication

Build two independent agents that talk to each other via the A2A protocol — each owned by a different team, running in its own process, discovered through a standardized AgentCard. Learn how A2A differs from multi-agent orchestration and when each architecture fits.

ncp-aai
40 min · Hosted
Launch
# Milvus + LangGraph
$ checkpoint.save
short_term ...... 12 msgs
long_term ....... 84 facts
recall@5 = 0.92
HOSTED Pro
ncp-aai · agent-memory · Intermediate

Agent Memory & Persistence

Build a sales intelligence assistant that remembers — short-term conversation state with LangGraph checkpointer, long-term facts in Milvus, and reflection loops that auto-extract knowledge. Learn the memory architecture every production agent needs.

ncp-aai
35 min · Hosted
Launch
HOSTED Pro
ncp-aai · agent-patterns · Intermediate

Agent Patterns: ReAct vs Tool Calling vs Plan-and-Execute

Build the same SaaS customer support agent three different ways — ReAct, direct tool calling, and plan-and-execute — then compare them on speed, reasoning quality, and reliability to learn when to use each pattern in production.

ncp-aai
35 min · Hosted
Launch
HOSTED Pro
ncp-aai · mcp-tool-servers · Advanced

Model Context Protocol (MCP): Build a Tool Server

Build a Model Context Protocol server that exposes your company's tools and data — then connect a LangChain agent to it. Learn how MCP decouples tools from agents, when to use MCP vs Anthropic Skills vs native @tool, and why MCP is the emerging standard for AI tool interop.

ncp-aai
40 min · Hosted
Launch
HOSTED Pro
ncp-aai · multi-agent-orchestration · Intermediate

Multi-Agent Orchestration with LangGraph

Build a supervisor agent that routes queries to specialist agents — a core architecture pattern tested on the NCP-AAI exam.

ncp-aai
40 min · Hosted
Launch
HOSTED Pro
ncp-aai · rag-pipeline-nim · Intermediate

Build a RAG Pipeline with NVIDIA NIM

Build a complete Retrieval Augmented Generation pipeline — from document chunking to vector search to an agent that answers questions from your knowledge base.

ncp-aai · nca-genl
35 min · Hosted
Launch
# ReAct · thought/action/observe
1. Thought:
2. search_docs(…)
3. Observation:
Final answer → 200 OK
HOSTED Pro
ncp-aai · react-agent-nim · Intermediate

Build a ReAct Agent with NVIDIA NIM

Build a complete reasoning + acting agent from scratch using LangChain, LangGraph, and NeMo Agent Toolkit — the three pillars of the NCP-AAI exam.

ncp-aai
35 min · Hosted
Launch
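The thought/action/observation loop this lab builds can be sketched in plain Python with a stubbed LLM and a toy tool registry. Everything here is hypothetical, a minimal shape of the pattern rather than the lab's LangChain/LangGraph implementation:

```python
import re

# Toy tool registry -- stands in for real framework-registered tools.
TOOLS = {
    "search_docs": lambda q: "NIM exposes an OpenAI-compatible chat endpoint.",
}

def fake_llm(prompt: str) -> str:
    """Stub LLM: first call emits an Action, second call a Final Answer."""
    if "Observation:" not in prompt:
        return "Thought: I should look this up.\nAction: search_docs[NIM endpoint]"
    return "Thought: I have enough context.\nFinal Answer: Use the OpenAI-compatible endpoint."

def react_loop(question: str, max_steps: int = 5) -> str:
    prompt = f"Question: {question}"
    for _ in range(max_steps):
        reply = fake_llm(prompt)
        match = re.search(r"Action: (\w+)\[(.*?)\]", reply)
        if match:  # act, observe, feed the observation back into the prompt
            tool, arg = match.groups()
            observation = TOOLS[tool](arg)
            prompt += f"\n{reply}\nObservation: {observation}"
        elif "Final Answer:" in reply:
            return reply.split("Final Answer:")[1].strip()
    return "(no answer)"

print(react_loop("How do I call a NIM model?"))
```

Swapping `fake_llm` for a real chat-completion call is the core of the lab; the loop structure stays the same.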
# NeMo Guardrails
$ rails.check(query)
jailbreak ....... BLOCKED
off-topic ....... BLOCKED
IT query → ALLOWED
HOSTED Pro
ncp-aai · safety-guardrails · Intermediate

Safety & Guardrails for AI Agents

Build a guarded IT support agent that blocks jailbreaks, refuses off-topic questions, and safely handles IT queries — using keyword checks, LLM-based validation, and NeMo Guardrails.

ncp-aai · ncp-genl
35 min · Hosted
Launch
# agent-evaluation · agent
POST /api/agent/invoke
200 OK · graded
HOSTED Pro
ncp-aai · agent-evaluation · Intermediate

Evaluate an Agent with LLM-as-Judge

Build an eval harness that scores agent responses automatically — correctness via a reference-based judge, plus an accuracy metric and A/B comparison. Same pattern used by NeMo Evaluator for production agent evaluation.

ncp-aai
30 min · Hosted
Launch
# model-routing-cascade · agent
POST /api/agent/invoke
200 OK · graded
HOSTED Pro
ncp-aai · model-routing-cascade · Intermediate

Model Routing & Cost Cascade with NIM

Save 60–80% on inference by cascading queries through cheap → mid → expensive NIM models. Measure real costs via NIM's usage.cost field and compare against an always-large baseline.

ncp-aai
25 min · Hosted
Launch
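The cheap → mid → expensive cascade reduces to a confidence-threshold loop. A minimal sketch, where the model names, per-call prices, and confidence scores are all invented for illustration (the lab reads real costs from NIM's usage field instead):

```python
# Hypothetical per-call prices and a stub confidence score per tier.
TIERS = [
    ("llama-8b",   0.0002, 0.55),   # cheap
    ("llama-70b",  0.0009, 0.80),   # mid
    ("llama-405b", 0.0050, 0.99),   # expensive
]

def cascade(query: str, threshold: float = 0.75):
    """Try cheap models first; escalate only when confidence is too low."""
    spent = 0.0
    for name, price, confidence in TIERS:
        spent += price                 # every attempted tier costs money
        if confidence >= threshold:
            return name, round(spent, 4)
    return TIERS[-1][0], round(spent, 4)

model, cost = cascade("Summarize this ticket")
baseline = TIERS[-1][1]               # always-large baseline
print(model, cost, f"saved {1 - cost / baseline:.0%} vs always-large")
```

With these made-up numbers the cascade stops at the mid tier and saves roughly three quarters of the always-large cost, which is the shape of the 60–80% claim.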
# structured-output-tools · agent
POST /api/agent/invoke
200 OK · graded
HOSTED Pro
ncp-aai · structured-output-tools · Intermediate

Structured Output & Function Calling with NIM

Get reliable machine-parseable data out of an LLM. Compare prompt-only JSON extraction against the function-calling API, chain two tools, and measure the reliability gap on a real extraction task.

ncp-aai
30 min · Hosted
Launch
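The reliability gap the lab measures comes down to validation: with prompt-only JSON you must parse and schema-check by hand, and typical failures look like the ones below. The schema and replies are invented examples, not the lab's actual task:

```python
import json

SCHEMA = {"name": str, "quantity": int}  # hypothetical extraction schema

def parse_structured(raw: str):
    """Parse model output into a dict, rejecting anything off-schema --
    the manual work a function-calling API largely does for you."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None                      # prompt-only JSON often fails right here
    if set(data) != set(SCHEMA):
        return None                      # missing or extra fields
    if any(not isinstance(data[k], t) for k, t in SCHEMA.items()):
        return None                      # wrong types
    return data

print(parse_structured('{"name": "GPU", "quantity": 2}'))   # compliant reply
print(parse_structured('Sure! {"name": "GPU"}'))            # chatty preamble
print(parse_structured('{"name": "GPU", "quantity": "2"}')) # stringly-typed int
```

Function calling moves this checking server-side, which is why the lab can measure a reliability gap between the two approaches.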
# vlm-visual-qa · agent
POST /api/agent/invoke
200 OK · graded
HOSTED Pro
ncp-aai · vlm-visual-qa · Intermediate

Visual Q&A with NVIDIA VLMs

Send images to a Vision-Language Model via NIM, answer questions about them, extract structured fields from a receipt-style image, and compare two VLMs on the same task — all through the OpenAI-compatible chat endpoint.

ncp-aai · nca-genm
30 min · Hosted
Launch
# multimodal-rag · agent
POST /api/agent/invoke
200 OK · graded
HOSTED Pro
ncp-aai · multimodal-rag · Intermediate

Multimodal RAG with NeMo Retriever

Build an image-query RAG system: embed a catalog with NeMo Retriever, translate an uploaded image into a retrieval query via a VLM, and ground the VLM's final answer in the retrieved passages.

ncp-aai · nca-genm
35 min · Hosted
Launch

LLM serving & inference

vLLM, Triton, continuous batching, paged KV cache
5 labs
GPU Pro
ncp-genl · deploy-serve-llms · Intermediate

Deploy & Serve LLMs in Production

Go from slow single-request inference to production-ready LLM serving with vLLM. Benchmark throughput, tune settings, and learn when to use vLLM vs Triton vs TGI.

ncp-genl · nca-genl
45 min · 5 steps
Launch
# vLLM · 10 concurrent
$ asyncio.gather(…)
wall time ....... 3.8s
throughput ...... 1,842 tok/s
GPU Pro
ncp-genl · deploy-serve-llms-jupyter · Intermediate

Deploy & Serve LLMs in Production (Jupyter)

Go from slow single-request inference to production-ready LLM serving with vLLM. Benchmark throughput, tune settings, and learn when to use vLLM vs Triton vs TGI.

ncp-genl · nca-genl
45 min · 5 steps
Launch
GPU Pro
ncp-genl · inference-serving · Intermediate

Inference Serving Patterns: Dynamic Batching, Throughput, and the Triton Mental Model

Build a mini-Triton inference server in ~30 lines of Python: a dynamic batcher with max_batch_size and max_queue_delay knobs, load-tested against a naive baseline, swept for the throughput-latency tradeoff, and bridged to a real Triton config.pbtxt.

ncp-genl · ncp-aio · nca-genl
40 min · 4 steps
Launch
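The two knobs named above can be sketched as a toy batcher: flush when the batch is full or when the oldest request has waited past the delay budget. This is a simplified stand-in for Triton's scheduler, not its real implementation, and the request names and timings are made up:

```python
from collections import deque

class DynamicBatcher:
    """Flush when the batch is full OR the oldest request is stale --
    the max_batch_size / max_queue_delay tradeoff in miniature."""
    def __init__(self, max_batch_size=4, max_queue_delay=0.005):
        self.max_batch_size = max_batch_size
        self.max_queue_delay = max_queue_delay
        self.queue = deque()   # (arrival_time, request)

    def submit(self, request, now):
        self.queue.append((now, request))

    def maybe_flush(self, now):
        if not self.queue:
            return None
        full = len(self.queue) >= self.max_batch_size
        stale = now - self.queue[0][0] >= self.max_queue_delay
        if full or stale:
            batch = [r for _, r in list(self.queue)[: self.max_batch_size]]
            for _ in batch:
                self.queue.popleft()
            return batch
        return None

b = DynamicBatcher(max_batch_size=4, max_queue_delay=0.005)
for i in range(3):
    b.submit(f"req{i}", now=0.000)
print(b.maybe_flush(now=0.001))   # not full, not stale -> None
print(b.maybe_flush(now=0.006))   # delay budget exceeded -> flushes all 3
```

Raising `max_queue_delay` fills bigger batches (throughput) at the cost of making early arrivals wait (latency), which is exactly the sweep the lab runs.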
GPU Pro
ncp-genl · precision-sweep · Intermediate

Batch Size & Precision Sweep: Finding Your Sweet Spot

Sweep batch sizes and numerical precisions (fp32, fp16, bf16) on a real model to find the throughput/VRAM knee, then ship a production recommendation with SKU-aware precision picks and an accuracy gate.

ncp-genl · nca-aiio · ncp-ads
40 min · 4 steps
Launch
GPU Pro
ncp-genl · vllm-serving · Advanced

vLLM Production Serving: PagedAttention, Continuous Batching, Prefix Caching

Stand up vLLM and measure the three features that make it the de-facto inference server: PagedAttention's KV-cache capacity, continuous batching throughput, and prefix caching speedups. Then write the production spec — server args, Kubernetes deployment, monitoring, autoscaling.

ncp-genl · ncp-aio · nca-genl
55 min · 4 steps
Launch

Fine-tuning & alignment

LoRA, QLoRA, DPO, preference data, PEFT
5 labs
GPU Pro
ncp-genl · fine-tune-llm-lora · Intermediate

Fine-Tune an LLM with LoRA and QLoRA

Fine-tune Meta Llama 3 8B on a custom instruction dataset using LoRA and QLoRA. Learn parameter-efficient fine-tuning from data preparation through evaluation — one of the most in-demand AI skills.

ncp-genl · ncp-ads · nca-genl
45 min · 7 steps
Launch
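LoRA's parameter savings follow from the factorization W' = W + (alpha/r)·BA: only the low-rank factors A and B train, so trainable parameters scale with r rather than with the full weight matrix. A minimal numeric sketch with made-up dimensions and weight values (real adapters attach per attention layer via a library like PEFT):

```python
# W' = W + (alpha / r) * (B @ A): A is (r x d_in), B is (d_out x r).
def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

d_in, d_out, r, alpha = 64, 64, 2, 4
W = [[1.0 if i == j else 0.0 for j in range(d_in)] for i in range(d_out)]  # frozen base
A = [[0.1] * d_in for _ in range(r)]      # hypothetical learned factor
B = [[0.5] * r for _ in range(d_out)]     # hypothetical learned factor

scale = alpha / r
delta = matmul(B, A)
W_adapted = [[W[i][j] + scale * delta[i][j] for j in range(d_in)] for i in range(d_out)]

full = d_in * d_out                       # params if you trained W directly
lora = r * (d_in + d_out)                 # params LoRA actually trains
print(f"trainable: {lora}/{full} params ({lora / full:.0%})")
```

Even at this toy scale the adapter trains about 6% of the layer's parameters; at Llama 3 8B dimensions the ratio drops well under 1%, which is the "trainable 0.8%" number these labs report.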
# lora · r=16 · alpha=32
$ trainer.train()
trainable ....... 0.8%
eval_loss ....... 1.24
perplexity ...... 3.45
GPU Pro
ncp-genl · fine-tune-llm-lora-jupyter · Intermediate

Fine-Tune an LLM with LoRA and QLoRA (Jupyter)

Fine-tune Meta Llama 3 8B on a custom instruction dataset using LoRA and QLoRA. Learn parameter-efficient fine-tuning from data preparation through evaluation — one of the most in-demand AI skills.

ncp-genl · ncp-ads · nca-genl
45 min · 7 steps
Launch
GPU Pro
ncp-genl · quantization · Intermediate

Quantize & Optimize LLMs with bitsandbytes

Load a model in fp16, INT8, and NF4, then benchmark the three precisions on VRAM, latency, and output quality. See where quantization wins and where it costs you.

ncp-genl · ncp-ads · nca-genl
40 min · 4 steps
Launch
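The INT8 half of the story reduces to symmetric absmax quantization: map each weight to an integer code via a shared scale, and the rounding error is the quality cost the lab measures. A toy sketch with arbitrary weights (real bitsandbytes works per-block with outlier handling, and NF4 uses a different, non-uniform code):

```python
# Symmetric absmax INT8: q = round(w / scale), dequant = q * scale.
def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.08, 0.91]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(q)                               # integer codes in [-127, 127]
print(f"max abs error: {max_err:.4f}") # bounded by scale / 2
```

Each weight now needs 1 byte instead of 2 (fp16) or 4 (fp32), which is where the VRAM savings in the benchmark come from.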
GPU Pro
ncp-genl · dpo-alignment · Advanced

RLHF & DPO Alignment

Run real Direct Preference Optimization on a small language model with TRL's DPOTrainer. Capture a baseline, build a preference dataset, train, and measurably shift the model's behavior in four steps.

ncp-genl · ncp-ads · nca-genl
55 min · 4 steps
Launch
GPU Pro
nca-genm · sd-lora · Intermediate

Stable Diffusion + LoRA

Load Stable Diffusion, attach LoRA adapters to the U-Net's attention layers, run a tiny overfit training loop, and generate with the adapted weights to prove that a few million trainable parameters actually move pixels.

nca-genm · ncp-genl
45 min · 4 steps
Launch

RAG & retrieval

Embeddings, vector search, rerankers, RAG agents
2 labs
GPU Pro
ncp-genl · advanced-rag · Advanced

Advanced RAG: Hybrid Search + Cross-Encoder Reranking

Build a production-shape retrieval stack — dense bi-encoder plus from-scratch BM25, fused with Reciprocal Rank Fusion, then re-ordered by a BAAI cross-encoder. The exact architecture behind modern enterprise RAG.

ncp-genl · nca-genl
40 min · 4 steps
Launch
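Reciprocal Rank Fusion, the fusion step in this stack, is only a few lines: each ranker contributes 1 / (k + rank) per document, and documents that rank well in both lists win. A sketch with hypothetical doc ids (k = 60 follows the common convention from the original RRF paper):

```python
# RRF: score(d) = sum over rankers of 1 / (k + rank_of_d_in_that_ranker).
def rrf(rankings, k=60):
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["doc_a", "doc_c", "doc_b"]   # bi-encoder ranking (semantic)
bm25  = ["doc_b", "doc_a", "doc_d"]   # lexical ranking (keyword)
print(rrf([dense, bm25]))
```

doc_a and doc_b appear high in both lists, so they fuse to the top even though neither ranker alone agrees on the order; the cross-encoder then re-scores this fused shortlist.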
GPU Pro
ncp-genl · rag-pipeline · Intermediate

Retrieval-Augmented Generation (RAG) Pipeline with Local Models

Build an end-to-end RAG pipeline on a single GPU: BGE embeddings, L2-normalized vector retrieval by dot product, and a local generator that answers with and without retrieved context so you can see exactly what retrieval changes.

ncp-genl · nca-genl
45 min · 4 steps
Launch

Training & pretraining

Transformer internals, pretraining, custom models
3 labs
GPU Pro
ncp-genl · continued-pretraining · Advanced

Continued Pre-Training: Adapt a Pretrained LM to a New Domain

Take GPT-2 and domain-adapt it to Python code in 150 steps, measuring both the gain on code and the cost in catastrophic forgetting on English. The exact recipe behind Code Llama, BloombergGPT, and every domain-specialized LLM of the last three years.

ncp-genl · ncp-ads · nca-genl
45 min · 4 steps
Launch
GPU Pro
ncp-genl · train-slm · Advanced

Train a Small Language Model from Scratch

Train a real GPT-style language model from zero on TinyStories: tokenize, wire up the optimizer and LR schedule, run the training loop with validation perplexity, and generate coherent text from your own weights. End-to-end pretraining in minutes on one GPU.

ncp-genl · ncp-ads · nca-genl
55 min · 4 steps
Launch
GPU Pro
ncp-genl · transformer-from-scratch · Advanced

Transformer Architecture Deep Dive

Build every piece of a decoder-only transformer by hand — scaled dot-product attention, multi-head attention, the full block with residuals and LayerNorm, then assemble a tiny GPT and train it. No shortcuts, no pre-built attention modules.

ncp-genl · nca-genl · ncp-ads
50 min · 4 steps
Launch

CUDA & kernel optimization

Kernels, shared memory, autograd ops
3 labs
GPU Pro
ncp-genl · cuda-fundamentals · Advanced

CUDA Programming Fundamentals

Write four real CUDA C++ kernels and run them from PyTorch: vector add, 2D matrix add, tiled matmul with shared memory, and a custom autograd op.

ncp-genl · nca-aiio · ncp-ads
45 min · 4 steps
Launch
GPU Pro
ncp-aio · gpu-sharing · Advanced

GPU Sharing: Streams, MPS, MIG, and the Real Cost of Contention

Measure four ways to share a single GPU — CUDA streams, multi-process time-slicing, MPS, and MIG — and write the production artifacts (start scripts, k8s device-plugin ConfigMaps, MIG geometries) that turn 15%-utilized fleets into 80%-utilized ones.

ncp-aio · nca-aiio · ncp-ads
45 min · 4 steps
Launch
GPU Pro
nca-aiio · nsight-profiling · Intermediate

Nsight Systems Profiling: Finding the Bottleneck That Costs You 40% of Your GPU

Run the full profile-then-fix loop with NVIDIA Nsight Systems — instrument a training loop with NVTX ranges, capture a .nsys-rep, parse the NVTX summary to pinpoint the bottleneck, then apply a targeted fix and measure the speedup.

nca-aiio · ncp-aio · ncp-genl
35 min · 4 steps
Launch

Profiling & performance

Nsight, PyTorch profiler, precision, quantization
2 labs
# torch.profiler · top ops
$ table(sort=cuda_time)
gemm ........ 41%
memcpy ...... 22%
softmax ..... 9%
GPU Pro
nca-aiio · pytorch-profiler · Intermediate

Profile PyTorch Training with the Built-in Profiler

Instrument a training loop with torch.profiler, read the op-level table, inspect the Chrome/Perfetto timeline, and decide when to reach for Nsight Systems instead.

nca-aiio · ncp-genl · ncp-ads
35 min · 4 steps
Launch
GPU Pro
ncp-aio · cost-audit · Intermediate

GPU Cost & Efficiency Audit

Build a four-stage cost-audit pipeline — measure, classify, price, recommend — that turns raw NVML samples into dollar-denominated waste and specific remediation actions. The skeleton behind every enterprise GPU cost product.

ncp-aio · nca-aiio · nca-genl
35 min · 4 steps
Launch
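The measure → classify → price → recommend shape can be sketched with fake utilization samples. The hourly rate, thresholds, and fleet data below are illustrative placeholders, not real cloud prices or NVML output:

```python
# Classify and price waste from utilization samples (%, one per minute).
HOURLY_RATE = 2.50   # $/GPU-hour, hypothetical

def classify(samples):
    mean = sum(samples) / len(samples)
    if mean < 5:
        return "idle"
    if mean < 40:
        return "underutilized"
    return "healthy"

def price_waste(samples):
    """Dollars spent on capacity that did no work during the window."""
    mean = sum(samples) / len(samples)
    hours = len(samples) / 60
    return round(HOURLY_RATE * hours * (1 - mean / 100), 2)

fleet = {"gpu0": [0, 1, 0, 2] * 15, "gpu1": [85, 90, 88, 92] * 15}
for name, samples in fleet.items():
    print(name, classify(samples), f"${price_waste(samples)} wasted this hour")
```

The real lab replaces the fake samples with live NVML telemetry and adds the recommend stage (reclaim, right-size, share), but the dollar math stays this simple.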

Multimodal

Vision-language, Stable Diffusion, VLM fine-tuning
1 lab
GPU Pro
nca-genm · vlm · Intermediate

Vision-Language Models: Captioning and Visual QA

Load Qwen2-VL, caption a real image, run a battery of visual question-answering prompts, and dissect the architecture — vision encoder, projector, language model — to see exactly how pixels become tokens the LLM can reason over.

nca-genm
35 min · 4 steps
Launch

Data & pipelines

DALI, tokenizers, dataset curation, synthetic data
3 labs
GPU Pro
ncp-genl · dali-pipeline · Intermediate

NVIDIA DALI: GPU-Accelerated Data Pipelines

Move image decoding, resizing, and augmentation from CPU to GPU with NVIDIA DALI, and benchmark it against a standard PyTorch DataLoader. The input-pipeline fix that unlocks real multi-GPU throughput.

ncp-genl · ncp-ads · nca-genl
30 min · 4 steps
Launch
# dedup + tokenize
$ filter_lang("en")
raw ............ 52k → 18k
tokenizer ...... BPE 32k
train_ready .... 16,420
GPU Pro
ncp-genl · data-prep · Intermediate

Data Preparation for LLM Training

Build a real pretraining/instruction data pipeline: load a raw corpus, apply quality filters, deduplicate, train a BPE tokenizer, and batch-validate on GPU. This is the unglamorous work that actually decides how good your model will be.

ncp-genl · nca-genl · ncp-ads
45 min · 5 steps
Launch
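The filter-and-dedup stages can be sketched in a few lines: normalize whitespace, drop low-quality (here: too-short) documents, and exact-dedup via a content hash. The corpus and threshold are toy examples; production pipelines add fuzzy dedup such as MinHash before tokenizer training:

```python
import hashlib
import re

def clean_corpus(docs, min_words=3):
    """Quality-filter then exact-dedup -- the first two pipeline stages."""
    seen, kept = set(), []
    for doc in docs:
        text = re.sub(r"\s+", " ", doc).strip()    # normalize whitespace
        if len(text.split()) < min_words:          # quality filter: too short
            continue
        digest = hashlib.sha256(text.lower().encode()).hexdigest()
        if digest in seen:                         # exact duplicate
            continue
        seen.add(digest)
        kept.append(text)
    return kept

raw = [
    "The   quick brown fox jumps.",
    "the quick brown fox jumps.",   # duplicate after normalization
    "ok",                           # fails the length filter
    "A genuinely different document.",
]
print(clean_corpus(raw))            # two documents survive
```

The same shape scales to the lab's 52k → 18k reduction: only the filters get more numerous, not the structure.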
GPU Pro
ncp-genl · synthetic-data · Intermediate

Synthetic Data Generation for Model Training

Build a Self-Instruct style synthetic dataset end-to-end: seed instructions, LLM-driven generation, robust parsing, quality filtering, and dedup + diversity scoring. The same pipeline that produced Alpaca, WizardLM, and most modern instruction-tuning corpora.

ncp-genl · ncp-ads · nca-genl
40 min · 4 steps
Launch

GPU infrastructure

Kubernetes, MIG/MPS, containers, device plugins
4 labs
GPU Pro
ncp-aio · container-lifecycle · Intermediate

GPU Container Lifecycle: Build, Test, Ship, Rollback

Walk through the full lifecycle of a production GPU container — multi-stage Dockerfile, self-hosted GPU CI, a fail-fast smoke test, and a Kubernetes Deployment with readiness probes gated on real GPU compute. The pipeline that stops bad images before users see a 500.

ncp-aio · nca-aiio · nca-genl
40 min · 4 steps
Launch
# NVML · health probe
$ pynvml.nvmlDeviceGetPids
rogue_pid ....... 2341
VRAM leak: 14.7 GB
remediation: SIGKILL → clean
GPU Pro
nca-aiio · gpu-health · Advanced

GPU Health Checks + Auto-Remediation

Build a production-grade GPU watchdog: multi-dimensional NVML health probe, rogue-process detection, auto-remediation that kills the offender and verifies recovery, then wire it up with Prometheus alerts and Kubernetes liveness probes.

nca-aiio · ncp-aio · ncp-aii
50 min · 4 steps
Launch
GPU Pro
ncp-aio · mlflow-tracking · Intermediate

MLflow Experiment Tracking: From Single Run to Team Workflow

Wire the four load-bearing pieces of MLflow into a real training loop — tracked runs with params and metrics, a registered model with stage transitions, a multi-run sweep + search, and a production spec (server, k8s Job, tags, autolog).

ncp-aio · nca-genl · ncp-ads
35 min · 4 steps
Launch
GPU Pro
ncp-aio · k3s-gpu-operator · Intermediate

NVIDIA GPU Operator on k3s: Single-Node Kubernetes for GPU Workloads

Bring up a lightweight single-node Kubernetes cluster with the NVIDIA GPU Operator — k3s install, containerd wiring, Helm values, workload manifests with RBAC and ResourceQuota, plus a full runbook (validation plan, troubleshooting matrix, day-2 ops).

ncp-aio · ncp-ain · ncp-aii
40 min · 4 steps
Launch

Monitoring & ops

MLflow, Prometheus, health probes, cost audit
1 lab
util 86% · mem 15 GB · temp 74°C · power 310W
GPU Pro
nca-aiio · gpu-monitoring · Intermediate

GPU Observability: From nvidia-smi to a Production Monitoring Stack

Go from a raw NVML snapshot to a real monitoring pipeline: capture live GPU telemetry during a workload, diagnose a dataloader bottleneck from the utilization trace, and expose everything as a Prometheus /metrics endpoint.

nca-aiio · ncp-aio · nca-genl
40 min · 4 steps
Launch
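The /metrics endpoint at the end of this lab boils down to the Prometheus text exposition format: HELP and TYPE comments followed by labeled samples. A sketch with made-up telemetry values (the metric name is a plausible choice, not a standard exporter's):

```python
# Render GPU telemetry as Prometheus text exposition output.
def render_metrics(gpus):
    lines = [
        "# HELP gpu_utilization_percent GPU compute utilization",
        "# TYPE gpu_utilization_percent gauge",
    ]
    for gpu_id, util in gpus.items():
        lines.append(f'gpu_utilization_percent{{gpu="{gpu_id}"}} {util}')
    return "\n".join(lines) + "\n"

print(render_metrics({"0": 86, "1": 12}), end="")
```

Serve that string from an HTTP handler and Prometheus can scrape it; the lab wires real NVML samples into the same shape.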

Accelerated data science

RAPIDS, cuDF, cuML — GPU-accelerated DS
1 lab
GPU Pro
ncp-ads · rapids-gpu-ds · Intermediate

GPU-Accelerated Data Science with RAPIDS

Rewrite a pandas + sklearn data-science pipeline on GPU using cuDF and cuML, benchmark each stage against the CPU baseline, and run an end-to-end filter → feature-engineer → predict pipeline that never leaves the GPU.

ncp-ads · nca-genl
40 min · 4 steps
Launch

More labs

Utilities, smoke tests, foundations
3 labs
GPU Pro
ncp-genl · evaluation · Intermediate

Evaluation & Benchmarking LLMs

Four evaluation lenses in one lab: compute real perplexity, expose BLEU's blindness to paraphrase, run side-by-side model comparisons, and build an LLM-as-judge harness with position-bias detection.

ncp-genl · nca-genl · ncp-ads
45 min · 4 steps
Launch
# determinism flags
$ torch.use_deterministic_…
seed ............ 42
cudnn.benchmark . False
reproducible .... TRUE
GPU Pro
ncp-genl · reproducible-training · Intermediate

Reproducible Training: The Flags, The Cost, The Artifacts

Measure the non-determinism noise floor in default PyTorch, flip every determinism flag until same-seed runs match bit-for-bit, quantify the perf cost, and capture a content-addressable training config that makes a run reproducible forever.

ncp-genl · ncp-aio · ncp-ads
40 min · 4 steps
Launch
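The content-addressable config mentioned above is just canonical serialization plus a hash: same config bytes, same id, forever. A minimal sketch (the config fields are hypothetical; sorting keys makes the id independent of dict order):

```python
import hashlib
import json

def config_fingerprint(config: dict) -> str:
    """Content-address a training config: canonical JSON in, SHA-256 out."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]

cfg  = {"seed": 42, "lr": 3e-4, "cudnn_benchmark": False, "deterministic": True}
same = {"deterministic": True, "lr": 3e-4, "seed": 42, "cudnn_benchmark": False}
print(config_fingerprint(cfg))
print(config_fingerprint(cfg) == config_fingerprint(same))  # key order irrelevant
```

Combined with bit-for-bit deterministic runs, that fingerprint lets you name a run by its config and reproduce it exactly later.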
# nvidia-smi · health
$ nvidia-smi -L
GPU 0 ........... RTX 4090
VRAM ............ 24 GB
CUDA 12.4 · driver ok
GPU Pro
lab · Beginner

GPU Environment Smoke Test

Validate the GPU lab environment: terminal, file operations, PyTorch, CUDA, and model loading.

10 min · 4 steps
Launch

Every lab runs on real AI infrastructure.

No video simulators, no canned outputs. Spin up a real GPU, or hook into our hosted stack — either way, you're graded on the metrics you actually produce.