Free NVIDIA-Certified Professional: AI Infrastructure (NCP-AII) Practice Questions
Test your knowledge with 20 free exam-style questions
NCP-AII Exam Facts
- Questions: 65
- Passing score: 720/1000
- Duration: 130 minutes
Frequently Asked Questions

What do these free sample questions offer?
These 20 sample questions let you experience the exact format, difficulty, and question styles you'll encounter on exam day. Use them to identify knowledge gaps and decide if our full practice exam package is right for your preparation strategy.

How realistic are the practice questions?
Our questions mirror the actual exam format, difficulty level, and topic distribution. Each question includes detailed explanations to help you understand the concepts.

What does the full package include?
The full package includes 7 complete practice exams with 455+ unique questions, detailed explanations, progress tracking, and lifetime access.

Are the questions up to date?
Yes! Our NCP-AII practice questions are regularly updated to reflect the latest exam objectives and question formats. All questions align with the current 2026 exam blueprint.
Sample NCP-AII Practice Questions
Browse all 20 free NVIDIA-Certified Professional: AI Infrastructure practice questions below.
A data center engineer is deploying a new DGX H100 cluster and needs to validate multi-GPU communication performance across 4 nodes with 32 GPUs total. The engineer must verify that the NVLink and InfiniBand fabrics are functioning correctly before running production workloads. Which command correctly executes an NCCL all-reduce test across the cluster?
- mpirun -np 4 --hostfile hosts.txt nccl-tests/build/all_reduce_perf -b 8 -e 256M -f 2 -g 8
- nccl-tests --nodes 4 --gpus-per-node 8 --test all_reduce --size 256M
- dcgmi diag -r 4 --group all_gpus --test nccl
- nvidia-smi nccl --test all_reduce --nodes 4 --gpus 32
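The mpirun invocation above can be sketched end to end; the hostfile name and the nccl-tests build path are illustrative assumptions carried over from the answer option, and the test assumes nccl-tests has already been built on a shared filesystem.

```shell
# Multi-node NCCL all-reduce sweep across 4 DGX H100 nodes.
# -b 8 -e 256M -f 2 : sweep message sizes from 8 B to 256 MB, doubling each step
# -g 8              : drive 8 GPUs per MPI rank (one full DGX H100 node)
mpirun -np 4 --hostfile hosts.txt \
    nccl-tests/build/all_reduce_perf -b 8 -e 256M -f 2 -g 8
```

A healthy fabric shows bus bandwidth (busbw) approaching the platform's expected aggregate at large message sizes; a sharp drop-off there usually points at the inter-node InfiniBand path rather than NVLink.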
During the bring-up of a DGX H100 system, an engineer needs to ensure consistent benchmark results when running HPL (High Performance Linpack) tests. The GPUs are currently operating with dynamic frequency scaling based on thermal and power conditions. Which nvidia-smi commands should be executed BEFORE running HPL to ensure consistent performance results? (Select TWO)
- sudo nvidia-smi -pm 1
- sudo nvidia-smi -lgc 1980,1980
- sudo nvidia-smi --gpu-reset
- sudo nvidia-smi -e 0
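A minimal sketch of the two clock-stabilization steps, assuming a target clock of 1980 MHz (an illustrative value; check what your GPU supports with `nvidia-smi -q -d SUPPORTED_CLOCKS`):

```shell
# Pin GPU state for reproducible HPL benchmark runs (clock value illustrative).
sudo nvidia-smi -pm 1            # persistence mode: keep the driver initialized
sudo nvidia-smi -lgc 1980,1980   # lock graphics clocks to a fixed 1980 MHz
# ... run HPL ...
sudo nvidia-smi -rgc             # release the clock lock when finished
```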
An infrastructure team is deploying a DGX BasePOD with 8 DGX H100 nodes. They need to install the head node with Base Command Manager (BCM) to provision and manage the compute nodes. What is the correct sequence of steps for BCM head node installation?
- Boot from BCM ISO → Configure network interfaces → Set up management network → Install BCM software → Configure cluster settings → Create default node image
- Install Linux OS first → Download BCM packages → Run bcm-install script → Configure nodes individually → Set up network after cluster configuration
- Configure DHCP server → Install compute nodes first → Add head node last → Import existing node configurations → Run bcm-netautogen
- Run bcm-pod-setup on all nodes → Install drivers → Configure NVLink → Start BCM services → Create head node
A systems administrator notices that DCGM health checks are reporting warnings for GPU 3 on a DGX H100 system. The administrator needs to investigate the issue further. Which dcgmi command provides comprehensive diagnostic information including GPU memory, PCIe bandwidth, and stress testing?
- dcgmi diag -r 3
- dcgmi stats -g 3 --verbose
- dcgmi health -g 3 --check all
- nvidia-smi --query-gpu=health --format=csv -i 3
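For context, dcgmi diagnostics escalate by run level; a short sketch of the progression (run as root or a user with DCGM access):

```shell
# DCGM diagnostic run levels: higher levels take longer and test more.
dcgmi diag -r 1    # quick software and configuration sanity checks
dcgmi diag -r 2    # medium run: adds brief hardware exercises
dcgmi diag -r 3    # long run: GPU memory, PCIe bandwidth, and stress tests
```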
During the initial setup of a DGX H100 system, an engineer needs to configure the Baseboard Management Controller (BMC) for out-of-band management. The engineer must ensure remote access and monitoring capabilities are properly configured. Which tasks should be completed during BMC configuration? (Select THREE)
- Configure a static IP address on the dedicated BMC management port
- Set up IPMI user accounts with appropriate privilege levels
- Enable Redfish API access for programmatic infrastructure management
- Install NVIDIA GPU drivers through the BMC interface
- Configure NVLink topology and GPU interconnect settings
A data center team is evaluating an upgrade from DGX H100 to DGX H200 systems for their large language model training workloads. They need to understand the key hardware differences. What are the primary memory improvements in the H200 GPU compared to the H100?
- H200 features 141GB HBM3e memory with 4.8 TB/s bandwidth, compared to H100's 80GB HBM3 with 3.35 TB/s bandwidth - a 76% capacity increase and 43% bandwidth improvement
- H200 doubles the GPU compute cores from H100 while maintaining the same 80GB memory capacity
- H200 uses the same memory as H100 but adds dedicated inference accelerators
- H200 reduces memory to 64GB but increases bandwidth to 6.0 TB/s for inference optimization
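The percentages in the first option can be verified with quick arithmetic from the published capacity and bandwidth figures:

```shell
# Check the H200-vs-H100 memory deltas: 141 GB vs 80 GB, 4.8 vs 3.35 TB/s.
awk 'BEGIN {
  printf "capacity:  +%.0f%%\n", (141 / 80   - 1) * 100
  printf "bandwidth: +%.0f%%\n", (4.8 / 3.35 - 1) * 100
}'
# → capacity:  +76%
#   bandwidth: +43%
```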
A systems administrator is configuring driver persistence on a DGX H100 system to reduce GPU initialization latency for containerized inference workloads that start and stop frequently. What is the correct method to enable driver persistence, and what benefit does it provide?
- Run nvidia-persistenced as a system daemon. This keeps the NVIDIA kernel driver loaded by maintaining open device file handles, preventing driver teardown between GPU workloads and eliminating initialization latency.
- Enable persistence mode with nvidia-smi -pm 1, which allocates dedicated system memory for GPU state caching
- Configure CUDA_PERSISTENT_MODE=1 in the environment, which enables application-level driver caching
- Install nvidia-persistenced but only run it when GPU applications are expected, to minimize resource usage
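A sketch of running the daemon as a system service; the unit name below is the upstream default, and your distribution's packaging may differ:

```shell
# Start nvidia-persistenced at boot and immediately (unit name assumed).
sudo systemctl enable --now nvidia-persistenced
systemctl status nvidia-persistenced --no-pager   # confirm it is active
nvidia-smi -q | grep -i persistence               # should report Enabled
```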
An engineer is analyzing the NVSwitch architecture in a DGX H100 system to understand GPU-to-GPU communication patterns. The system has 8 H100 GPUs and uses third-generation NVSwitch. How does NVSwitch enable full-mesh GPU connectivity in the DGX H100?
- Four NVSwitch chips create switch planes where each GPU connects to all four switches, enabling any GPU to communicate with any other GPU at full NVLink bandwidth through aggregated switch paths
- Eight NVSwitch chips are used, one dedicated to each GPU, with cross-connections between switches for inter-GPU traffic
- GPUs are directly connected to each other via NVLink cables, with NVSwitch only handling overflow traffic
- A single NVSwitch chip with 64 ports connects all GPUs, with port aggregation for higher bandwidth
A cluster administrator needs to create a DCGM group to monitor a subset of GPUs allocated to a specific tenant on a multi-tenant DGX system and enable health watches for that group. Which dcgmi commands correctly create a group with GPUs 0-3 and enable health monitoring?
- dcgmi group -c "tenant1" to create the group, then dcgmi group -g <group_id> -a 0,1,2,3 to add GPUs, then dcgmi health -g <group_id> -s a to enable all health watches
- dcgmi config --create-group tenant1 --gpus 0-3 --enable-health
- dcgmi health -c tenant1 -a 0,1,2,3 -w all
- nvidia-smi --create-group=tenant1 --monitor-health
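The correct sequence can be sketched as a short session; the group ID printed by the create step is assumed here to be 2, so substitute whatever ID dcgmi actually returns:

```shell
# Per-tenant DCGM group workflow (group ID 2 is an assumption).
dcgmi group -c tenant1          # create an empty group; note the printed ID
dcgmi group -g 2 -a 0,1,2,3     # add GPUs 0-3 to the group
dcgmi health -g 2 -s a          # subscribe the group to all health watches
dcgmi health -g 2 -c            # report current health status for the group
```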
A systems engineer is deploying containerized GPU workloads on a DGX cluster using Slurm with Pyxis and Enroot. Users need to run NGC containers without requiring elevated privileges. How does the Pyxis and Enroot combination enable unprivileged container execution in Slurm?
- Enroot uses Linux chroot to create isolated runtime environments from container images without requiring root privileges, while Pyxis integrates this with Slurm via --container-image flag for srun/sbatch
- Pyxis runs Docker inside each job with --privileged flag, while Enroot handles GPU device isolation
- Pyxis provides sudo access to users temporarily during container launch, then revokes it after startup
- Enroot pre-converts all NGC images to static binaries that run without containerization, with Pyxis managing the conversion cache
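From the user's side, the Pyxis integration looks like an ordinary srun; the container tag, mount path, and script name below are illustrative. Note Pyxis uses `#` between the registry host and the image path:

```shell
# Launch an NGC container through Pyxis/Enroot without elevated privileges
# (image tag, mounts, and training script are illustrative).
srun --container-image=nvcr.io#nvidia/pytorch:24.05-py3 \
     --container-mounts=/data:/data \
     python train.py
```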
A data center architect is designing a DGX SuperPOD deployment using BlueField DPUs for network acceleration. The DPUs should offload network processing from the host CPUs. What functions can BlueField DPUs offload in a DGX cluster?
- Network packet processing, RDMA operations, storage virtualization, and security functions like encryption - freeing host CPUs for AI workloads
- GPU computation offload when host GPUs are busy with training workloads
- NVLink traffic routing between GPUs within the DGX node
- CUDA kernel execution for network-related AI inference tasks
An engineer needs to authenticate to the NGC (NVIDIA GPU Cloud) container registry to pull optimized deep learning containers for DGX nodes. What is the correct authentication method for NGC container registry access?
- Use 'docker login nvcr.io' with username '$oauthtoken' and password set to your NGC API key generated from the NGC portal
- Use your NVIDIA Developer account email and password directly with docker login
- NGC containers are public and require no authentication for pulling
- Configure NGC_API_KEY environment variable and Docker automatically uses it
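The login flow in the first option looks like this in practice; the pulled image tag is illustrative, and the single quotes keep the shell from expanding `$oauthtoken`:

```shell
# Authenticate to the NGC registry: the username is the literal string
# $oauthtoken, and the password is your NGC API key from the NGC portal.
docker login nvcr.io --username '$oauthtoken'
# Password: <paste your NGC API key when prompted>
docker pull nvcr.io/nvidia/pytorch:24.05-py3    # example pull after login
```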
A systems administrator is configuring Slurm MPI integration for multi-node GPU training on a DGX cluster. The cluster uses PMIx for process management. What Slurm configuration enables PMIx-based MPI job launch?
- Configure MpiDefault=pmix in slurm.conf and ensure PMIx libraries are installed. Jobs use srun for process launch which handles PMIx bootstrapping automatically
- Use mpirun with hostfile generated from Slurm environment variables
- Install slurm-pmix plugin which runs separately from Slurm
- PMIx is only needed for non-NVIDIA MPI implementations
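A minimal sketch of the configuration and a matching launch; node counts and the binary name are illustrative:

```shell
# Relevant slurm.conf setting (requires Slurm built with PMIx support):
#   MpiDefault=pmix
#
# With that default in place, srun bootstraps PMIx for the MPI ranks:
srun -N 4 --ntasks-per-node=8 --mpi=pmix ./train_mpi
```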
An engineer is troubleshooting slow checkpoint saving during training on a DGX H100 cluster. Checkpoints are saved to a parallel filesystem over the network. What should be investigated to improve checkpoint write performance? (Select TWO)
- Verify parallel filesystem stripe settings match the checkpoint file sizes - larger stripe counts and sizes improve throughput for large sequential writes
- Check if checkpoints are causing network congestion on the InfiniBand fabric during training communication phases
- Increase GPU memory allocation to cache checkpoints in GPU memory before writing
- Enable GPUDirect Storage (GDS), which is always faster than traditional I/O
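As one concrete example of the stripe-tuning option, on a Lustre parallel filesystem the layout can be widened for a checkpoint directory; the stripe count, stripe size, and path below are illustrative values to benchmark against your checkpoint sizes:

```shell
# Widen Lustre striping for large sequential checkpoint writes
# (assumes Lustre; count/size/path are illustrative).
lfs setstripe -c 8 -S 16M /lustre/checkpoints
lfs getstripe /lustre/checkpoints    # verify the new layout
```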
A cluster administrator is configuring the NGC CLI tool on DGX nodes for managing containers and models. The configuration should support team-based access to organization resources. How is the NGC CLI configured for organization access?
- Run 'ngc config set' and provide the API key, organization name, and optionally team name. The configuration is stored in ~/.ngc/config for subsequent commands
- Export NGC_ORG and NGC_TEAM environment variables for each session
- Modify /etc/ngc/config as root to set system-wide organization defaults
- Organization access is determined by IP address - no configuration needed on DGX systems
An administrator needs to partition an NVIDIA H100 80GB GPU to provide isolated GPU instances for multiple users, each requiring approximately 10GB of memory. Which MIG profile should be created to maximize the number of instances per GPU?
- 1g.10gb
- 2g.20gb
- 3g.40gb
- 7g.80gb
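With the 1g.10gb profile, an H100 80GB supports up to seven instances. A sketch of creating them, assuming profile ID 19 maps to 1g.10gb on this driver (verify the ID on your system first):

```shell
# Partition GPU 0 into seven 1g.10gb instances (profile ID 19 assumed).
sudo nvidia-smi -i 0 -mig 1                            # enable MIG mode
sudo nvidia-smi mig -lgip                              # confirm the profile ID
sudo nvidia-smi mig -i 0 -cgi 19,19,19,19,19,19,19 -C  # -C also creates the
                                                       # default compute instance
nvidia-smi -L                                          # each MIG device gets a UUID
```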
Which command correctly lists all available MIG profiles supported by an NVIDIA GPU?
- nvidia-smi mig -lgip
- nvidia-smi --list-mig-profiles
- dcgmi mig --show-profiles
- nvmig list profiles
A DGX cluster administrator needs to configure BlueField-3 DPUs to operate in a mode where the Arm processor controls all NIC resources and the host system has restricted access. Which mode should be configured?
- DPU mode with INTERNAL_CPU_OFFLOAD_ENGINE enabled
- NIC mode with INTERNAL_CPU_OFFLOAD_ENGINE disabled
- Pass-through mode with ECPF disabled
- Hybrid mode with shared ECPF ownership
After creating GPU instances using 'nvidia-smi mig -cgi', what additional step is required before workloads can be scheduled to the MIG instances?
- Create Compute Instances (CI) within each GPU Instance
- Restart the nvidia-persistenced daemon
- Enable MIG mode at the driver level using nvidia-smi -mig 1
- Register the instances with Fabric Manager
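The two-step relationship between GPU Instances and Compute Instances can be sketched as follows; the profile ID is illustrative, so check it with `nvidia-smi mig -lgip` on your system:

```shell
# Step 1: create a GPU Instance (GI) - memory/SM partition (profile ID assumed).
sudo nvidia-smi mig -i 0 -cgi 19
# Step 2: create Compute Instances (CI) inside the GIs; without a CI the
# partition cannot accept work. Omitting a profile creates the default CI.
sudo nvidia-smi mig -i 0 -cci
nvidia-smi mig -lgi    # verify the resulting instances
```

Alternatively, appending `-C` to the `-cgi` command performs both steps at once.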
Which BlueField-3 DPU mode provides the highest security by preventing the host system administrator from accessing the DPU, requiring all management through Arm cores or BMC connection?
- Zero-trust mode (Restricted Mode)
- Standard DPU mode
- Secure boot NIC mode
- Isolated BMC mode