Free NVIDIA-Certified Professional: AI Infrastructure (NCP-AII) Practice Questions
Test your knowledge with 20 free exam-style questions
NCP-AII Exam Facts
- Questions: 65
- Passing score: 720/1000
- Duration: 130 minutes
Frequently Asked Questions

What do these free sample questions offer?
These 20 sample questions let you experience the exact format, difficulty, and question styles you'll encounter on exam day. Use them to identify knowledge gaps and decide if our full practice exam package is right for your preparation strategy.

How realistic are the practice questions?
Our questions mirror the actual exam format, difficulty level, and topic distribution. Each question includes detailed explanations to help you understand the concepts.

What does the full package include?
The full package includes 7 complete practice exams with 455+ unique questions, detailed explanations, progress tracking, and lifetime access.

Are the questions up to date?
Yes! Our NCP-AII practice questions are regularly updated to reflect the latest exam objectives and question formats. All questions align with the current 2026 exam blueprint.
Sample NCP-AII Practice Questions
Browse all 20 free NVIDIA-Certified Professional: AI Infrastructure practice questions below.
A data center engineer is deploying a new DGX H100 cluster and needs to validate multi-GPU communication performance across 4 nodes with 32 GPUs total. The engineer must verify that the NVLink and InfiniBand fabrics are functioning correctly before running production workloads. Which command correctly executes an NCCL all-reduce test across the cluster?
- mpirun -np 4 --hostfile hosts.txt nccl-tests/build/all_reduce_perf -b 8 -e 256M -f 2 -g 8
- nccl-tests --nodes 4 --gpus-per-node 8 --test all_reduce --size 256M
- dcgmi diag -r 4 --group all_gpus --test nccl
- nvidia-smi nccl --test all_reduce --nodes 4 --gpus 32
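The mpirun invocation above can be sketched end to end; the hostfile name and the nccl-tests build path are illustrative assumptions carried over from the answer option, and the test assumes nccl-tests has already been built on a shared filesystem.

```shell
# Multi-node NCCL all-reduce sweep across 4 DGX H100 nodes.
# -b 8 -e 256M -f 2 : sweep message sizes from 8 B to 256 MB, doubling each step
# -g 8              : drive 8 GPUs per MPI rank (one full DGX H100 node)
mpirun -np 4 --hostfile hosts.txt \
    nccl-tests/build/all_reduce_perf -b 8 -e 256M -f 2 -g 8
```

A healthy fabric shows bus bandwidth (busbw) approaching the platform's expected aggregate at large message sizes; a sharp drop-off there usually points at the inter-node InfiniBand path rather than NVLink.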
During the bring-up of a DGX H100 system, an engineer needs to ensure consistent benchmark results when running HPL (High Performance Linpack) tests. The GPUs are currently operating with dynamic frequency scaling based on thermal and power conditions. Which nvidia-smi commands should be executed BEFORE running HPL to ensure consistent performance results? (Select TWO)
- sudo nvidia-smi -pm 1
- sudo nvidia-smi -lgc 1980,1980
- sudo nvidia-smi --gpu-reset
- sudo nvidia-smi -e 0
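A minimal sketch of the two clock-stabilization steps, assuming a target clock of 1980 MHz (an illustrative value; check what your GPU supports with `nvidia-smi -q -d SUPPORTED_CLOCKS`):

```shell
# Pin GPU state for reproducible HPL benchmark runs (clock value illustrative).
sudo nvidia-smi -pm 1            # persistence mode: keep the driver initialized
sudo nvidia-smi -lgc 1980,1980   # lock graphics clocks to a fixed 1980 MHz
# ... run HPL ...
sudo nvidia-smi -rgc             # release the clock lock when finished
```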
An infrastructure team is deploying a DGX BasePOD with 8 DGX H100 nodes. They need to install the head node with Base Command Manager (BCM) to provision and manage the compute nodes. What is the correct sequence of steps for BCM head node installation?
- Boot from BCM ISO → Configure network interfaces → Set up management network → Install BCM software → Configure cluster settings → Create default node image
- Install Linux OS first → Download BCM packages → Run bcm-install script → Configure nodes individually → Set up network after cluster configuration
- Configure DHCP server → Install compute nodes first → Add head node last → Import existing node configurations → Run bcm-netautogen
- Run bcm-pod-setup on all nodes → Install drivers → Configure NVLink → Start BCM services → Create head node
A systems administrator notices that DCGM health checks are reporting warnings for GPU 3 on a DGX H100 system. The administrator needs to investigate the issue further. Which dcgmi command provides comprehensive diagnostic information including GPU memory, PCIe bandwidth, and stress testing?
- dcgmi diag -r 3
- dcgmi stats -g 3 --verbose
- dcgmi health -g 3 --check all
- nvidia-smi --query-gpu=health --format=csv -i 3
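For context, dcgmi diagnostics escalate by run level; a short sketch of the progression (run as root or a user with DCGM access):

```shell
# DCGM diagnostic run levels: higher levels take longer and test more.
dcgmi diag -r 1    # quick software and configuration sanity checks
dcgmi diag -r 2    # medium run: adds brief hardware exercises
dcgmi diag -r 3    # long run: GPU memory, PCIe bandwidth, and stress tests
```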
During the initial setup of a DGX H100 system, an engineer needs to configure the Baseboard Management Controller (BMC) for out-of-band management. The engineer must ensure remote access and monitoring capabilities are properly configured. Which tasks should be completed during BMC configuration? (Select THREE)
- Configure a static IP address on the dedicated BMC management port
- Set up IPMI user accounts with appropriate privilege levels
- Enable Redfish API access for programmatic infrastructure management
- Install NVIDIA GPU drivers through the BMC interface
- Configure NVLink topology and GPU interconnect settings
A data center team is evaluating an upgrade from DGX H100 to DGX H200 systems for their large language model training workloads. They need to understand the key hardware differences. What are the primary memory improvements in the H200 GPU compared to the H100?
- H200 features 141GB HBM3e memory with 4.8 TB/s bandwidth, compared to H100's 80GB HBM3 with 3.35 TB/s bandwidth - a 76% capacity increase and 43% bandwidth improvement
- H200 doubles the GPU compute cores from H100 while maintaining the same 80GB memory capacity
- H200 uses the same memory as H100 but adds dedicated inference accelerators
- H200 reduces memory to 64GB but increases bandwidth to 6.0 TB/s for inference optimization
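The percentages in the first option can be verified with quick arithmetic from the published capacity and bandwidth figures:

```shell
# Check the H200-vs-H100 memory deltas: 141 GB vs 80 GB, 4.8 vs 3.35 TB/s.
awk 'BEGIN {
  printf "capacity:  +%.0f%%\n", (141 / 80   - 1) * 100
  printf "bandwidth: +%.0f%%\n", (4.8 / 3.35 - 1) * 100
}'
# → capacity:  +76%
#   bandwidth: +43%
```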
A systems administrator is configuring driver persistence on a DGX H100 system to reduce GPU initialization latency for containerized inference workloads that start and stop frequently. What is the correct method to enable driver persistence, and what benefit does it provide?
- Run nvidia-persistenced as a system daemon. This keeps the NVIDIA kernel driver loaded by maintaining open device file handles, preventing driver teardown between GPU workloads and eliminating initialization latency.
- Enable persistence mode with nvidia-smi -pm 1, which allocates dedicated system memory for GPU state caching
- Configure CUDA_PERSISTENT_MODE=1 in the environment, which enables application-level driver caching
- Install nvidia-persistenced but only run it when GPU applications are expected, to minimize resource usage
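A sketch of running the daemon as a system service; the unit name below is the upstream default, and your distribution's packaging may differ:

```shell
# Start nvidia-persistenced at boot and immediately (unit name assumed).
sudo systemctl enable --now nvidia-persistenced
systemctl status nvidia-persistenced --no-pager   # confirm it is active
nvidia-smi -q | grep -i persistence               # should report Enabled
```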
An engineer is analyzing the NVSwitch architecture in a DGX H100 system to understand GPU-to-GPU communication patterns. The system has 8 H100 GPUs and uses third-generation NVSwitch. How does NVSwitch enable full-mesh GPU connectivity in the DGX H100?
- Four NVSwitch chips create switch planes where each GPU connects to all four switches, enabling any GPU to communicate with any other GPU at full NVLink bandwidth through aggregated switch paths
- Eight NVSwitch chips are used, one dedicated to each GPU, with cross-connections between switches for inter-GPU traffic
- GPUs are directly connected to each other via NVLink cables, with NVSwitch only handling overflow traffic
- A single NVSwitch chip with 64 ports connects all GPUs, with port aggregation for higher bandwidth
A cluster administrator needs to create a DCGM group to monitor a subset of GPUs allocated to a specific tenant on a multi-tenant DGX system and enable health watches for that group. Which dcgmi commands correctly create a group with GPUs 0-3 and enable health monitoring?
- dcgmi group -c "tenant1" to create the group, then dcgmi group -g <group_id> -a 0,1,2,3 to add GPUs, then dcgmi health -g <group_id> -s a to enable all health watches
- dcgmi config --create-group tenant1 --gpus 0-3 --enable-health
- dcgmi health -c tenant1 -a 0,1,2,3 -w all
- nvidia-smi --create-group=tenant1 --monitor-health
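The correct sequence can be sketched as a short session; the group ID printed by the create step is assumed here to be 2, so substitute whatever ID dcgmi actually returns:

```shell
# Per-tenant DCGM group workflow (group ID 2 is an assumption).
dcgmi group -c tenant1          # create an empty group; note the printed ID
dcgmi group -g 2 -a 0,1,2,3     # add GPUs 0-3 to the group
dcgmi health -g 2 -s a          # subscribe the group to all health watches
dcgmi health -g 2 -c            # report current health status for the group
```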
A systems engineer is deploying containerized GPU workloads on a DGX cluster using Slurm with Pyxis and Enroot. Users need to run NGC containers without requiring elevated privileges. How does the Pyxis and Enroot combination enable unprivileged container execution in Slurm?
- Enroot uses Linux chroot to create isolated runtime environments from container images without requiring root privileges, while Pyxis integrates this with Slurm via --container-image flag for srun/sbatch
- Pyxis runs Docker inside each job with --privileged flag, while Enroot handles GPU device isolation
- Pyxis provides sudo access to users temporarily during container launch, then revokes it after startup
- Enroot pre-converts all NGC images to static binaries that run without containerization, with Pyxis managing the conversion cache
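From the user's side, the Pyxis integration looks like an ordinary srun; the container tag, mount path, and script name below are illustrative. Note Pyxis uses `#` between the registry host and the image path:

```shell
# Launch an NGC container through Pyxis/Enroot without elevated privileges
# (image tag, mounts, and training script are illustrative).
srun --container-image=nvcr.io#nvidia/pytorch:24.05-py3 \
     --container-mounts=/data:/data \
     python train.py
```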
A data center architect is designing a DGX SuperPOD deployment using BlueField DPUs for network acceleration. The DPUs should offload network processing from the host CPUs. What functions can BlueField DPUs offload in a DGX cluster?
- Network packet processing, RDMA operations, storage virtualization, and security functions like encryption - freeing host CPUs for AI workloads
- GPU computation offload when host GPUs are busy with training workloads
- NVLink traffic routing between GPUs within the DGX node
- CUDA kernel execution for network-related AI inference tasks
An engineer needs to authenticate to the NGC (NVIDIA GPU Cloud) container registry to pull optimized deep learning containers for DGX nodes. What is the correct authentication method for NGC container registry access?
- Use 'docker login nvcr.io' with username '$oauthtoken' and password set to your NGC API key generated from the NGC portal
- Use your NVIDIA Developer account email and password directly with docker login
- NGC containers are public and require no authentication for pulling
- Configure NGC_API_KEY environment variable and Docker automatically uses it
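The login flow in the first option looks like this in practice; the pulled image tag is illustrative, and the single quotes keep the shell from expanding `$oauthtoken`:

```shell
# Authenticate to the NGC registry: the username is the literal string
# $oauthtoken, and the password is your NGC API key from the NGC portal.
docker login nvcr.io --username '$oauthtoken'
# Password: <paste your NGC API key when prompted>
docker pull nvcr.io/nvidia/pytorch:24.05-py3    # example pull after login
```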
A systems administrator is configuring Slurm MPI integration for multi-node GPU training on a DGX cluster. The cluster uses PMIx for process management. What Slurm configuration enables PMIx-based MPI job launch?
- Configure MpiDefault=pmix in slurm.conf and ensure PMIx libraries are installed. Jobs use srun for process launch which handles PMIx bootstrapping automatically
- Use mpirun with hostfile generated from Slurm environment variables
- Install slurm-pmix plugin which runs separately from Slurm
- PMIx is only needed for non-NVIDIA MPI implementations
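A minimal sketch of the configuration and a matching launch; node counts and the binary name are illustrative:

```shell
# Relevant slurm.conf setting (requires Slurm built with PMIx support):
#   MpiDefault=pmix
#
# With that default in place, srun bootstraps PMIx for the MPI ranks:
srun -N 4 --ntasks-per-node=8 --mpi=pmix ./train_mpi
```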
An engineer is troubleshooting slow checkpoint saving during training on a DGX H100 cluster. Checkpoints are saved to a parallel filesystem over the network. What should be investigated to improve checkpoint write performance? (Select TWO)
- Verify parallel filesystem stripe settings match the checkpoint file sizes - larger stripe counts and sizes improve throughput for large sequential writes
- Check if checkpoints are causing network congestion on the InfiniBand fabric during training communication phases
- Increase GPU memory allocation to cache checkpoints in GPU memory before writing
- Enable GPUDirect Storage (GDS), which is always faster than traditional I/O
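As one concrete example of the stripe-tuning option, on a Lustre parallel filesystem the layout can be widened for a checkpoint directory; the stripe count, stripe size, and path below are illustrative values to benchmark against your checkpoint sizes:

```shell
# Widen Lustre striping for large sequential checkpoint writes
# (assumes Lustre; count/size/path are illustrative).
lfs setstripe -c 8 -S 16M /lustre/checkpoints
lfs getstripe /lustre/checkpoints    # verify the new layout
```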
A cluster administrator is configuring the NGC CLI tool on DGX nodes for managing containers and models. The configuration should support team-based access to organization resources. How is the NGC CLI configured for organization access?
- Run 'ngc config set' and provide the API key, organization name, and optionally team name. The configuration is stored in ~/.ngc/config for subsequent commands
- Export NGC_ORG and NGC_TEAM environment variables for each session
- Modify /etc/ngc/config as root to set system-wide organization defaults
- Organization access is determined by IP address - no configuration needed on DGX systems
An administrator needs to partition an NVIDIA H100 80GB GPU to provide isolated GPU instances for multiple users, each requiring approximately 10GB of memory. Which MIG profile should be created to maximize the number of instances per GPU?
- 1g.10gb
- 2g.20gb
- 3g.40gb
- 7g.80gb
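With the 1g.10gb profile, an H100 80GB supports up to seven instances. A sketch of creating them, assuming profile ID 19 maps to 1g.10gb on this driver (verify the ID on your system first):

```shell
# Partition GPU 0 into seven 1g.10gb instances (profile ID 19 assumed).
sudo nvidia-smi -i 0 -mig 1                            # enable MIG mode
sudo nvidia-smi mig -lgip                              # confirm the profile ID
sudo nvidia-smi mig -i 0 -cgi 19,19,19,19,19,19,19 -C  # -C also creates the
                                                       # default compute instance
nvidia-smi -L                                          # each MIG device gets a UUID
```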
Which command correctly lists all available MIG profiles supported by an NVIDIA GPU?
- nvidia-smi mig -lgip
- nvidia-smi --list-mig-profiles
- dcgmi mig --show-profiles
- nvmig list profiles
A DGX cluster administrator needs to configure BlueField-3 DPUs to operate in a mode where the Arm processor controls all NIC resources and the host system has restricted access. Which mode should be configured?
- DPU mode with INTERNAL_CPU_OFFLOAD_ENGINE enabled
- NIC mode with INTERNAL_CPU_OFFLOAD_ENGINE disabled
- Pass-through mode with ECPF disabled
- Hybrid mode with shared ECPF ownership
After creating GPU instances using 'nvidia-smi mig -cgi', what additional step is required before workloads can be scheduled to the MIG instances?
- Create Compute Instances (CI) within each GPU Instance
- Restart the nvidia-persistenced daemon
- Enable MIG mode at the driver level using nvidia-smi -mig 1
- Register the instances with Fabric Manager
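The two-step relationship between GPU Instances and Compute Instances can be sketched as follows; the profile ID is illustrative, so check it with `nvidia-smi mig -lgip` on your system:

```shell
# Step 1: create a GPU Instance (GI) - memory/SM partition (profile ID assumed).
sudo nvidia-smi mig -i 0 -cgi 19
# Step 2: create Compute Instances (CI) inside the GIs; without a CI the
# partition cannot accept work. Omitting a profile creates the default CI.
sudo nvidia-smi mig -i 0 -cci
nvidia-smi mig -lgi    # verify the resulting instances
```

Alternatively, appending `-C` to the `-cgi` command performs both steps at once.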
Which BlueField-3 DPU mode provides the highest security by preventing the host system administrator from accessing the DPU, requiring all management through Arm cores or BMC connection?
- Zero-trust mode (Restricted Mode)
- Standard DPU mode
- Secure boot NIC mode
- Isolated BMC mode