Graphics processing units (GPUs)
Updated on May 14, 2026
How do I test the physical state of a GPU cluster?
To test the physical state of a GPU cluster:
- Test the InfiniBand ports.
- Test the network.
For more information, see Testing a GPU cluster physical state.
How do I run parallel tasks in a GPU cluster?
To run parallel tasks in a GPU cluster:
- Connect to each VM over SSH and install Open MPI and NCCL.
- On the main VM, build NVIDIA tests and set up passwordless SSH keys.
- Add a public key to authorized_keys on each VM.
- On the main VM, run the mpirun command specifying the VM IP addresses and number of GPUs.
For more information, see Running parallel tasks in a GPU cluster.
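As a rough illustration of the last step, the sketch below assembles an mpirun command line from a list of VM IP addresses and a GPU count per VM. It assumes Open MPI's -np, --host and -x options and the all_reduce_perf benchmark from NVIDIA's nccl-tests built with MPI support. The IP addresses, the binary path and the function name build_mpirun_command are hypothetical placeholders, not values from this documentation.

```python
import shlex


def build_mpirun_command(vm_ips, gpus_per_vm, test_binary):
    """Build an Open MPI command that starts one process per GPU on every VM."""
    if not vm_ips or gpus_per_vm < 1:
        raise ValueError("need at least one VM and at least one GPU per VM")
    # Open MPI accepts hosts as "address:slots", separated by commas.
    hosts = ",".join(f"{ip}:{gpus_per_vm}" for ip in vm_ips)
    total_processes = len(vm_ips) * gpus_per_vm
    return [
        "mpirun",
        "-np", str(total_processes),
        "--host", hosts,
        "-x", "NCCL_DEBUG=INFO",  # export this variable to every MPI process
        test_binary,
        "-b", "8", "-e", "128M", "-f", "2",  # message sizes from 8 B to 128 MB, doubling
        "-g", "1",  # one GPU per MPI process
    ]


if __name__ == "__main__":
    # Placeholder addresses and path (assumptions); replace with your own values.
    command = build_mpirun_command(
        ["10.0.0.11", "10.0.0.12"], 8, "./nccl-tests/build/all_reduce_perf"
    )
    print(shlex.join(command))
```

The script only prints the command, so you can review it before running it on the main VM.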
How do I test InfiniBand throughput?
To check InfiniBand throughput, create and run a script that launches the perftest benchmarks under numactl.
For more information, see Testing InfiniBand throughput.
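A minimal sketch of such a script, assuming the ib_write_bw tool from the perftest package and numactl's --cpunodebind and --membind options, might build the server and client commands as follows. The device name mlx5_0, the NUMA node number, the server address and the function name build_ib_bandwidth_commands are illustrative assumptions. Check which devices and NUMA nodes your VMs actually have.

```python
import shlex


def build_ib_bandwidth_commands(server_ip, device="mlx5_0", numa_node=0):
    """Return (server, client) commands for an ib_write_bw run pinned to one NUMA node."""
    pinned = [
        "numactl",
        f"--cpunodebind={numa_node}",  # run on CPUs of this NUMA node only
        f"--membind={numa_node}",      # allocate memory from the same node
        "ib_write_bw",
        "-d", device,                  # InfiniBand device to test (assumed name)
        "--report_gbits",              # report bandwidth in Gbit/s
    ]
    server = list(pinned)              # the server side waits for a connection
    client = pinned + [server_ip]      # the client side connects to the server
    return server, client


if __name__ == "__main__":
    # Placeholder address (assumption); start the server command first.
    server_cmd, client_cmd = build_ib_bandwidth_commands("10.0.0.11")
    print("server:", shlex.join(server_cmd))
    print("client:", shlex.join(client_cmd))
```

Pinning the benchmark to the NUMA node closest to the InfiniBand adapter is the usual reason for wrapping perftest in numactl. Node 0 here is only a default for the example.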
What should I do if there is a GPU failure on the VM?
Try stopping the VM and then starting it again. This is usually more effective than a reboot: a rebooted VM remains on the same host where the GPU issue occurred, whereas a VM that is stopped and started again may be placed on a different host.