Testing the physical state of a GPU cluster
Updated on April 18, 2025
Test the state of InfiniBand ports
- Connect to the VM over SSH.
- Install the infiniband-diags package:
    sudo apt update
    sudo apt install infiniband-diags
- Run the ibstatus command:
    ibstatus
  Result:
    state: 4: ACTIVE
    phys state: 5: LinkUp
- Make sure the phys state parameter is set to LinkUp for all ports.
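On a VM with several ports, reading every phys state line by eye is error-prone. The following is a minimal sketch of how the check could be automated, assuming the ibstatus output contains one "phys state:" line per port as in the result above. The function name check_phys_state is hypothetical and not part of infiniband-diags.

```shell
# Hypothetical helper (not part of infiniband-diags): reads ibstatus output
# on stdin and fails if any port reports a phys state other than LinkUp.
check_phys_state() {
    awk '
        /phys state:/ {
            total++
            if ($0 !~ /LinkUp/) {
                bad++
                print "not LinkUp: " $0
            }
        }
        END {
            # Exit code 2: no port lines were found in the input at all.
            if (total == 0) { print "no ports found"; exit 2 }
            # Exit code 1: at least one port is not in the LinkUp state.
            if (bad > 0) exit 1
            print "all " total " port(s) are LinkUp"
        }
    '
}
```

One way to use it is to pipe the command output into the function, for example ibstatus | check_phys_state, and then inspect the exit code.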
Test network performance
To test the data transfer rate between GPUs on different VMs:
- Install the perftest package on each test VM:
    sudo apt install perftest
- Connect to the first VM over SSH.
- Run this command to start the test server, which waits for a client connection:
    ib_send_bw --report_gbits
- Connect to the second VM over SSH.
- Run this command:
    ib_send_bw <first_VM_internal_IP> --report_gbits
  Result:
    #bytes  #iterations  BW peak[Gb/sec]  BW average[Gb/sec]  MsgRate[Mpps]
    65536   1000         245.54           244.08              0.465536
- Make sure the output shows non-zero values for these parameters:
    BW average[Gb/sec]: Average transfer rate
    MsgRate[Mpps]: Message rate
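The non-zero check can also be scripted. The sketch below assumes the result row has the five columns shown above (bytes, iterations, peak bandwidth, average bandwidth, message rate). The function name check_send_bw is hypothetical and not part of perftest.

```shell
# Hypothetical helper (not part of perftest): reads ib_send_bw output on
# stdin and fails unless every result row shows a non-zero average
# bandwidth and a non-zero message rate.
check_send_bw() {
    awk '
        # A result row has exactly five fields and starts with two integers:
        # bytes, iterations, BW peak, BW average, MsgRate.
        NF == 5 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ {
            rows++
            if ($4 + 0 > 0 && $5 + 0 > 0) {
                print "ok: " $1 " bytes, avg " $4 " Gb/sec, " $5 " Mpps"
            } else {
                bad++
                print "zero value in row: " $0
            }
        }
        END {
            # Exit code 2: the input contained no result row.
            if (rows == 0) { print "no result row found"; exit 2 }
            # Exit code 1: a row reported zero bandwidth or message rate.
            if (bad > 0) exit 1
        }
    '
}
```

On the second VM, the client output could be piped into the function, for example ib_send_bw <first_VM_internal_IP> --report_gbits | check_send_bw.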