Testing the physical state of a GPU cluster
Updated on April 18, 2025
Test the state of InfiniBand ports
- Connect to the VM over SSH.
- Install the infiniband-diags package:
    sudo apt update
    sudo apt install infiniband-diags
- Run the ibstatus command:
    ibstatus
  Result:
    state: 4: ACTIVE
    phys state: 5: LinkUp
- Make sure the phys state parameter is set to LinkUp for all ports.
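On a VM with several ports, reading every phys state line by eye is error-prone. The check can be scripted along the following lines. This is only a sketch: check_phys_state is a hypothetical helper name, and it assumes that ibstatus prints an "Infiniband device '<name>' port <N> status:" line before each port's fields, which may vary between infiniband-diags versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of infiniband-diags).
# Reads ibstatus output on stdin and prints every port whose
# "phys state" line does not contain LinkUp.
# Exit status: 0 if all ports are LinkUp, 1 if any port is not
# or if no "phys state" line was found at all.
check_phys_state() {
  awk '
    # Remember which device/port the following fields belong to.
    /^Infiniband device/ { port = $3 " port " $5 }
    /phys state:/ {
      seen++
      if ($0 !~ /LinkUp/) { bad++; print "NOT LinkUp: " port }
    }
    END { exit ((bad > 0 || seen == 0) ? 1 : 0) }
  '
}
```

A possible invocation is ibstatus | check_phys_state && echo "All ports are LinkUp". Treating empty input as a failure is a deliberate choice here, so that a missing adapter is not reported as healthy.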
Test network performance
To test the data transfer rate between GPUs on different VMs:
- Install the perftest package on each test VM:
    sudo apt install perftest
- Connect to the first VM over SSH.
- Run this command:
    ib_send_bw --report_gbits
- Connect to the second VM over SSH.
- Run this command:
    ib_send_bw <first_VM_internal_IP> --report_gbits
  Result:
    #bytes  #iterations  BW peak[Gb/sec]  BW average[Gb/sec]  MsgRate[Mpps]
    65536   1000         245.54           244.08              0.465536
- Make sure the output shows non-zero values for these parameters:
    BW average[Gb/sec]: Average transfer rate, in gigabits per second
    MsgRate[Mpps]: Message rate, in millions of packets per second
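The non-zero check can also be automated, for example when testing many VM pairs. The sketch below assumes the five-column result layout shown above; check_bw is a hypothetical helper name, not a perftest command.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of perftest).
# Reads ib_send_bw output on stdin, finds the result rows that follow
# the "#bytes" header, and checks that BW average (column 4) and
# MsgRate (column 5) are both greater than zero.
# Exit status: 0 if every result row passes, 1 otherwise
# (including when no result row is found).
check_bw() {
  awk '
    /#bytes/ { header = 1; next }
    header && NF == 5 && $1 ~ /^[0-9]+$/ {
      rows++
      printf "size=%s avg=%s Gb/sec rate=%s Mpps\n", $1, $4, $5
      if ($4 + 0 <= 0 || $5 + 0 <= 0) bad++
    }
    END { exit ((rows == 0 || bad > 0) ? 1 : 0) }
  '
}
```

On the second VM, this could be used as ib_send_bw <first_VM_internal_IP> --report_gbits | check_bw. The helper only confirms that traffic flowed; it does not compare the measured rate against the expected link speed.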