Testing InfiniBand throughput

Written by Yandex Cloud
Updated at May 13, 2025
  1. Connect to the VM over SSH.

  2. Install the tools used for testing (perftest provides ib_write_bw; numactl pins processes to specific CPUs and NUMA nodes):

    sudo apt update
    sudo apt install perftest numactl
    
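
     To confirm that the binaries are now available, you can, for example, run:

    command -v ib_write_bw numactl
    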
  3. Create a file named /etc/security/limits.d/limits.conf with the following contents (ib_write_bw registers and pins memory buffers for RDMA, so the locked-memory limit must not be capped):

    * soft memlock unlimited
    * hard memlock unlimited
    
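
     If you prefer to create the file directly from the shell, a one-liner like the following works (it assumes you have sudo rights on the VM):

    printf '%s\n' '* soft memlock unlimited' '* hard memlock unlimited' | sudo tee /etc/security/limits.d/limits.conf
    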
  4. Log out and log back in or reboot the machine to apply the changes. Check the limit using this command:

    ulimit -l
    

    The result should be unlimited.

  5. Create a file named infiniband_test.sh with the following contents:

    #!/bin/bash
    set -eu
    
    # Testing the memlock limit
    echo "Current memlock limit:"
    ulimit -l
    if [[ $(ulimit -l) != "unlimited" ]]; then
       echo "Memlock limit is not unlimited."
       echo "Create a file named /etc/security/limits.d/limits.conf with the following content:"
       echo "* soft memlock unlimited"
       echo "* hard memlock unlimited"
       exit 1
    fi
    
    # Cleanup function: terminate all ib_write_bw processes upon script completion
    clean() {
       killall -9 ib_write_bw &>/dev/null || true
    }
    trap clean EXIT
    
    # Test parameters
    size=33554432  # Block size in bytes
    iters=10000    # Number of iterations
    q=1            # Number of queue pairs
    
    # Specify the CPU numbers and network device names for the different NUMA nodes.
    # The values below are examples; see the lookup commands after this script.
    numa0_cpu=40      # Client CPU (NUMA node 0)
    numa1_cpu=130     # Server CPU (NUMA node 1)
    numa0_net=mlx5_0  # Network interface for the client
    numa1_net=mlx5_7  # Network interface for the server
    
    # Start the server on NUMA node 1
    numactl -C $numa1_cpu --membind 1 /usr/bin/ib_write_bw --ib-dev=$numa1_net --report_gbits -s $size --iters $iters -q $q &>/dev/null &
    sleep 1
    
    # Start the client on NUMA node 0 with high priority
    # (negative niceness requires sufficient privileges, e.g. root)
    nice -n -20 numactl -C $numa0_cpu --membind 0 /usr/bin/ib_write_bw --ib-dev=$numa0_net --report_gbits -s $size --iters $iters -q $q localhost &
    wait
    
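
     The CPU numbers and mlx5_* device names in the script are examples; the correct values depend on your VM's NUMA topology. One way to look them up (ibv_devices comes with the libibverbs utilities, e.g. the ibverbs-utils package, and may need to be installed separately) is:

    # CPU ranges that belong to each NUMA node
    lscpu | grep -i numa
    
    # InfiniBand device names (mlx5_0, mlx5_1, ...)
    ibv_devices
    
    # NUMA node a specific device is attached to
    cat /sys/class/infiniband/mlx5_0/device/numa_node
    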
  6. Make the script executable:

    chmod +x infiniband_test.sh
    
  7. Run the script:

    ./infiniband_test.sh
    

    Result:

    ---------------------------------------------------------------------------------------
    #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
    33554432    10000            394.58             394.40                    0.001469
    ---------------------------------------------------------------------------------------
    
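
     The script above measures throughput between two adapters on the same VM. To test the link between two different VMs in the GPU cluster, you can run the same ib_write_bw pair manually; the device name and server address below are placeholders for your own values:

    # On the first VM (server): wait for an incoming test
    ib_write_bw --ib-dev=mlx5_0 --report_gbits -s 33554432 --iters 10000 -q 1
    
    # On the second VM (client): connect to the server by its hostname or IP address
    ib_write_bw --ib-dev=mlx5_0 --report_gbits -s 33554432 --iters 10000 -q 1 <server_address>
    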
