© 2026 Direct Cursus Technology L.L.C.
Yandex Compute Cloud

Graphics processing units (GPUs)

Written by
Yandex Cloud
Updated on May 14, 2026
  • How do I test a GPU cluster physical state?
  • How do I run parallel tasks in a GPU cluster?
  • How do I test InfiniBand throughput?
  • What should I do if there is a GPU failure on the VM?

How do I test a GPU cluster physical state?

To test the physical state of a GPU cluster:

  • Test the InfiniBand ports.
  • Test the network.

For more information, see Testing a GPU cluster physical state.
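
As an illustration of the port check, here is a minimal sketch that scans `ibstat`-style output for ports that are not both `Active` and `LinkUp`. It assumes the `ibstat` utility from infiniband-diags is what you use to inspect ports; the linked guide may rely on a different tool. The function name, the device names, and the abbreviated sample output are hypothetical.

```python
import re

# Abbreviated, hypothetical sample in the style of `ibstat` output.
SAMPLE_IBSTAT_OUTPUT = """\
CA 'mlx5_0'
    Port 1:
        State: Active
        Physical state: LinkUp
CA 'mlx5_1'
    Port 1:
        State: Down
        Physical state: Disabled
"""


def find_unhealthy_ports(ibstat_text):
    """Return (adapter, port, state, physical_state) for every port
    that is not both Active and LinkUp."""
    unhealthy = []
    adapter = port = state = None
    for raw_line in ibstat_text.splitlines():
        line = raw_line.strip()
        match = re.match(r"CA '(.+)'", line)
        if match:
            adapter = match.group(1)
            continue
        match = re.match(r"Port (\d+):", line)
        if match:
            port = int(match.group(1))
            state = None
            continue
        if line.startswith("State:"):
            state = line.split(":", 1)[1].strip()
        elif line.startswith("Physical state:"):
            physical_state = line.split(":", 1)[1].strip()
            if state != "Active" or physical_state != "LinkUp":
                unhealthy.append((adapter, port, state, physical_state))
    return unhealthy


if __name__ == "__main__":
    for entry in find_unhealthy_ports(SAMPLE_IBSTAT_OUTPUT):
        print(entry)
```

On a real VM, you would pass the captured output of the tool instead of the sample string.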

How do I run parallel tasks in a GPU cluster?

To run parallel tasks in a GPU cluster:

  1. Connect to each VM over SSH and install Open MPI and NCCL.
  2. On the main VM, build the NVIDIA tests and generate SSH keys for passwordless access.
  3. Add the main VM's public key to authorized_keys on each VM.
  4. On the main VM, run the mpirun command, specifying the VM IP addresses and the number of GPUs.

For more information, see Running parallel tasks in a GPU cluster.
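
The last step can be sketched as a small helper that assembles the `mpirun` command line from the VM IP addresses and the number of GPUs per VM. This is a sketch under assumptions, not the command from the linked guide: it assumes Open MPI's `--host ip:slots` syntax, one MPI process per GPU, and the `all_reduce_perf` binary from NVIDIA's nccl-tests. The function name, the binary path, and the IP addresses are hypothetical.

```python
import shlex


def build_mpirun_command(vm_ips, gpus_per_vm,
                         test_binary="./build/all_reduce_perf"):
    """Build an mpirun argument list that starts one process per GPU.

    test_binary is an assumed path to an nccl-tests executable.
    """
    if not vm_ips:
        raise ValueError("at least one VM IP address is required")
    if gpus_per_vm < 1:
        raise ValueError("gpus_per_vm must be a positive integer")

    # Open MPI accepts "host:slots" entries separated by commas.
    hosts = ",".join(f"{ip}:{gpus_per_vm}" for ip in vm_ips)
    total_processes = len(vm_ips) * gpus_per_vm

    return [
        "mpirun",
        "--host", hosts,
        "-np", str(total_processes),
        test_binary,
        "-b", "8",      # smallest message size in bytes
        "-e", "128M",   # largest message size
        "-f", "2",      # multiply the size by 2 at each step
        "-g", "1",      # one GPU per process
    ]


if __name__ == "__main__":
    # Hypothetical internal IP addresses of two VMs with 8 GPUs each.
    command = build_mpirun_command(["10.0.0.1", "10.0.0.2"], 8)
    print(shlex.join(command))
```

The helper only prints the command; run the printed line on the main VM once passwordless SSH works for every address in the list.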

How do I test InfiniBand throughput?

To test InfiniBand throughput, create and run a script that starts perftest benchmarks under numactl.

For more information, see Testing InfiniBand throughput.
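
A minimal sketch of what such a script might assemble is shown below. It assumes the `ib_write_bw` benchmark from perftest and pins both CPU and memory to a single NUMA node with `numactl`; the function name, the device name `mlx5_0`, the NUMA node number, and the server address are hypothetical and depend on your VM configuration.

```python
import shlex


def build_perftest_command(device, numa_node, server_ip=None,
                           tool="ib_write_bw"):
    """Build a numactl-wrapped perftest argument list.

    Without server_ip the command starts the server side;
    with server_ip it starts the client that connects to that server.
    """
    command = [
        "numactl",
        f"--cpunodebind={numa_node}",  # run on the CPUs of this NUMA node
        f"--membind={numa_node}",      # allocate memory on the same node
        tool,
        "-d", device,                  # InfiniBand device to test
        "--report_gbits",              # report bandwidth in Gbit/s
    ]
    if server_ip is not None:
        command.append(server_ip)
    return command


if __name__ == "__main__":
    # Hypothetical device, NUMA node, and server address.
    print(shlex.join(build_perftest_command("mlx5_0", 0)))
    print(shlex.join(build_perftest_command("mlx5_0", 0, "10.0.0.1")))
```

Start the server command on one VM first, then the client command on another VM.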

What should I do if there is a GPU failure on the VM?

Try stopping the VM and then starting it again. This is usually more effective than a reboot: a rebooted VM remains on the same host where the GPU issue occurred, whereas a VM that was stopped and started can be placed on a different host.
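
The stop and start sequence can be scripted. The sketch below assumes the Yandex Cloud CLI (`yc`) is installed and configured, and uses its `yc compute instance stop` and `yc compute instance start` commands; the function names and the VM name `gpu-vm-1` are hypothetical.

```python
import subprocess


def stop_start_commands(vm_name):
    """Return the two CLI commands, in order, for a full stop and start."""
    return [
        ["yc", "compute", "instance", "stop", vm_name],
        ["yc", "compute", "instance", "start", vm_name],
    ]


def stop_then_start(vm_name, run=subprocess.run):
    """Stop the VM, then start it again.

    The run parameter is injectable so the sequence can be checked
    without calling the real CLI.
    """
    for command in stop_start_commands(vm_name):
        # check=True aborts the sequence if a command fails.
        run(command, check=True)


if __name__ == "__main__":
    # Print the commands instead of executing them.
    for command in stop_start_commands("gpu-vm-1"):
        print(" ".join(command))
```

Calling `stop_then_start("gpu-vm-1")` would execute both commands in order and raise an exception if either one fails.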
