NVIDIA driver update guide

Written by Yandex Cloud
Updated on December 3, 2025
  • Supported drivers and recommendations
  • Shared NVSwitch virtualization model
  • Why use driver version 535?
  • CUDA update
    • Ubuntu installation example
  • Issue with sudo reboot after updating the driver to a version higher than 535 and the recommended workaround

Warning

This guide covers the gpu-standard-v3 (AMD EPYC™ with NVIDIA® Ampere® A100) and gpu-standard-v3i platforms.

For gpu-standard-v3i, you can only use an image with the NVIDIA 535 driver and Secure Boot support. The GPU driver cannot be updated on this platform; you can only update the CUDA library.

Supported drivers and recommendations

In Yandex Cloud, the gpu-standard-v3 (AMD EPYC™ with NVIDIA® Ampere® A100) and gpu-standard-v3i VMs are preconfigured with the NVIDIA 535 driver.
We recommend using this specific driver version; driver updates to other versions are not supported and may lead to unstable GPU performance.
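
To check which driver version a VM is currently running, you can query it with nvidia-smi (a minimal read-only check; on these platforms the result is expected to be a 535.x release):

    # Query only the driver version; expected to report a 535.x release on these platforms
    nvidia-smi --query-gpu=driver_version --format=csv,noheader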

Shared NVSwitch virtualization model

We use the Shared NVSwitch virtualization model described in the NVIDIA Fabric Manager User Guide.

NVSwitch devices are passed through to a separate auxiliary VM and managed by the NVIDIA 535 driver. When a guest VM starts, its GPUs come preconfigured for NVLink; to preserve this configuration, software resets of GPUs from user VMs are not allowed in Yandex Cloud.

If you update the driver on the user VM to another version, e.g., 570, the driver may fail to recognize the current GPU state. This is an NVIDIA driver limitation, which is why we do not recommend changing the driver version on the user VM.
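
If you want to confirm the NVLink preconfiguration from inside a guest VM, the following read-only nvidia-smi queries can be used (output depends on the number of GPUs in the instance; these commands do not reset the GPUs):

    # Show per-link NVLink state for each GPU (read-only)
    nvidia-smi nvlink --status
    # Show the GPU/NVLink topology matrix (read-only)
    nvidia-smi topo -m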

Why use driver version 535?

NVIDIA publishes multiple driver branches (NVIDIA Data Center Drivers Overview):

  • LTSB (Long-Term Support Branch): Long-term support, security updates and fixes for 3 years.
  • PB (Production Branch): Main branch for data centers.
  • NFB (New Feature Branch): Drivers with new features.

Version 535 belongs to the LTSB; it is validated and supported in the Yandex Cloud infrastructure. Drivers from other branches have not passed this compatibility validation and may work incorrectly.

CUDA update

Often what you actually need is not a new driver but a newer CUDA Toolkit. In most cases, you do not need to update the driver: it is enough to install the required CUDA version together with the cuda-compat package, which provides compatibility with the 535 driver (CUDA Forward Compatibility).

Ubuntu installation example

  1. Connect the NVIDIA CUDA repository:

    sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu$(lsb_release -rs | sed -e 's/\.//')/x86_64/3bf863cc.pub
    sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu$(lsb_release -rs | sed -e 's/\.//')/x86_64/ /"
    sudo apt update
    
  2. Install cuda-compat (example for CUDA 12.5):

    sudo apt install -y cuda-compat-12-5
    echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.5/compat:$LD_LIBRARY_PATH' >> ~/.bashrc && source ~/.bashrc
    
  3. Check the current configuration:

    nvidia-smi
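
As an optional follow-up check, you can confirm that the forward-compatibility libraries are in place and visible through LD_LIBRARY_PATH (paths assume the CUDA 12.5 example above):

    # The compat directory should contain the libcuda.so* files shipped by cuda-compat-12-5
    ls /usr/local/cuda-12.5/compat/
    # The compat path should appear in LD_LIBRARY_PATH
    echo "$LD_LIBRARY_PATH"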
    

Issue with sudo reboot after updating the driver to a version higher than 535 and the recommended workaround

When you reinstall the driver and then run sudo reboot, the driver does not have enough time to unload correctly. Since Yandex Cloud prohibits GPU software resets, the GPU is left in an invalid state. While this does not cause any hardware issues, the VM will operate incorrectly. Use the yc compute instance restart command instead of sudo reboot.
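
For example, from a machine with the yc CLI configured (the instance name below is a placeholder; substitute your own instance name or ID):

    # Restart the VM through the Compute API instead of rebooting from inside the guest
    # (my-gpu-vm is a placeholder: use your instance name or ID)
    yc compute instance restart my-gpu-vm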

This is why we do not recommend updating the driver to a version higher than 535. If you need to install a driver version higher than 535 and reboot the user VM, use the following workaround scenario:

  1. Install the driver.

    Script for Ubuntu
    #!/bin/bash
    set -e
    
    # Fixing the architecture
    arch="x86_64"
    
    # Figuring out the Ubuntu version (20.04 -> ubuntu2004, 22.04 -> ubuntu2204, etc.)
    . /etc/os-release
    if [[ "$ID" != "ubuntu" ]]; then
      echo "This script is for Ubuntu only!"
      exit 1
    fi
    distro="ubuntu${VERSION_ID//./}"
    
    echo "Using the repository: $distro/$arch"
    
    # 1. Downloading the package with keys
    wget https://developer.download.nvidia.com/compute/cuda/repos/${distro}/${arch}/cuda-keyring_1.1-1_all.deb
    
    # 2. Installing the keys
    sudo dpkg -i cuda-keyring_1.1-1_all.deb || {
      echo "Failed to install cuda-keyring, performing alternative steps..."
      
      # 2a. Uploading the GPG key manually
      wget https://developer.download.nvidia.com/compute/cuda/repos/${distro}/${arch}/cuda-archive-keyring.gpg
      
      # 2b. Putting the key in the correct location
      sudo mv cuda-archive-keyring.gpg /usr/share/keyrings/cuda-archive-keyring.gpg
      
      # 2c. Connecting the CUDA repository manually
      echo "deb [signed-by=/usr/share/keyrings/cuda-archive-keyring.gpg] \
      https://developer.download.nvidia.com/compute/cuda/repos/${distro}/${arch}/ /" \
      | sudo tee /etc/apt/sources.list.d/cuda-${distro}-${arch}.list
    }
    
    # 3. Updating the list of packages
    sudo apt update
    
    # 4. Installing NVIDIA drivers
    sudo apt install -y nvidia-open
    
    # 5. Installing the CUDA driver metapackage
    sudo apt install -y cuda-drivers
    
  2. Complete the following steps before rebooting the system via sudo reboot.
    Create a script named /usr/libexec/manage-nvidia:

    #!/bin/bash
    set -eu
    usage() {
        echo "usage: manage-nvidia (load|unload)"
        exit 1
    }
    [ $# -eq 1 ] || usage
    case "$1" in
        load)   modprobe nvidia ;;
        unload) modprobe -r nvidia_uvm nvidia_drm nvidia_modeset nvidia ;;
        *)      usage ;;
    esac
    
  3. Make the script executable:

    sudo chmod +x /usr/libexec/manage-nvidia
    
  4. Create a systemd unit named /etc/systemd/system/manage-nvidia.service:

    [Unit]
    Description=Manage NVIDIA driver
    Before=nvidia-persistenced.service
    
    [Service]
    Type=oneshot
    ExecStart=/usr/libexec/manage-nvidia load
    RemainAfterExit=true
    ExecStop=/usr/libexec/manage-nvidia unload
    StandardOutput=journal
    
    [Install]
    WantedBy=multi-user.target
    RequiredBy=nvidia-persistenced.service
    
  5. Reload the systemd configuration, configure manage-nvidia to autorun on boot, and start the service itself:

    sudo systemctl daemon-reload
    sudo systemctl enable --now manage-nvidia
    

    Expected output if the execution is successful:

    Created symlink /etc/systemd/system/multi-user.target.wants/manage-nvidia.service → /etc/systemd/system/manage-nvidia.service.
    Created symlink /etc/systemd/system/nvidia-persistenced.service.requires/manage-nvidia.service → /etc/systemd/system/manage-nvidia.service.
    

    Check nvidia-persistenced.service for dependency on manage-nvidia.service:

    sudo systemctl list-dependencies nvidia-persistenced | grep manage-nvidia   
    

    Result:

    ● ├─manage-nvidia.service
    

    Check the service status:

    sudo systemctl status manage-nvidia
    

With that done, during sudo reboot, systemd will call ExecStop for manage-nvidia, the driver will be unloaded correctly, and rebooting will not invalidate the GPU.
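
After the VM comes back up, you can verify that the service is active and the GPUs are in a healthy state (read-only checks; nvidia-smi should report the installed driver version without errors):

    # The service should be loaded and enabled (with RemainAfterExit=true it shows "active (exited)")
    sudo systemctl status manage-nvidia
    # GPUs should be listed with the newly installed driver version and no errors
    nvidia-smi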
