Using Yandex Managed Service for Kubernetes node groups with GPUs without pre-installed drivers

Written by Yandex Cloud. Improved by Dmitry A. Updated on November 21, 2025.
  • Required paid resources
  • Getting started
  • Component version requirements
  • Install GPU Operator
  • Check that drivers are installed correctly
  • Troubleshooting
    • Driver compilation errors
  • Delete the resources you created

You can use Managed Service for Kubernetes node groups for workloads on GPUs without pre-installed drivers. With GPU Operator, you can choose the driver version that suits your needs.

To set up a Managed Service for Kubernetes cluster and a node group without pre-installed drivers for your GPU workloads:

  1. Install GPU Operator.
  2. Check that drivers are installed correctly.

If you no longer need the resources you created, delete them.

Required paid resources

The support cost for this solution includes:

  • Fee for using the master and outgoing traffic in a Managed Service for Kubernetes cluster (see Managed Service for Kubernetes pricing).
  • Fee for using computing resources, OS, and storage in cluster nodes (VMs) (see Compute Cloud pricing).
  • Fee for a public IP address assigned to cluster nodes (see Virtual Private Cloud pricing).

Getting started

  1. If you do not have the Yandex Cloud CLI installed yet, install and initialize it.

    By default, the CLI uses the folder specified when creating the profile. To change the default folder, use the yc config set folder-id <folder_ID> command. You can also set a different folder for any specific command using the --folder-name or --folder-id parameter.

  2. Create security groups for the Managed Service for Kubernetes cluster and its node groups.

    Warning

    The configuration of security groups determines the performance and availability of the cluster and the services and applications running in it.

  3. Create a Managed Service for Kubernetes cluster with any suitable configuration. When creating the cluster, specify the security groups you created earlier.

  4. Create a Managed Service for Kubernetes node group with the following settings:

    • Computing resources: Navigate to the GPU tab and select the platform.
    • Do not install GPU drivers: Select this option.
    • Security groups: Select the security groups you created earlier.
    • Node taints: Specify the nvidia.com/gpu=true:NoSchedule taint policy.
  5. Install kubectl and configure it to work with the new cluster, for example as shown below.
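
    For example, a minimal way to do this is to fetch the cluster credentials with the Yandex Cloud CLI and check connectivity; substitute your own cluster name for the placeholder:

    # Add the cluster credentials to your kubeconfig (external endpoint)
    # and verify that kubectl can reach the cluster.
    yc managed-kubernetes cluster get-credentials <cluster_name> --external && \
    kubectl cluster-info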

Component version requirements

Starting with Kubernetes 1.30, a node group with GPUs and no pre-installed drivers requires the following component versions to work correctly:

  • GPU Operator 24.9.0 or higher.
  • NVIDIA driver 550.144.03 or higher.

Older component versions may cause driver compilation errors.
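
Before installing GPU Operator, you can quickly confirm which Kubernetes version your nodes run; the VERSION column shows the kubelet version, which should be 1.30 or higher for this scenario:

# List the cluster nodes together with their kubelet versions.
kubectl get nodes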

Install GPU Operator

  1. Install Helm v3.8.0 or higher.

  2. Install GPU Operator:

    helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && \
    helm repo update && \
    helm install \
      --namespace gpu-operator \
      --create-namespace \
      --set driver.version=<driver_version> \
      gpu-operator nvidia/gpu-operator
    

    Where driver.version is the NVIDIA® driver version. If you omit this parameter, the default version will be used.

    Note

    Recommended driver versions:

    • For node groups running Kubernetes 1.30 or higher: 550.144.03 or higher.
    • For node groups on the AMD EPYC™ with NVIDIA® Ampere® A100 (gpu-standard-v3) platform: 515.48.07.

    GPU Operator will be installed with default parameters. Learn more about parameters in the official guide.

    Tip

    You can view parameter values in the Helm chart's values.yaml configuration file. To do this, download the Helm chart archive using the helm pull --untar nvidia/gpu-operator command.
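
    For example, to review the chart defaults locally and then apply your own overrides from a file instead of individual --set flags, you could run something like the following (the my-values.yaml file name is just an example):

    # Download and unpack the chart to inspect its default values.yaml.
    helm pull --untar nvidia/gpu-operator
    less gpu-operator/values.yaml

    # Re-apply the release with the overrides collected in a file.
    helm upgrade --install gpu-operator nvidia/gpu-operator \
      --namespace gpu-operator \
      --create-namespace \
      -f my-values.yaml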

Check that drivers are installed correctly

Get the nvidia-driver-daemonset pod logs:

DRIVERS_POD_NAME="$(kubectl get pods --namespace gpu-operator | grep nvidia-driver-daemonset | awk '{print $1}')" && \
kubectl --namespace gpu-operator logs "${DRIVERS_POD_NAME}"

The logs should contain a message saying that the driver was installed successfully, similar to the following:

Defaulted container "nvidia-driver-ctr" out of: nvidia-driver-ctr, k8s-driver-manager (init)
DRIVER_ARCH is x86_64
Creating directory NVIDIA-Linux-x86_64-<driver_version>
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 <driver_version>

...

Loading NVIDIA driver kernel modules...
+ modprobe nvidia
+ modprobe nvidia-uvm
+ modprobe nvidia-modeset

...

Done, now waiting for signal

You can now run GPU workloads by following this tutorial.
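
As an additional check, you can run a short-lived test pod that requests one GPU and prints the nvidia-smi output. This is only a sketch: the pod name and CUDA image tag are examples, and the toleration matches the nvidia.com/gpu=true:NoSchedule taint you set on the node group earlier.

# Create a one-off pod that requests a single GPU and runs nvidia-smi.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  tolerations:
    # Tolerate the taint set on the GPU node group.
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
EOF

# Wait for the pod to complete, read its logs, and clean up.
kubectl wait --for=jsonpath='{.status.phase}'=Succeeded pod/gpu-smoke-test --timeout=600s
kubectl logs gpu-smoke-test
kubectl delete pod gpu-smoke-test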

Troubleshooting

Driver compilation errors

If you get compilation errors when installing drivers:

  1. Make sure you are running GPU Operator 24.9.0 or higher:

    helm list -n gpu-operator
    
  2. Use precompiled drivers:

    helm upgrade gpu-operator nvidia/gpu-operator \
      --namespace gpu-operator \
      --set driver.usePrecompiled=true \
      --set driver.version=550.144.03
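
    After switching to precompiled drivers, the driver pods are recreated. You can watch them come back up and then re-check their logs as described above:

    # The nvidia-driver-daemonset pods should be recreated and reach the Running state.
    kubectl --namespace gpu-operator get pods | grep nvidia-driver-daemonset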
    

Delete the resources you created

Some resources are not free of charge. Delete the resources you no longer need to avoid paying for them:

  1. Delete the Kubernetes cluster.
  2. If you created any service accounts, delete them.
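
For example, both steps can be done with the CLI; the names are placeholders:

# Delete the Managed Service for Kubernetes cluster.
yc managed-kubernetes cluster delete <cluster_name>

# Delete a service account if you created one for this scenario.
yc iam service-account delete <service_account_name>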
