Setting up Time-Slicing GPUs in Yandex Managed Service for Kubernetes
Time-slicing GPUs in Kubernetes lets several pods share a single physical GPU: the device plugin advertises each GPU as multiple replicas, and workloads take turns using the GPU in time slices.

To set up time-slicing GPUs in Managed Service for Kubernetes:

- Configure time-slicing GPUs.
- Test time-slicing GPUs.

If you no longer need the resources you created, delete them.
Required paid resources
The support cost for this solution includes:
- Fee for using the master and outgoing traffic in a Managed Service for Kubernetes cluster (see Managed Service for Kubernetes pricing).
- Fee for using computing resources, OS, and storage in cluster nodes (VMs) (see Compute Cloud pricing).
- Fee for a public IP address assigned to cluster nodes (see Virtual Private Cloud pricing).
Getting started
- If you do not have the Yandex Cloud CLI installed yet, install and initialize it.

  By default, the CLI uses the folder specified when you created the profile. To change the default folder, use the `yc config set folder-id <folder_ID>` command. You can also set a different folder for any specific command using the `--folder-name` or `--folder-id` parameter.

- Create security groups for the Managed Service for Kubernetes cluster and its node groups.
  Warning

  The configuration of security groups determines the performance and availability of the cluster and the services and applications running in it.
- Create a Managed Service for Kubernetes cluster. When creating it, specify the preconfigured security groups.

- Create a Managed Service for Kubernetes node group with the NVIDIA® Tesla® T4 GPU and the preconfigured security groups.

- Install kubectl and configure it to work with the new cluster. A minimal command sketch for this step follows the list.
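As a quick reference, here is a minimal sketch of pointing kubectl at the new cluster with the Yandex Cloud CLI. The cluster name `k8s-gpu-cluster` is a placeholder; substitute the name of your own cluster.

```
# Fetch the cluster credentials and add them to the local kubeconfig.
# --external uses the cluster's public endpoint; use --internal for access
# from inside the cloud network.
yc managed-kubernetes cluster get-credentials k8s-gpu-cluster --external

# Verify that kubectl talks to the right cluster and that the nodes are Ready.
kubectl cluster-info
kubectl get nodes
```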
Configure time-slicing GPUs
- Create a time-slicing configuration:

  - Create the `time-slicing-config.yaml` file with the following content:

    ```
    ---
    kind: Namespace
    apiVersion: v1
    metadata:
      name: gpu-operator
    ---
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: time-slicing-config
      namespace: gpu-operator
    data:
      a100-80gb: |-
        version: v1
        sharing:
          timeSlicing:
            resources:
              - name: nvidia.com/gpu
                replicas: 5
      tesla-t4: |-
        version: v1
        sharing:
          timeSlicing:
            resources:
              - name: nvidia.com/gpu
                replicas: 5
    ```

    With `replicas: 5`, each physical GPU is advertised to Kubernetes as five `nvidia.com/gpu` resources, so up to five pods can share one GPU.
  - Run this command:

    ```
    kubectl create -f time-slicing-config.yaml
    ```

    Result:

    ```
    namespace/gpu-operator created
    configmap/time-slicing-config created
    ```
- Install the GPU Operator:

  ```
  helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && \
  helm repo update && \
  helm install \
    --namespace gpu-operator \
    --create-namespace \
    --set devicePlugin.config.name=time-slicing-config \
    gpu-operator nvidia/gpu-operator
  ```
- Apply the time-slicing configuration to your Managed Service for Kubernetes cluster or node group:

  Managed Service for Kubernetes cluster

  ```
  kubectl patch clusterpolicies.nvidia.com/cluster-policy \
    --namespace gpu-operator \
    --type merge \
    --patch='{"spec": {"devicePlugin": {"config": {"name": "time-slicing-config", "default": "tesla-t4"}}}}'
  ```

  Managed Service for Kubernetes node group

  ```
  yc managed-kubernetes node-group add-labels <node_group_ID_or_name> \
    --labels nvidia.com/device-plugin.config=tesla-t4
  ```

  You can get the Managed Service for Kubernetes node group ID and name with the list of node groups in the folder.
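Optionally, you can check that the GPU Operator and the time-slicing configuration took effect. This is a minimal sketch under the assumption that the configuration above (five replicas per GPU) was applied; `<GPU_node_name>` is a placeholder for one of your GPU nodes.

```
# The GPU Operator components should be running in the gpu-operator namespace.
kubectl get pods --namespace gpu-operator

# The time-slicing ConfigMap created earlier should be present.
kubectl get configmap time-slicing-config --namespace gpu-operator

# On a GPU node, the allocatable nvidia.com/gpu count is expected to show 5
# (five time-sliced replicas of the single physical Tesla T4).
kubectl describe node <GPU_node_name> | grep nvidia.com/gpu
```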
Test time-slicing GPUs
- Create a test app:

  - Save the following app specification to a YAML file named `nvidia-plugin-test.yml`.

    Deployment is the Kubernetes API object that manages a replicated application.

    ```
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nvidia-plugin-test
      labels:
        app: nvidia-plugin-test
    spec:
      replicas: 5
      selector:
        matchLabels:
          app: nvidia-plugin-test
      template:
        metadata:
          labels:
            app: nvidia-plugin-test
        spec:
          tolerations:
            - key: nvidia.com/gpu
              operator: Exists
              effect: NoSchedule
          containers:
            - name: dcgmproftester11
              image: nvidia/samples:dcgmproftester-2.0.10-cuda11.0-ubuntu18.04
              command: ["/bin/sh", "-c"]
              args:
                - while true; do /usr/bin/dcgmproftester11 --no-dcgm-validation -t 1004 -d 300; sleep 30; done
              resources:
                limits:
                  nvidia.com/gpu: 1
              securityContext:
                capabilities:
                  add: ["SYS_ADMIN"]
    ```
  - Run this command:

    ```
    kubectl apply -f nvidia-plugin-test.yml
    ```

    Result:

    ```
    deployment.apps/nvidia-plugin-test created
    ```
- Make sure all five of the app's Managed Service for Kubernetes pods are in the `Running` state:

  ```
  kubectl get pods | grep nvidia-plugin-test
  ```
- Run the `nvidia-smi` command in the running `nvidia-container-toolkit` Managed Service for Kubernetes pod:

  ```
  kubectl exec <nvidia-container-toolkit_pod_name> \
    --namespace gpu-operator -- nvidia-smi
  ```

  Result:

  ```
  Defaulted container "nvidia-container-toolkit-ctr" out of: nvidia-container-toolkit-ctr, driver-validation (init)
  Thu Jan 26 09:42:51 2023
  +-----------------------------------------------------------------------------+
  | NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: N/A      |
  |-------------------------------+----------------------+----------------------+
  | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
  | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
  |                               |                      |               MIG M. |
  |===============================+======================+======================|
  |   0  Tesla T4            Off  | 00000000:8B:00.0 Off |                    0 |
  | N/A   72C    P0    70W /  70W |   1579MiB / 15360MiB |    100%      Default |
  |                               |                      |                  N/A |
  +-------------------------------+----------------------+----------------------+

  +-----------------------------------------------------------------------------+
  | Processes:                                                                  |
  |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
  |        ID   ID                                                   Usage      |
  |=============================================================================|
  |    0   N/A  N/A     43108      C   /usr/bin/dcgmproftester11        315MiB  |
  |    0   N/A  N/A     43211      C   /usr/bin/dcgmproftester11        315MiB  |
  |    0   N/A  N/A     44583      C   /usr/bin/dcgmproftester11        315MiB  |
  |    0   N/A  N/A     44589      C   /usr/bin/dcgmproftester11        315MiB  |
  |    0   N/A  N/A     44595      C   /usr/bin/dcgmproftester11        315MiB  |
  +-----------------------------------------------------------------------------+
  ```
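The five `dcgmproftester11` processes in the output all run on the same physical Tesla T4, which is the expected result of time-slicing. When you finish testing, you can remove the test app; a minimal sketch, assuming the deployment was created from `nvidia-plugin-test.yml` as shown above:

```
# Delete the test deployment and its five pods.
kubectl delete -f nvidia-plugin-test.yml

# Confirm the pods are gone.
kubectl get pods | grep nvidia-plugin-test
```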
Delete the resources you created
Some resources are not free of charge. Delete the resources you no longer need to avoid paying for them:
- Delete the Managed Service for Kubernetes cluster.
- If you created any service accounts, delete them.
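If you prefer the CLI, the cluster can be deleted with a single command. A minimal sketch, where `k8s-gpu-cluster` is the placeholder cluster name used earlier:

```
# Deleting the cluster also removes its node groups and the VMs behind them.
yc managed-kubernetes cluster delete k8s-gpu-cluster
```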