Running workloads with GPUs in Yandex Managed Service for Kubernetes
A Managed Service for Kubernetes cluster allows you to run workloads on graphics processing units (GPUs), which can be useful for tasks with special computing requirements.
To run workloads using GPUs on Managed Service for Kubernetes cluster pods:

1. Create a pod with a GPU.
2. Test the pod.

If you no longer need the resources you created, delete them.
Required paid resources
The support cost includes:
- Managed Service for Kubernetes cluster fee: using the master and outgoing traffic (see Managed Service for Kubernetes pricing).
- Cluster nodes (VMs) fee: using computing resources, operating system, and storage (see Compute Cloud pricing).
- Fee for a public IP address assigned to cluster nodes (see Virtual Private Cloud pricing).
Getting started
- If you do not have the Yandex Cloud CLI yet, install and initialize it.

  The folder specified when creating the CLI profile is used by default. To change the default folder, use the `yc config set folder-id <folder_ID>` command. You can also specify a different folder for any command using the `--folder-name` or `--folder-id` parameter.
- Create security groups for the Managed Service for Kubernetes cluster and its node groups (a CLI sketch follows this list).

  Warning

  The configuration of security groups determines the performance and availability of the cluster and the services and applications running in it.
- Create a Managed Service for Kubernetes cluster with any suitable configuration. When creating it, specify the security groups prepared earlier (a CLI sketch follows this list).
- Create a Managed Service for Kubernetes node group with the following settings (a CLI sketch follows this list):

  - Platform: Select With GPU → Intel Broadwell with NVIDIA® Tesla V100.
  - GPU: Specify the required number of GPUs.
  - Security groups: Select the security groups created earlier.
- Install kubectl and configure it to work with the new cluster (a CLI sketch follows this list).
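As an illustration, here is a minimal, hypothetical sketch of creating one security group with the CLI. The group name, network name, and rules below are placeholder assumptions; a working cluster needs the full rule set from the Managed Service for Kubernetes documentation on security groups:

```bash
# Hypothetical names and rules; replace them with values for your setup
# and add the remaining rules required by Managed Service for Kubernetes.
yc vpc security-group create \
  --name k8s-main-sg \
  --network-name k8s-network \
  --rule "direction=ingress,protocol=tcp,port=443,v4-cidrs=[0.0.0.0/0],description=Kubernetes API" \
  --rule "direction=egress,protocol=any,from-port=0,to-port=65535,v4-cidrs=[0.0.0.0/0],description=outgoing traffic"
```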
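Creating the cluster can likewise be scripted; a hedged sketch, assuming hypothetical resource names and existing service accounts with the required roles:

```bash
# All names are hypothetical; the service accounts must already exist
# and have the roles Managed Service for Kubernetes requires.
yc managed-kubernetes cluster create \
  --name k8s-gpu-cluster \
  --network-name k8s-network \
  --zone ru-central1-a \
  --subnet-name k8s-subnet \
  --public-ip \
  --security-group-ids <security_group_IDs> \
  --service-account-name k8s-sa \
  --node-service-account-name k8s-node-sa
```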
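For the GPU node group, a rough sketch under the assumption that `gpu-standard-v1` is the platform ID for Intel Broadwell with NVIDIA® Tesla V100; flag names can vary between CLI versions, so verify them with `yc managed-kubernetes node-group create --help`:

```bash
# Hypothetical names; confirm the exact flags with `--help` before running.
yc managed-kubernetes node-group create \
  --name k8s-gpu-nodes \
  --cluster-name k8s-gpu-cluster \
  --platform gpu-standard-v1 \
  --gpus 1 \
  --fixed-size 1 \
  --location zone=ru-central1-a \
  --network-interface subnets=k8s-subnet,security-group-ids=<security_group_IDs>,ipv4-address=nat
```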
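Once the cluster is up, kubectl credentials can be added with the CLI, for example (using the hypothetical cluster name from the sketches above):

```bash
# Adds the cluster to your kubeconfig; use --internal instead of --external
# if the master has no public IP address.
yc managed-kubernetes cluster get-credentials k8s-gpu-cluster --external
```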
Create a pod with a GPU
- Save the following GPU pod creation specification to a YAML file named `cuda-vector-add.yaml`:

  ```yaml
  apiVersion: v1
  kind: Pod
  metadata:
    name: cuda-vector-add
  spec:
    restartPolicy: OnFailure
    containers:
      - name: cuda-vector-add
        # https://github.com/kubernetes/kubernetes/blob/v1.7.11/test/images/nvidia-cuda/Dockerfile
        image: "registry.k8s.io/cuda-vector-add:v0.1"
        resources:
          limits:
            nvidia.com/gpu: 1 # Request for 1 GPU.
  ```
  To learn more about the pod creation specification, see the Kubernetes documentation.
- Create a pod with a GPU (verification sketches follow this list):

  ```bash
  kubectl create -f cuda-vector-add.yaml
  ```
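Before or after creating the pod, it may help to confirm that the GPU nodes actually expose the `nvidia.com/gpu` resource; a small check using standard kubectl output formatting (the column names are arbitrary):

```bash
# Lists each node with its number of allocatable GPUs;
# the GPU nodes should report a non-zero value.
kubectl get nodes \
  "-o=custom-columns=NODE:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
```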
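You can then watch the pod run to completion:

```bash
# Streams status changes; press Ctrl+C to stop watching.
kubectl get pod cuda-vector-add --watch
```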
Test the pod
- View the information about the new pod:

  ```bash
  kubectl describe pod cuda-vector-add
  ```

  Result:

  ```text
  Name:         cuda-vector-add
  Namespace:    default
  Priority:     0
  ...
    Normal  Pulling  16m  kubelet, cl1i7hcbti99********-ebyq  Successfully pulled image "registry.k8s.io/cuda-vector-add:v0.1"
    Normal  Created  16m  kubelet, cl1i7hcbti99********-ebyq  Created container cuda-vector-add
    Normal  Started  16m  kubelet, cl1i7hcbti99********-ebyq  Created container
  ```
- View the pod logs:

  ```bash
  kubectl logs -f cuda-vector-add
  ```

  Result:

  ```text
  [Vector addition of 50000 elements]
  Copy input data from the host memory to the CUDA device
  CUDA kernel launch with 196 blocks of 256 threads
  Copy output data from the CUDA device to the host memory
  Test PASSED
  Done
  ```
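Once the test passes, the test pod itself can be deleted so it no longer holds a GPU:

```bash
# Deletes the pod defined in the specification file.
kubectl delete -f cuda-vector-add.yaml
```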
Delete the resources you created
Delete the resources you no longer need to avoid paying for them:
- Delete the Managed Service for Kubernetes cluster.
- If you reserved a public static IP address for your Managed Service for Kubernetes cluster, delete it.
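Both deletions can also be done from the CLI; a hedged sketch, assuming the hypothetical names used above:

```bash
# The cluster and address names are hypothetical; substitute your own.
yc managed-kubernetes cluster delete k8s-gpu-cluster
yc vpc address delete <static_IP_address_name>
```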