Running workloads with GPUs
A Managed Service for Kubernetes cluster can run workloads on graphics processing units (GPUs), which is useful for tasks with special computing requirements.
To run workloads using GPUs on Managed Service for Kubernetes cluster pods:
- Create a pod with a GPU.
- Test the pod.
If you no longer need the resources you created, delete them.
Getting started
- If you do not have the Yandex Cloud command line interface yet, install and initialize it.
  The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name or --folder-id parameter.
- Create security groups for the Managed Service for Kubernetes cluster and its node groups.
  Warning: The configuration of security groups determines the performance and availability of the cluster and the services and applications running in it.
- Create a Managed Service for Kubernetes cluster with any suitable configuration. When creating it, specify the security groups prepared earlier.
- Create a Managed Service for Kubernetes node group with the following settings:
  - Platform: Select With GPU → Intel Broadwell with NVIDIA® Tesla V100.
  - GPU: Specify the required number of GPUs.
  - Security groups: Select the security groups created earlier.
- Install kubectl and configure it to work with the created cluster.
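Once the node group is up, it can help to confirm that kubectl reaches the cluster and that the nodes actually advertise GPUs. The following is a minimal sketch, not the official procedure. It assumes the cluster has a public endpoint (hence --external) and that GPUs are exposed under the nvidia.com/gpu resource name, the same name the pod specification later in this guide uses. The function names configure_kubectl and list_gpu_nodes are hypothetical.

```shell
# Hypothetical helper: add cluster credentials to the local kubectl config.
# $1: cluster name or ID. --external assumes the cluster has a public endpoint.
configure_kubectl() {
  yc managed-kubernetes cluster get-credentials "$1" --external
}

# Hypothetical helper: list each node with the number of GPUs it reports
# as allocatable. Nodes without GPUs show <none> in the GPU column.
list_gpu_nodes() {
  kubectl get nodes \
    -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
}
```

For example, run configure_kubectl with your cluster name, then list_gpu_nodes. A node from the GPU node group should show a non-empty GPU column.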
Create a pod with a GPU
- Save the GPU pod creation specification to a YAML file named cuda-vector-add.yaml:

    apiVersion: v1
    kind: Pod
    metadata:
      name: cuda-vector-add
    spec:
      restartPolicy: OnFailure
      containers:
        - name: cuda-vector-add
          # https://github.com/kubernetes/kubernetes/blob/v1.7.11/test/images/nvidia-cuda/Dockerfile
          image: "registry.k8s.io/cuda-vector-add:v0.1"
          resources:
            limits:
              nvidia.com/gpu: 1 # Request for 1 GPU.

  To learn more about the pod creation specification, see the Kubernetes documentation.
- Create a pod with a GPU:

    kubectl create -f cuda-vector-add.yaml
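The sample container runs once and exits, so the pod should end up in the Succeeded phase. As a rough sketch, you could wait for that before inspecting the pod. This assumes a kubectl version that supports JSONPath conditions in kubectl wait (1.23 or later). The function name wait_for_pod_success and the 300s default timeout are illustrative choices, not part of the original procedure.

```shell
# Hypothetical helper: block until the named pod reaches the Succeeded phase,
# or fail once the timeout expires.
# $1: pod name; $2 (optional): timeout, defaults to 300s.
wait_for_pod_success() {
  pod_name="$1"
  timeout="${2:-300s}"
  kubectl wait --for=jsonpath='{.status.phase}'=Succeeded \
    "pod/${pod_name}" --timeout="${timeout}"
}
```

For the pod created above, the call would be wait_for_pod_success cuda-vector-add.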
Test the pod
- View the information about the new pod:

    kubectl describe pod cuda-vector-add

  Result:

    Name:         cuda-vector-add
    Namespace:    default
    Priority:     0
    ...
      Normal  Pulling  16m  kubelet, cl1i7hcbti99********-ebyq  Successfully pulled image "registry.k8s.io/cuda-vector-add:v0.1"
      Normal  Created  16m  kubelet, cl1i7hcbti99********-ebyq  Created container cuda-vector-add
      Normal  Started  16m  kubelet, cl1i7hcbti99********-ebyq  Created container
- View the pod logs:

    kubectl logs -f cuda-vector-add

  Result:

    [Vector addition of 50000 elements]
    Copy input data from the host memory to the CUDA device
    CUDA kernel launch with 196 blocks of 256 threads
    Copy output data from the CUDA device to the host memory
    Test PASSED
    Done
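If you want to check the result in a script rather than by eye, one option is to search the logs for the "Test PASSED" line shown above. This is only a sketch: the function name gpu_test_passed is hypothetical, and it assumes the sample keeps printing that exact phrase.

```shell
# Hypothetical helper: read pod logs from stdin and succeed only if the
# CUDA sample reported "Test PASSED".
gpu_test_passed() {
  grep -q 'Test PASSED'
}
```

A possible invocation is kubectl logs cuda-vector-add | gpu_test_passed && echo "GPU workload OK".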
Delete the resources you created
Delete the resources you no longer need to avoid paying for them:
- Delete the Managed Service for Kubernetes cluster.
- If you reserved a public static IP address for your Managed Service for Kubernetes cluster, delete it.
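The same cleanup can be sketched with the Yandex Cloud CLI. This is an illustration under stated assumptions rather than the documented procedure: the function name delete_gpu_demo_resources is hypothetical, and you need to pass your own cluster and address names or IDs.

```shell
# Hypothetical helper: delete the cluster, then the reserved static address
# if one is given. Stops if the cluster deletion fails.
# $1: cluster name or ID; $2 (optional): address name or ID.
delete_gpu_demo_resources() {
  cluster="$1"
  address="${2:-}"
  yc managed-kubernetes cluster delete "${cluster}" || return 1
  if [ -n "${address}" ]; then
    yc vpc address delete "${address}"
  fi
}
```

Omit the second argument if you did not reserve a public static IP address.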