NVIDIA® GPU support

Written by
Yandex Cloud
Updated at April 8, 2026
  • Main components
    • NVIDIA® driver
    • NVIDIA® Container Toolkit
    • NVIDIA® Fabric Manager
    • NVIDIA® Operator
    • DCGM
    • DCGM Exporter
  • GPU monitoring
  • Using GPUs in pods
  • Configuration
    • MIG Manager settings
  • See also

Stackland lets you provision NVIDIA® GPUs in a Stackland cluster using NVIDIA® GPU support, a component that automates the management of GPU resources and ensures their availability for workloads. It is an implementation of the NVIDIA® GPU Operator and provides a comprehensive toolkit for GPU provisioning in Kubernetes.

NVIDIA® GPU support provides the following capabilities:

  • Auto-detection of GPUs on cluster nodes
  • Provisioning GPUs as Kubernetes resources for pods
  • Support for GPU virtualization technologies (multi-instance GPU or MIG)
  • Support for NVLink to create GPU clusters
  • GPU health monitoring and metric collection

NVIDIA® GPU support requires NVIDIA® GPU nodes to operate.

Main components

NVIDIA® driver

Version: 580.126

The NVIDIA® driver provides a low-level interface between the OS and GPU. The driver exposes the GPU hardware capabilities, manages device memory, and handles commands from applications.

NVIDIA® Container Toolkit

Version: 580.126

NVIDIA® Container Toolkit enables running GPU-accelerated containers. The toolkit integrates with the container runtime and provides GPU access to containers via the Container Device Interface (CDI). This component automatically configures the container environment, mounts the required libraries and devices, and manages GPU resource isolation across containers.

NVIDIA® Fabric Manager

Version: 580.126

NVIDIA® Fabric Manager manages NVLink and NVSwitch in multi-GPU systems. This component ensures high-speed GPU interconnection, optimizes communication topology, and manages distributed memory in multi-GPU configurations.

NVIDIA® Operator

Version: 25.10

The NVIDIA® GPU Operator automates GPU management in a Kubernetes cluster. It creates, configures, and manages the components required for GPU provisioning, including drivers, libraries, device plugins, and monitoring systems. The NVIDIA® GPU Operator uses custom resource definitions (CRDs) to manage the lifecycle of GPU components.

DCGM

NVIDIA® Data Center GPU Manager (DCGM) is a tool for monitoring and managing datacenter GPUs. DCGM collects performance, temperature, memory usage, and other GPU metrics.

DCGM Exporter

DCGM Exporter exports GPU metrics in Prometheus format. The monitoring component automatically collects metrics and exposes them for visualization in Grafana.
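For illustration, a scrape of the exporter's metrics endpoint looks like the following. The metric names below are from the standard DCGM Exporter metric set; the label values are hypothetical:

```text
# HELP DCGM_FI_DEV_GPU_UTIL GPU utilization (in %).
# TYPE DCGM_FI_DEV_GPU_UTIL gauge
DCGM_FI_DEV_GPU_UTIL{gpu="0",UUID="GPU-xxxx",pod="gpu-pod"} 87
# HELP DCGM_FI_DEV_FB_USED Framebuffer memory used (in MiB).
# TYPE DCGM_FI_DEV_FB_USED gauge
DCGM_FI_DEV_FB_USED{gpu="0",UUID="GPU-xxxx",pod="gpu-pod"} 40960
```

Prometheus scrapes these metrics, and the prebuilt Grafana dashboards query them by the gpu, UUID, and pod labels.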

GPU monitoring

DCGM Exporter automatically collects GPU metrics and makes them available in Grafana. Stackland provides prebuilt dashboards for GPU monitoring:

  • NVIDIA® DCGM Dashboard: Overview dashboard with metrics of all cluster GPUs.
  • NVIDIA® DCGM Dashboard with MIG metrics: Dashboard for MIG GPU monitoring.
  • NVIDIA® DCGM Dashboard w/o MIG metrics: Dashboard for non-MIG GPU monitoring.

Using GPUs in pods

To use a GPU in a pod, specify the nvidia.com/gpu resource in the container specification:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
  - name: cuda-container
    image: nvidia/cuda:12.0-base
    resources:
      limits:
        nvidia.com/gpu: 1

Kubernetes will automatically place the pod on a node with an available GPU.
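GPUs are Kubernetes extended resources: they can only be requested in whole units, are specified in limits (if requests are set, they must equal limits), and cannot be overcommitted. As a sketch, a pod requesting two GPUs (the image tag is illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: multi-gpu-pod
spec:
  containers:
  - name: cuda-container
    image: nvidia/cuda:12.0-base
    resources:
      limits:
        nvidia.com/gpu: 2  # whole GPUs only; fractional values are not allowed
```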

Configuration

MIG Manager settings

migManager:
  enabled: false
  strategy: "single"
  config:
    default: "all-disabled"

  • enabled: Enables multi-instance GPU (MIG) support.
  • strategy: MIG strategy. The possible values are single to apply the same MIG configuration to all GPUs on the node or mixed to use different MIG configurations on different GPUs.
  • config.default: Default MIG configuration.
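As an illustrative sketch, enabling MIG with the mixed strategy and a default profile might look like this. The all-1g.5gb profile name matches the default MIG-parted configuration; check the profiles actually available in your cluster before applying it:

```yaml
migManager:
  enabled: true
  strategy: "mixed"      # different MIG configurations on different GPUs
  config:
    default: "all-1g.5gb"  # default profile applied to nodes without an explicit label
```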

To enable MIG support, set enabled to true and configure the GPU node:

kubectl label nodes my-node nvidia.com/mig.config=all-1g.5gb --overwrite

This command applies the all-1g.5gb MIG profile to my-node, partitioning each of the node's GPUs into multiple independent GPU instances, each with one compute slice and 5 GB of video memory.
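With the mixed strategy, each MIG profile is exposed as its own resource type, so a pod can request a specific MIG slice instead of a whole GPU. A sketch, assuming the GPU Operator's mixed-strategy resource naming; under the single strategy, MIG instances are instead requested as nvidia.com/gpu:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mig-pod
spec:
  containers:
  - name: cuda-container
    image: nvidia/cuda:12.0-base
    resources:
      limits:
        nvidia.com/mig-1g.5gb: 1  # one 1g.5gb MIG instance (mixed strategy only)
```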

To view all available MIG profiles, run the following command:

kubectl -n stackland-nvidia-gpu get cm default-mig-parted-config -o jsonpath='{.data.config\.yaml}'

See also

  • Cluster and component monitoring
  • GPU Operator guides

© 2026 Direct Cursus Technology L.L.C.