© 2026 Direct Cursus Technology L.L.C.
Hardware monitoring

Written by
Yandex Cloud
Updated on April 8, 2026

Certain system errors may result from hardware failures rather than Kubernetes or other components. To monitor such failures, Stackland provides a ready-made solution that collects data from various sources, such as kernel logs, sysfs, disk SMART data, and more.

This page tells you where you can view hardware status alerts and key charts, and explains the conditions that trigger these alerts.
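
As a rough illustration of the sysfs source mentioned above, the sketch below lists the block devices exposed under `/sys/block` and compares two scans to find disks that disappeared. This is not Stackland's implementation: the function names `list_block_devices` and `missing_disks`, and the choice to skip `loop` and `ram` devices, are assumptions made for the example.

```python
from pathlib import Path


def list_block_devices(sys_block: Path = Path("/sys/block")) -> set[str]:
    """Return block device names found under sysfs.

    Virtual loop and ram devices are skipped (an assumption of this sketch).
    Returns an empty set if the directory does not exist.
    """
    if not sys_block.is_dir():
        return set()
    return {
        entry.name
        for entry in sys_block.iterdir()
        if not entry.name.startswith(("loop", "ram"))
    }


def missing_disks(previous: set[str], current: set[str]) -> set[str]:
    """Return disks seen in the previous scan but absent from the current one."""
    return previous - current
```

A monitoring loop could store the result of one scan and pass it as `previous` on the next scan; any non-empty result would correspond to the kind of condition the DiskMissing check on this page describes.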

Grafana dashboard

You can monitor your hardware in a dedicated dashboard:

[Screenshot: Grafana dashboard]

To open the hardware monitoring dashboard, go to grafana.sys.<cluster domain> and navigate to Dashboards > stackland-monitoring > Hardware Monitoring.

The first section of the dashboard shows hardware status warnings. For example, the first warning in the screenshot is DiskIOErrors, a check that monitors disk read and write errors. The other checks are listed below.

The dashboard includes two charts: Disk Temperature and Disk I/O Errors.

List of checks with alerts

The dashboard shows alerts that indicate check results.

| Check name | Description | How it works |
| --- | --- | --- |
| DiskMissing | Disk not found | The system scans all available storage devices. If a previously available disk is not detected, the system logs a DiskMissing error. |
| DiskIOErrors | Disk read/write errors | During read and write operations, the system and the disk controller exchange data. If a read or write operation fails, the system logs a DiskIOErrors error. |
| DiskSmartFailed | Disk SMART failure | If a SMART attribute exceeds the threshold defined by the disk manufacturer, the system logs a DiskSmartFailed error. |
| DiskSmartUnavailable | Disk SMART data unavailable | If the disk's SMART stops working and no longer reports hardware status data, the system logs a DiskSmartUnavailable error. |
| DiskConnection | Connection issues | SMART attribute 199 shows the number of corrected errors during SATA data transfer. A growing value can signal cable, connection, controller, or disk issues. If the attribute value rises, the system logs a DiskConnection error. |
| DiskTemperatureCritical | High disk temperature | SMART-enabled disks monitor their temperature and report it to the system. If the temperature approaches the maximum allowed value, the system logs a DiskTemperatureCritical error. |
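
The conditions behind the SMART-based checks can be sketched as a small function that compares two consecutive readings for one disk. This is only an illustration of the rules described above, not Stackland's actual code. The `SmartReading` structure, the `evaluate` function, and the `temp_margin_c` parameter are hypothetical. In particular, the source does not say how close to the maximum a temperature must be to count as "approaching" it, so the 5 °C margin is an arbitrary assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SmartReading:
    """One SMART sample for a disk (hypothetical structure for this sketch)."""
    device: str
    available: bool                          # False if the disk returned no SMART data
    failed_attributes: tuple = ()            # IDs of attributes past the vendor threshold
    crc_error_count: int = 0                 # raw value of SMART attribute 199
    temperature_c: Optional[int] = None      # current temperature, if reported
    max_temperature_c: Optional[int] = None  # maximum allowed temperature, if known


def evaluate(
    previous: Optional[SmartReading],
    current: SmartReading,
    temp_margin_c: int = 5,  # assumed margin; the source gives no number
) -> list[str]:
    """Return the names of the checks that the current reading would trigger."""
    # Without SMART data, none of the other SMART-based checks can be evaluated.
    if not current.available:
        return ["DiskSmartUnavailable"]

    alerts = []
    if current.failed_attributes:
        alerts.append("DiskSmartFailed")

    # DiskConnection: attribute 199 grew since the previous reading.
    if (
        previous is not None
        and previous.available
        and current.crc_error_count > previous.crc_error_count
    ):
        alerts.append("DiskConnection")

    # DiskTemperatureCritical: temperature within the margin of the maximum.
    if (
        current.temperature_c is not None
        and current.max_temperature_c is not None
        and current.temperature_c >= current.max_temperature_c - temp_margin_c
    ):
        alerts.append("DiskTemperatureCritical")

    return alerts
```

For example, if attribute 199 rises from 3 to 5 between two readings and the temperature is 66 °C with a 70 °C maximum, `evaluate` returns `["DiskConnection", "DiskTemperatureCritical"]`.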
