Analyzing the status of BareMetal server disks using HW Watcher
If you encounter disk read/write errors or disk or RAID array failures while using a BareMetal server, you can run server diagnostics to identify the source of the problem and generate a report for support.
Note
You can only use HW Watcher on Linux servers.
Generate a report
To generate a system status report using HW Watcher:
-
Connect to a Linux server over SSH by running the following command in the terminal:
ssh root@<server_public_IP_address>
-
Download HW Watcher:
wget https://storage.yandexcloud.net/baremetal/support/hwcheck
-
Allow the root user (the file owner) to execute the downloaded file:
chmod u+x hwcheck
-
Run the downloaded utility:
./hwcheck
As a result, the report files will be saved to an archive:
... Save data to archive: hwcheck_my-sample-server-_2025-05-27_20-31-04.tgz
-
To view status reports for individual server disks, unpack the archive by specifying its name:
tar -xvzf <file_name_with_report_archive>
-
In the drive directory, see the list of disk status report files:
ls ./drive/ -l
Result:
total 24
-rw-r--r-- 1 root root 8957 May 27 20:31 sda-smartctl
-rw-r--r-- 1 root root 8957 May 27 20:31 sdb-smartctl
-
View the contents of the report file for the disk you are interested in. The file name contains the disk device ID:
cat ./drive/sda-smartctl
Among other things, a table of the disk's SMART attribute values will be displayed:
...
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
  9 Power_On_Hours          -O--CK   086   086   000    -    67715
 12 Power_Cycle_Count       -O--CK   099   099   000    -    108
177 Wear_Leveling_Count     PO--C-   062   062   005    -    1182
179 Used_Rsvd_Blk_Cnt_Tot   PO--C-   100   100   010    -    0
180 Unused_Rsvd_Blk_Cnt_Tot PO--C-   100   100   010    -    17618
181 Program_Fail_Cnt_Total  -O--CK   100   100   000    -    0
182 Erase_Fail_Count_Total  -O--CK   100   100   000    -    0
183 Runtime_Bad_Block       PO--C-   100   100   010    -    0
184 End-to-End_Error        PO--CK   100   100   097    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
190 Airflow_Temperature_Cel -O--CK   074   049   000    -    26
195 Hardware_ECC_Recovered  -O-RC-   200   200   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   100   100   000    -    0
202 Unknown_SSD_Attribute   PO--CK   100   100   010    -    0
235 Unknown_Attribute       -O--C-   099   099   000    -    68
241 Total_LBAs_Written      -O--CK   099   099   000    -    2179265164499
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning
...
You can either analyze the results yourself or contact support for assistance.
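If you analyze the results yourself, a quick first pass is to look for error counters with a non-zero raw value. The sketch below is one possible way to do that; it is not part of HW Watcher. The function name check_smart_report is hypothetical, the list of attributes is an illustrative assumption rather than an official set, and the script assumes that the raw value is the last column of each attribute row, as in the table above. Attribute names and their meaning vary by disk vendor and model.

```shell
#!/bin/sh
# Hypothetical helper (not part of HW Watcher): print selected SMART error
# counters whose raw value is greater than zero.
# Assumption: each attribute row ends with RAW_VALUE, as in the report above.
check_smart_report() {
    awk '
        # Illustrative attribute selection; adjust it for your disk model.
        $2 ~ /^(Reallocated_Sector_Ct|Runtime_Bad_Block|Reported_Uncorrect|UDMA_CRC_Error_Count|Program_Fail_Cnt_Total|Erase_Fail_Count_Total)$/ {
            # $NF is the last field on the line, i.e. the raw value.
            if ($NF + 0 > 0) printf "%s: %s raw=%s\n", FILENAME, $2, $NF
        }
    ' "$@"
}
```

For example, after unpacking the archive, you could run check_smart_report ./drive/*-smartctl. No output means none of the selected counters is above zero; it does not by itself prove that the disk is healthy.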
Send a report to support
To send the resulting report to support:
-
Copy the report from the server to your local computer by running this command in the local computer terminal:
scp root@<server_public_IP_address>:/root/<file_name_with_report_archive> ./
This will save the report file to the current directory on your local computer.
-
Create a request to support.
In your request, describe the issue with the server in detail and attach the saved archive with the report.
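Before attaching the archive, you may want to confirm that it was copied to your local computer intact. The following is a minimal sketch, assuming a local computer with a POSIX shell and a tar build that supports gzip; the function name verify_report_archive is hypothetical. It only lists the archive contents without unpacking them, which fails if the file is truncated or damaged.

```shell
#!/bin/sh
# Hypothetical helper: check that a report archive exists and is readable.
verify_report_archive() {
    archive=$1
    if [ ! -f "$archive" ]; then
        echo "No such file: $archive" >&2
        return 1
    fi
    # tar -tzf lists the contents without extracting them;
    # a damaged or incomplete archive makes the command fail.
    if ! tar -tzf "$archive" >/dev/null 2>&1; then
        echo "Archive is damaged or incomplete: $archive" >&2
        return 1
    fi
    echo "Archive OK: $archive"
}
```

Usage: verify_report_archive ./<file_name_with_report_archive>. If the check fails, copy the archive from the server again before creating the request.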