

Diagnosing a disk subsystem

Written by
Yandex Cloud
Updated on April 8, 2026
  • Status diagnostics
  • Common issues
    • VolumeGroupSyncedOnNode = False
    • PhantomDeviceDetected
    • PVC stuck in Pending status
    • Disk I/O errors
  • Corrupt disk replacement
    • Step 1: Identify the corrupt disk
    • Step 2: Transfer data from the disk (if possible)
    • Step 3: Delete the disk from the volume group
    • Step 4: Replace the physical disk
    • Step 5: Make sure the new disk has been detected
    • Step 6: Check the recovery

This page describes typical issues with the Stackland disk subsystem and how to fix them.

Status diagnostics

Check the status of volume groups on all nodes:

kubectl get volumegroups -A -o wide

Check synchronization conditions:

kubectl get volumegroups -A -o custom-columns="NAME:.metadata.name,NODE:.spec.nodeName,REASON:.status.conditions[0].reason,STATUS:.status.conditions[0].status"
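The custom-columns output can be scanned for unsynchronized groups in a script. A minimal sketch, using made-up group and node names in place of live kubectl output:

```shell
# Hypothetical output of the custom-columns query above, captured for
# illustration; group and node names are made up.
vg_status='NAME        NODE      REASON                    STATUS
vg-nvme-1   node-a    VolumeGroupSyncedOnNode   True
vg-hdd-1    node-b    DeviceMissing             False'

# Print every volume group whose first condition is not True
echo "$vg_status" | awk 'NR > 1 && $4 == "False" { print $1 " on " $2 }'
```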

For detailed diagnostics, connect to the topovgm-operator pod on the relevant node. All LVM utilities are available in the pod:

kubectl -n stackland-volumes exec -it <topovgm_pod_name> -- sh

Run the following LVM commands inside the pod:

pvs   # List of physical volumes
vgs   # List of volume groups
lvs   # List of logical volumes
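In `pvs` output, an `m` in the Attr column marks a missing physical volume. A minimal sketch of filtering for such disks, using made-up device and group names instead of live output:

```shell
# Hypothetical pvs output; /dev/sdb carries the "m" (missing) attribute
pvs_output='  PV         VG            Fmt  Attr PSize   PFree
  /dev/sda   stackland-vg  lvm2 a--  931.00g 100.00g
  /dev/sdb   stackland-vg  lvm2 a-m  931.00g       0'

# Print only physical volumes flagged as missing
echo "$pvs_output" | awk 'NR > 1 && $4 ~ /m/ { print $1 }'
```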

Common issues

This section lists typical disk subsystem issues and their fixes.

VolumeGroupSyncedOnNode = False

The VolumeGroupSyncedOnNode condition in the VolumeGroup resource status is set to False.

  • The disk is missing or unavailable. Check the physical connection of the disk. Run pvs in the topovgm-operator pod and make sure that all physical volumes are visible.

  • Volume Group initialization error. Check the operator logs:

    kubectl -n stackland-volumes logs -l app.kubernetes.io/name=topovgm-operator --tail=100
    
  • The disk contains data and was not auto-detected. Check the status.discoveredDevices field of the VolumeGroup resource: it will state the reason why the disk was excluded.

PhantomDeviceDetected

The PhantomDeviceDetected condition is set on the node. This means that a disk which earlier belonged to the volume group is now temporarily unavailable, but its metadata was preserved in LVM.

  1. Check the physical connection of the disk.
  2. If the disk was reconnected, the operator will automatically update the device-mapper tables and restore the volume group.
  3. If the disk was replaced, follow the corrupt disk replacement procedure.
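To see which device path is affected, the operator logs can be filtered for the phantom device. A minimal sketch over sample log lines (the log format and device names here are assumptions, not the operator's actual output):

```shell
# Hypothetical operator log excerpt; real log formatting may differ
logs='I0408 10:01:02 reconciler: volume group vg-hdd-1 synced
W0408 10:05:17 reconciler: PhantomDeviceDetected: /dev/sdb missing from vg-hdd-1'

# Pull out the phantom device path
echo "$logs" | grep -o 'PhantomDeviceDetected: [^ ]*'
```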

PVC stuck in Pending status

The PersistentVolumeClaim remains in the Pending status for a long time.

  • Not enough storage space in a volume group of the required type. Check for free space:

    kubectl get volumegroups -A -o custom-columns="NAME:.metadata.name,NODE:.spec.nodeName,FREE:.status.free,SIZE:.status.size"
    
  • A non-existing storage class is specified. Check the storage class name in the PVC manifest. The valid values are stackland-nvme, stackland-ssd, stackland-hdd, and stackland-other.

  • No disks of the required type on the nodes. For example, if stackland-nvme is specified but there are no NVMe disks, no volume group will be created for this type.
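A correct PVC therefore names one of the supported storage classes. A minimal sketch of such a manifest (the claim name and size are placeholders), with a quick grep that checks the class against the allowed values:

```shell
# Hypothetical PVC manifest using one of the valid Stackland classes
cat > pvc.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: stackland-ssd
  resources:
    requests:
      storage: 10Gi
EOF

# Sanity check: the class must be one of the supported values
grep -E 'storageClassName: stackland-(nvme|ssd|hdd|other)' pvc.yaml
```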

Disk I/O errors

The DiskIOErrors alert triggers in hardware monitoring. Applications report write or read errors.

  1. Identify the defective disk based on the Hardware Monitoring dashboard in Grafana.
  2. If, according to SMART monitoring, the disk is defective, follow the corrupt disk replacement procedure.
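On the node itself, overall disk health is usually read with smartctl; any verdict other than PASSED is grounds for replacement. A minimal sketch of that decision, using a made-up health line in place of a live smartctl call:

```shell
# Hypothetical health summary, e.g. from `smartctl -H /dev/<disk_name>`
smart_health='SMART overall-health self-assessment test result: FAILED!'

# Treat any non-PASSED verdict as a defective disk
case "$smart_health" in
  *PASSED*) echo "disk healthy" ;;
  *)        echo "disk needs replacement" ;;
esac
```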

Corrupt disk replacement

Warning

Before replacing a disk, make sure its data is backed up or replicated. If the disk has logical volumes containing unreplicated data, this data will be lost.

Step 1: Identify the corrupt disk

Find the volume group with the defective disk:

kubectl get volumegroups -A -o wide

Get detailed status of the volume group:

kubectl describe volumegroup <vg_name> -n stackland-volumes

In the status.physicalVolumes field, find a disk whose attributes field contains the m (missing) flag.

Connect to the topovgm-operator pod on the relevant node and check the status of its physical volumes:

kubectl -n stackland-volumes exec -it <topovgm_pod_name> -- pvs

Step 2: Transfer data from the disk (if possible)

If the disk is still available and there is data on it, transfer it to other disks in the volume group:

kubectl -n stackland-volumes exec -it <topovgm_pod_name> -- pvmove /dev/<disk_name>

If the disk is completely unavailable, skip this step.
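Once pvmove finishes, the drained disk should show PFree equal to PSize in pvs, meaning no extents remain on it. A minimal sketch of that check over a sample pvs line (device name and sizes are made up):

```shell
# Hypothetical pvs line for the drained disk: PSize then PFree
pv_line='/dev/sdb stackland-vg lvm2 a-- 931.00g 931.00g'

# If PFree ($6) equals PSize ($5), every extent has been moved off
set -- $pv_line
if [ "$5" = "$6" ]; then echo "disk drained"; else echo "data remains"; fi
```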

Step 3: Delete the disk from the volume group

If the disk is unavailable, change the deletion policy in the VolumeGroup resource so that the missing device is removed along with its data:

kubectl patch volumegroup <vg_name> -n stackland-volumes --type=merge \
  -p '{"spec":{"deviceLossSynchronizationPolicy":"Remove"}}'

The operator will automatically run vgreduce --removemissing and delete the missing physical volume from the volume group.

When done, set the policy back to its original value:

kubectl patch volumegroup <vg_name> -n stackland-volumes --type=merge \
  -p '{"spec":{"deviceLossSynchronizationPolicy":"Fail"}}'

Step 4: Replace the physical disk

Physically replace the disk in the server according to the manufacturer's documentation.

Step 5: Make sure the new disk has been detected

Once the disk is replaced, topovgm-operator will automatically detect the new disk during the next reconciliation cycle.

If auto-detection is on (physicalVolumeSelector not set), the new disk will be added to the volume group automatically.

If an explicit selector is used (physicalVolumeSelector with specific paths), update the VolumeGroup resource by specifying a path to the new disk:

kubectl edit volumegroup <vg_name> -n stackland-volumes

Step 6: Check the recovery

Make sure that the volume group is synchronized:

kubectl get volumegroup <vg_name> -n stackland-volumes -o jsonpath='{.status.conditions[0]}'

The VolumeGroupSyncedOnNode condition should be set to True.
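The condition JSON returned by the jsonpath query can also be checked in a script. A minimal sketch, assuming a sample condition object in place of live kubectl output:

```shell
# Hypothetical condition object as the jsonpath query might return it
cond='{"type":"VolumeGroupSyncedOnNode","status":"True","reason":"Synced"}'

# Extract the status value; "True" means the group is synchronized
echo "$cond" | sed -n 's/.*"status":"\([^"]*\)".*/\1/p'
```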

Check the volume group's status in LVM:

kubectl -n stackland-volumes exec -it <topovgm_pod_name> -- vgs

© 2026 Direct Cursus Technology L.L.C.