Replacing a disk in a RAID array
If there is a RAID array disk failure on a BareMetal server, you must remove the defective disk from the array, request support to replace the physical drive on the server, and then add the new disk to the RAID array.
Note
This guide does not apply to disk failures in RAID 0 arrays. Such arrays are not fault-tolerant, so if one of the disks fails, all the array data will be lost and the array will have to be completely rebuilt.
Remove the defective disk from the RAID array
- Connect to the server over SSH:
ssh root@<server_public_IP_address>
You can also connect to the server through the KVM console using your username and password.
- Get information about the RAID array's current disk and partition layout:
cat /proc/mdstat
Result:
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md2 : active raid1 sdb3[1] sda3[0]
      6287360 blocks super 1.2 [2/2] [UU]
md3 : active raid1 sdb4[1] sda4[0]
      849215488 blocks super 1.2 [2/2] [UU]
      bitmap: 4/7 pages [16KB], 65536KB chunk
md1 : active raid1 sdb2[1] sda2[0]
      10477568 blocks super 1.2 [2/2] [UU]
The above example shows a RAID array consisting of three partitions: md1 (disk partitions sdb2 and sda2), md2 (disk partitions sdb3 and sda3), and md3 (disk partitions sdb4 and sda4).
- Get information about the roles of the RAID array partitions:
lsblk
Result:
NAME      MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
sda         8:0    0 838.4G  0 disk
├─sda1      8:1    0   299M  0 part
├─sda2      8:2    0    10G  0 part
│ └─md1     9:1    0    10G  0 raid1 /boot
├─sda3      8:3    0     6G  0 part
│ └─md2     9:2    0     6G  0 raid1 [SWAP]
└─sda4      8:4    0   810G  0 part
  └─md3     9:3    0 809.9G  0 raid1 /
sdb         8:16   0 838.4G  0 disk
├─sdb1      8:17   0   299M  0 part
├─sdb2      8:18   0    10G  0 part
│ └─md1     9:1    0    10G  0 raid1 /boot
├─sdb3      8:19   0     6G  0 part
│ └─md2     9:2    0     6G  0 raid1 [SWAP]
└─sdb4      8:20   0   810G  0 part
  └─md3     9:3    0 809.9G  0 raid1 /
In the above example:

- md1: /boot partition.
- md2: SWAP partition.
- md3: / partition with the root file system.
- Let's assume the /dev/sdb disk is down. Remove the /dev/sdb disk's partitions from the RAID array's partitions:

mdadm /dev/md1 --remove /dev/sdb2
mdadm /dev/md2 --remove /dev/sdb3
mdadm /dev/md3 --remove /dev/sdb4
The mdadm utility will not allow you to remove a disk from a RAID array if it considers the disk operational or if removing it could cause the array to fail. In that case, you will be notified that the device is busy:

mdadm: hot remove failed for /dev/sdb2: Device or resource busy
If this is the case, first mark the disk as failed and retry the removal:

mdadm /dev/md1 --fail /dev/sdb2
mdadm /dev/md1 --remove /dev/sdb2
mdadm /dev/md2 --fail /dev/sdb3
mdadm /dev/md2 --remove /dev/sdb3
mdadm /dev/md3 --fail /dev/sdb4
mdadm /dev/md3 --remove /dev/sdb4
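Since the fail-and-remove pairs in this example all follow the same md<N>/sdb<N+1> pattern, you can also run them in a single loop; a minimal sketch, assuming exactly the layout above:

# Assumes the example layout: md1 holds sdb2, md2 holds sdb3, md3 holds sdb4
for i in 1 2 3; do
  # mdadm accepts --fail and --remove chained in one invocation
  mdadm "/dev/md${i}" --fail "/dev/sdb$((i + 1))" --remove "/dev/sdb$((i + 1))"
done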
- Get the defective disk's ID:
fdisk -l
Result:
...
Disk /dev/sdb: 838.36 GiB, 900185481216 bytes, 1758174768 sectors
Disk model: SAMSUNG MZ7GE900
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: CD2ACB4C-1618-4BAF-A6BB-D2B9********
...
Save the defective disk's ID (Disk identifier): you will need it to report the problem to tech support.
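If you only need that one field, you can filter the fdisk output for the failed disk; for example:

fdisk -l /dev/sdb | grep "Disk identifier"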
Request physical replacement of the disk
Create a disk replacement request to support, stating the IDs of the BareMetal server and the defective disk.
Wait for the data center engineers to replace the defective disk.
Add the new disk to your RAID array
Once the physical drive is replaced on the server, you must partition the drive and add it to the existing RAID array.
- Use the gdisk utility to determine the partition table type: GPT or MBR. Install gdisk for your server's OS if needed.

Run the command, specifying the RAID array's remaining operational disk:
gdisk -l /dev/sda
Depending on the partition table type, the result will be as follows:
GPT:

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present
...

MBR:

Partition table scan:
  MBR: MBR only
  BSD: not present
  APM: not present
  GPT: not present
...
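Alternatively, you can print the partition table type directly; a one-line sketch using lsblk (the PTTYPE column requires a reasonably recent util-linux):

lsblk -ndo PTTYPE /dev/sda

This prints gpt for a GPT disk and dos for an MBR disk.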
- Copy the partition table layout from the RAID array's remaining operational disk to the new disk.

If the source disk uses a GPT partition table:
- Create a copy of the source disk partition table:

sgdisk --backup=table /dev/sda
Result:
The operation has completed successfully.
- Recover the partition table from the copy to the new disk:

sgdisk --load-backup=table /dev/sdb
Result:
The operation has completed successfully.
- Assign a new random UUID to the new disk:

sgdisk -G /dev/sdb
Result:
The operation has completed successfully.
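As a side note, sgdisk can replicate the table and re-randomize the GUIDs without an intermediate backup file; a sketch of that alternative (with -R, the target disk is the option argument and the source disk comes last):

sgdisk -R /dev/sdb /dev/sda
sgdisk -G /dev/sdb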
If the source disk uses an MBR partition table:
- Copy the partition table:

sfdisk -d /dev/sda | sfdisk /dev/sdb
Where:

- /dev/sda: RAID array's remaining source disk to copy the partition table from.
- /dev/sdb: Target (new) disk to copy the partition table to.
- If the partitions are not displayed after copying, re-read the partition table:

sfdisk -R /dev/sdb
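Whichever path you followed, it is worth confirming that the new disk now mirrors the source layout before re-adding it to the array; for example:

lsblk /dev/sdb

The partition list should match the sda layout shown earlier (sdb1 through sdb4).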
- Add the disk to the RAID array by adding the corresponding disk partitions to the RAID partitions one by one. The mapping between these partitions was established earlier in Remove the defective disk from the RAID array.
Run the following commands:
mdadm /dev/md1 --add /dev/sdb2
mdadm /dev/md2 --add /dev/sdb3
mdadm /dev/md3 --add /dev/sdb4
Once a disk is added to the array, synchronization begins; its speed depends on the disk size and type (SSD/HDD). A way to monitor progress is sketched after the result below.

Result:
mdadm: added /dev/sdb2
mdadm: added /dev/sdb3
mdadm: added /dev/sdb4
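To watch the rebuild as it runs, you can poll /proc/mdstat; for example, with the standard watch utility:

watch -n 5 cat /proc/mdstat

While synchronization is in progress, the affected arrays show a recovery progress line; once it disappears and the status returns to [UU], the rebuild is complete.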
- Make sure the new disk is added to the RAID array:
cat /proc/mdstat
Result:
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md2 : active raid1 sdb3[2] sda3[0]
      6287360 blocks super 1.2 [2/2] [UU]
md3 : active raid1 sdb4[2] sda4[0]
      849215488 blocks super 1.2 [2/2] [UU]
      bitmap: 4/7 pages [16KB], 65536KB chunk
md1 : active raid1 sdb2[2] sda2[0]
      10477568 blocks super 1.2 [2/2] [UU]
unused devices: <none>
- Install the Linux OS bootloader on the new disk:
grub-install /dev/sdb
Result:
Installing for i386-pc platform.
Installation finished. No error reported.
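The output above is for a BIOS (i386-pc) boot. If your server boots via UEFI instead, grub-install targets the EFI system partition rather than the disk itself; a sketch assuming the ESP is mounted at /boot/efi (adjust to your setup):

grub-install --target=x86_64-efi --efi-directory=/boot/efi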