High-performance computing (HPC) on preemptible VMs
Follow this guide to create a cluster of preemptible VMs that performs a shared computational task, for example, solving a system of linear equations using the Jacobi method.
To create a cluster and run a computational task:
- Prepare your cloud.
- Create a master VM in the cloud.
- Prepare the VM cluster.
- Create a cluster.
- Create a task for computations in the cluster.
- Run and analyze the computations.
- Delete the resources you created.
Prepare your cloud
Sign up for Yandex Cloud and create a billing account:
- Go to the management console and log in to Yandex Cloud or create an account if you do not have one yet.
- On the Yandex Cloud Billing page, make sure you have a billing account linked and it has the ACTIVE or TRIAL_ACTIVE status. If you do not have a billing account, create one.
If you have an active billing account, you can go to the cloud page to create or select a folder for your infrastructure.
Learn more about clouds and folders.
Required paid resources
The cost for hosting servers includes:
- Fee for multiple continuously running VMs (see Yandex Compute Cloud pricing).
- Fee for using a dynamic or static public IP address (see Yandex Virtual Private Cloud pricing).
Create a master VM in the cloud
Create a VM
To create a VM:
- In the management console, select the folder to create your VM in.
- In the list of services, select Compute Cloud.
- In the left-hand panel, select Virtual machines.
- Click Create virtual machine.
- Under Boot disk image, select the Ubuntu image.
- Under Location, select an availability zone to place your VM in.
- Under Disks and file storages, select SSD as the boot disk type.
- Under Computing resources, navigate to the Custom tab and specify the parameters required for the computational task:
  - Platform: Intel Ice Lake.
  - vCPU: 4.
  - Guaranteed vCPU performance: 100%.
  - RAM: 4 GB.
  - Additional: Preemptible.
- Under Network settings:
  - In the Subnet field, specify the ID of a subnet in the availability zone of the VM you are creating, or select a cloud network from the list.
    - Each network must have at least one subnet. If there is no subnet, create one by selecting Create subnet.
    - If you do not have a network, click Create network to create one:
      - In the window that opens, specify the network name and select the folder to create it in.
      - (Optional) Select the Create subnets option to automatically create subnets in all availability zones.
      - Click Create network.
  - In the Public IP field, select Auto to assign the VM a random external IP address from the Yandex Cloud pool, or select a static address from the list if you reserved one in advance.
- Under Access, specify the information required to access the VM:
  - In the Login field, enter the name of the user to create on the VM, e.g., ubuntu.
    Alert
    Do not use root or other usernames reserved by the operating system. To perform operations requiring superuser permissions, use the sudo command.
  - In the SSH key field, paste the contents of the public key file.
    You need to create a key pair for the SSH connection yourself. To learn how, see Connecting to a VM via SSH.
- Under General information, specify the VM name. For clarity, enter master-node.
- Click Create VM.
Set up the VM
- Use SSH to connect to the VM and switch to administrator mode in the console:
  sudo -i
- Update the package lists and install the required utilities:
  apt update
  apt install -y net-tools htop libopenmpi-dev nfs-common
- Exit administrator mode and generate SSH keys for access between the VMs:
  exit
  ssh-keygen -t ed25519
- Add the generated key to the list of authorized keys:
  cd ~/.ssh
  cat id_ed25519.pub >> authorized_keys
Prepare the VM cluster
Create a cluster
- In the management console, go to Disks.
- To the right of the master-node VM disk, click the options icon and select Create snapshot. Enter the name: master-node-snapshot. After the snapshot is created, it appears in the list under Snapshots.
- Go to Instance groups and click Create group of virtual machines.
- Create a VM group:
  - In the Name field, enter a name for the future VM group, e.g., compute-group.
  - In the Service account field, add a service account to the instance group. If you do not have a service account, click Create new account, enter a name, and click Create.
  - In the Availability zone field, select the availability zone the master-node VM is in. Make sure the VMs are in the same availability zone to reduce latency between them.
  - Under Instance template, click Define. This opens a screen for creating a template.
    - Under Disks, select Add disk. In the window that opens, specify the disk parameters, using the master-node-snapshot snapshot you created earlier as the disk contents.
    - Under Computing resources, specify the same configuration as that of the master VM:
      - Platform: Intel Ice Lake.
      - vCPU: 4.
      - Guaranteed vCPU performance: 100%.
      - RAM: 4 GB.
      - Additional: Preemptible.
    - Under Network settings, specify the same network and subnet as those of the master VM. Leave Auto for the IP address type.
    - Under Access, specify the information required to access the VM:
      - In the Login field, enter your preferred login for the user to create on the VM.
      - In the SSH key field, paste your public SSH key. You need to create a key pair for the SSH connection yourself. To learn how, see Connecting to a VM via SSH.
    - Click Save. This returns you to the instance group creation screen.
  - Under Scaling, select the number of instances to be created. Specify 3 instances.
  - Click Create.
Test the cluster
Log in via SSH to each VM in compute-group and make sure you can access the master-node VM from them via SSH:
  ping master-node
  ssh master-node
Configure the NFS
To allow the VMs to use the same source files, create a shared network directory using NFS:
- Log in to the master-node VM via SSH and install the NFS server:
  ssh <master-node VM public IP address>
  sudo apt install nfs-kernel-server
- Create a shared directory for the VMs:
  mkdir ~/shared
- Open the /etc/exports file using any text editor, e.g., nano:
  sudo nano /etc/exports
- Add an entry to the file to enable access to the shared directory:
  /home/<username>/shared *(rw,sync,no_root_squash,no_subtree_check)
  Save the file.
- Apply the settings and restart the service:
  sudo exportfs -a
  sudo service nfs-kernel-server restart
Mount directories on group VMs
On each VM in compute-group, mount the directory you created:
- Create a shared directory and mount the shared directory of the master-node VM on it:
  mkdir ~/shared
  sudo mount -t nfs master-node:/home/<username>/shared ~/shared
- Make sure that the directory is successfully mounted:
  df -h
  Result:
  Filesystem                            Size  Used Avail Use% Mounted on
  ...
  master-node:/home/<username>/shared    13G  1.8G   11G  15% /home/<username>/shared
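A mount created with the mount command does not survive a restart, and a preemptible VM can be stopped at any time, so it may be useful to check the mount from a script before starting a run. The following is a minimal sketch, assuming Python 3 is available on the VM and the directory is mounted at ~/shared as described above. The is_mounted helper is a hypothetical name, and it only checks that the path is a mount point, not that the file system is NFS.

```python
import os


def is_mounted(path="~/shared"):
    """Return True if `path` is a mount point (hypothetical helper)."""
    # expanduser turns "~/shared" into /home/<username>/shared.
    return os.path.ismount(os.path.expanduser(path))


if __name__ == "__main__":
    print("mounted" if is_mounted() else "not mounted")
```

If the check reports that the directory is not mounted, repeat the mount command from the step above.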
Create a task for computations in the cluster
- Log in to the master-node VM via SSH, go to the shared directory, and download the task.c source file with a computational task:
  cd ~/shared
  wget https://raw.githubusercontent.com/cloud-docs-writer/examples/master/hpc-on-preemptible/task.c
  This code solves a system of linear equations using the Jacobi method. The task has one distributed implementation using MPI.
- Compile the source file into an executable file:
  mpicc task.c -o task
  As a result, the task executable file should appear in the shared directory.
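For reference, the Jacobi method computes each unknown from the previous iterate only: x_i(new) = (b_i - sum over j != i of a_ij * x_j(old)) / a_ii. The sketch below is a serial Python illustration of that update, not the contents of task.c. The function name, tolerance, and sample system are assumptions chosen for the example.

```python
def jacobi(a, b, tol=1e-10, max_iter=1000):
    """Solve a*x = b by Jacobi iteration; `a` should be diagonally dominant."""
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iter):
        # Every new component uses only values from the previous iterate.
        x_new = [
            (b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)) / a[i][i]
            for i in range(n)
        ]
        # Stop when the largest change between iterates is below tol.
        if max(abs(x_new[i] - x[i]) for i in range(n)) < tol:
            return x_new
        x = x_new
    return x


if __name__ == "__main__":
    # Illustrative diagonally dominant system with exact solution (1, 2, 3).
    a = [[4.0, 1.0, 0.0], [1.0, 5.0, 1.0], [0.0, 1.0, 3.0]]
    b = [6.0, 14.0, 11.0]
    print(jacobi(a, b))
```

Because each component depends only on the previous iterate, rows can be updated independently of each other, which is what makes the method convenient to distribute across processes.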
Run and analyze the computations
Tip
You can check the load on VM cores by running the htop command in a separate SSH session on each VM.
- Run the task on two cores using only the master-node VM resources:
  mpirun -np 2 task
  When the task is completed, the program displays the time spent performing it:
  JAC1 STARTED
  1: Time of task=45.104153
  0: Time of task=45.103931
- Run the task on four cores using only the master-node VM resources:
  mpirun -np 4 task
  Result:
  JAC1 STARTED
  1: Time of task=36.562328
  2: Time of task=36.562291
  3: Time of task=36.561989
  0: Time of task=36.561695
- Run the task on four cores using the resources of two VMs with two cores per VM. To do this, run the task with the -host option, which accepts a list in the format <VM IP address>:<number of cores>[,<ip>:<cores>[,...]]:
  mpirun -np 4 -host localhost:2,<VM IP address>:2 task
  Result:
  JAC1 STARTED
  0: Time of task=24.539981
  1: Time of task=24.540288
  3: Time of task=24.540619
  2: Time of task=24.540781
- Similarly, you can continue to increase the number of VMs and cores in use and see how distributed computing can significantly speed up task execution.
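To compare runs, one option is to take the slowest process's time as the duration of a run and divide the baseline duration by it. The sketch below parses output in the format shown above. The function names are illustrative, and the sample strings are the results quoted in this guide.

```python
import re

# Matches lines such as "1: Time of task=45.104153".
TIME_RE = re.compile(r"^\s*(\d+): Time of task=([0-9.]+)\s*$")


def wall_time(output):
    """Return the largest per-process time found in the program output."""
    times = [
        float(m.group(2))
        for line in output.splitlines()
        if (m := TIME_RE.match(line))
    ]
    if not times:
        raise ValueError("no 'Time of task' lines found")
    return max(times)


def speedup(baseline, current):
    """Return how many times faster `current` is than `baseline`."""
    return baseline / current


if __name__ == "__main__":
    two_cores = "JAC1 STARTED\n1: Time of task=45.104153\n0: Time of task=45.103931\n"
    two_vms = (
        "JAC1 STARTED\n0: Time of task=24.539981\n1: Time of task=24.540288\n"
        "3: Time of task=24.540619\n2: Time of task=24.540781\n"
    )
    print(f"speedup: {speedup(wall_time(two_cores), wall_time(two_vms)):.2f}x")
```

With the timings above, four cores on a single VM are about 1.23 times faster than the two-core run, and four cores spread across two VMs are about 1.84 times faster.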
Delete the resources you created
To stop paying for the resources you created, delete the master-node VM and the compute-group instance group.
If you reserved a static public IP address specifically for this VM:
- Select Virtual Private Cloud in your folder.
- Go to the IP addresses tab.
- Find the required IP address, click the options icon next to it, and select Delete.