High-performance computing (HPC) on preemptible VMs
In this tutorial, you will create a cluster of preemptible VMs to perform a shared computational task. For example, you can solve a system of linear equations using the Jacobi method.
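For reference, the Jacobi method solves a system Ax = b, where all diagonal entries a_ii are nonzero, by recomputing every unknown from the previous approximation x^(k). The formula below is the standard textbook form, not an excerpt from the tutorial's code:

```latex
x_i^{(k+1)} = \frac{1}{a_{ii}} \Bigl( b_i - \sum_{j \neq i} a_{ij} \, x_j^{(k)} \Bigr), \qquad i = 1, \dots, n
```

Each x_i^(k+1) depends only on the previous iterate, so the unknowns can be split among processes and updated in parallel. This is what makes the method a good fit for a cluster. The iteration is guaranteed to converge when A is strictly diagonally dominant.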
To create a cluster and run a computational task:
- Get your cloud ready.
- Create a master VM in the cloud.
- Prepare the VM cluster.
- Create a cluster.
- Create a computing task in the cluster.
- Run and analyze the computations.
- Delete the resources you created.
Get your cloud ready
Sign up for Yandex Cloud and create a billing account:
- Go to the management console and log in to Yandex Cloud or create an account if you do not have one yet.
- On the Yandex Cloud Billing page, make sure you have a billing account linked and it has the ACTIVE or TRIAL_ACTIVE status. If you do not have a billing account, create one.
If you have an active billing account, you can go to the cloud page to create or select a folder to deploy your infrastructure in.
Learn more about clouds and folders.
Required paid resources
The cost of hosting the servers includes:
- Fee for multiple continuously running VMs (see Yandex Compute Cloud pricing).
- Fee for using a dynamic or static public IP address (see Yandex Virtual Private Cloud pricing).
Create a master VM in the cloud
Create a VM
To create a VM:
- In the management console, select the folder to create your VM in.
- In the list of services, select Compute Cloud.
- In the left-hand panel, select Virtual machines.
- Click Create virtual machine.
- Under Boot disk image, select the Ubuntu image.
- Under Location, select the availability zone to place the VM in.
- Under Disks and file storages, select SSD as the boot disk type.
- Under Computing resources, go to the Custom tab and specify the parameters for your computing task:
  - Platform: Intel Ice Lake
  - vCPU: 4
  - Guaranteed vCPU performance: 100%
  - RAM: 4 GB
  - Additional: Preemptible
- Under Network settings:
  - In the Subnet field, enter the ID of a subnet in the new VM’s availability zone. Alternatively, you can select a cloud network from the list.
    - Each network must have at least one subnet. If there is no subnet, create one by selecting Create subnet.
    - If you do not have a network, click Create network to create one:
      - In the window that opens, enter the network name and select the folder to host the network.
      - Optionally, enable the Create subnets setting to automatically create subnets in all availability zones.
      - Click Create network.
  - In the Public IP address field, select Auto to assign the VM a random external IP address from the Yandex Cloud pool. Alternatively, select a static address from the list if you reserved one.
- Under Access, select SSH key and specify the VM access credentials:
  - In the Login field, enter a name for the user you want to create on the VM, e.g., ubuntu.
    Alert: Do not use root or other reserved usernames. To perform operations requiring root privileges, use the sudo command.
  - In the SSH key field, select the SSH key saved in your organization user profile.
    If there are no saved SSH keys in your profile, or you want to add a new key:
    - Click Add key.
    - Enter a name for the SSH key.
    - Upload or paste the contents of the public key file. You need to create a key pair for the SSH connection to a VM yourself.
    - Click Add.
    The SSH key will be added to your organization user profile.
    If users cannot add SSH keys to their profiles in the organization, the added public SSH key will only be saved to the user profile of the VM being created.
- Under General information, specify the VM name. For clarity, enter master-node.
- Click Create VM.
Set up the VM
- Use SSH to connect to the VM and switch to administrator mode in the console:
    sudo -i
- Update the package index and install the required utilities:
    apt update
    apt install -y net-tools htop libopenmpi-dev nfs-common
- Exit administrator mode and generate an SSH key pair for access between the VMs:
    exit
    ssh-keygen -t ed25519
- Add the generated public key to the list of authorized keys:
    cd ~/.ssh
    cat id_ed25519.pub >> authorized_keys
Prepare the VM cluster
Create a cluster
- In the management console, go to Disks.
- To the right of the master-node VM disk, click ⋯ and select Create snapshot. Enter the name: master-node-snapshot. After you create the snapshot, it will appear in the list under Snapshots.
- Go to Instance groups and click Create group of virtual machines.
- Create an instance group:
  - In the Name field, enter a name for your instance group, e.g., compute-group.
  - In the Service account field, add a service account to the instance group. If you do not have a service account, click Create new account, enter a name, and click Create.
    To create, update, and delete VMs in the group, assign the compute.editor role to the service account. By default, all operations in Instance Groups are performed on behalf of a service account.
  - In the Availability zone field, select the availability zone where the master-node VM resides. Keeping all VMs in the same availability zone reduces network latency between them.
  - Under Instance template, click Define. This will open a screen for creating a template.
    - Under Disks and file storages, select Add disk. In the window that opens, specify:
    - Under Computing resources, reproduce the master VM configuration:
      - Platform: Intel Ice Lake
      - vCPU: 4
      - Guaranteed vCPU performance: 100%
      - RAM: 4 GB
      - Additional: Preemptible
    - Under Network settings, specify the same network and subnet as those of the master VM. Leave Auto as the IP address type.
    - Under Access, specify the information required to access the VM:
      - In the Login field, enter your preferred login for the user you will create on the VM.
      - Paste your public SSH key into the SSH key field. You will need to create a key pair for the SSH connection on your own. To learn more, see Connecting to a VM over SSH.
    - Click Save. This will take you back to the instance group creation screen.
  - Under Scaling, select the number of instances to create. Specify three instances.
  - Click Create.
Test the cluster
Log in over SSH to each VM in compute-group and make sure you can reach the master-node VM from it and connect to it over SSH:
    ping master-node
    ssh master-node
Configure the NFS
To allow the VMs to use the same source files, create a shared network directory using NFS.
- Log in to the master-node VM over SSH and install an NFS server:
    ssh <master-node VM public IP address>
    sudo apt install nfs-kernel-server
- Create a shared directory for the VMs:
    mkdir ~/shared
- Open the /etc/exports file in any text editor, e.g., nano:
    sudo nano /etc/exports
- Add an entry to the file to enable access to the shared directory, where <username> is the name of the user you created on the VM:
    /home/<username>/shared *(rw,sync,no_root_squash,no_subtree_check)
  Save the file.
- Apply the settings and restart the service:
    sudo exportfs -a
    sudo service nfs-kernel-server restart
Mount the directories on the group VMs
On each VM in compute-group, mount the directory you created:
- Create a shared directory and mount the master-node VM's shared directory on it:
    mkdir ~/shared
    sudo mount -t nfs master-node:/home/<username>/shared ~/shared
- Make sure the directory is successfully mounted:
    df -h
  Result:
    Filesystem                           Size  Used Avail Use% Mounted on
    ...
    master-node:/home/<username>/shared   13G  1.8G   11G  15% /home/<username>/shared
Create a computing task in the cluster
- Log in to the master-node VM over SSH, go to the shared directory, and download the task.c source file with the computing task:
    cd ~/shared
    wget https://raw.githubusercontent.com/cloud-docs-writer/examples/master/hpc-on-preemptible/task.c
  This code solves a system of linear equations using the Jacobi method. The program is a distributed implementation based on MPI.
- Compile the source file into an executable:
    mpicc task.c -o task
  As a result, the task executable file should appear in the shared directory.
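The contents of task.c are not reproduced in this tutorial. As an illustration of the algorithm only, the C sketch below shows a serial Jacobi solver organized the way MPI programs commonly distribute such work: the rows are split into contiguous blocks, and each block is updated independently from the previous iterate. The names block_range, jacobi_sweep, and jacobi_solve are hypothetical, the driver loops over the blocks on one CPU instead of using MPI, and none of this should be read as the actual structure of task.c.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split n rows into nprocs contiguous blocks and return
 * the half-open range [*begin, *end) owned by `rank`. The first n % nprocs
 * ranks get one extra row each. */
void block_range(size_t n, size_t nprocs, size_t rank,
                 size_t *begin, size_t *end)
{
    assert(nprocs > 0);
    size_t base = n / nprocs;
    size_t extra = n % nprocs;
    *begin = rank * base + (rank < extra ? rank : extra);
    *end = *begin + base + (rank < extra ? 1 : 0);
}

/* One Jacobi sweep over rows [begin, end) of the row-major n x n matrix a.
 * x_new[i] is computed only from x_old, so disjoint row ranges can be
 * updated independently. Returns the largest change within the range. */
double jacobi_sweep(size_t n, const double *a, const double *b,
                    const double *x_old, double *x_new,
                    size_t begin, size_t end)
{
    double max_diff = 0.0;
    for (size_t i = begin; i < end; ++i) {
        double diag = a[i * n + i];
        assert(diag != 0.0); /* the Jacobi update divides by a_ii */
        double sigma = 0.0;
        for (size_t j = 0; j < n; ++j) {
            if (j != i) {
                sigma += a[i * n + j] * x_old[j];
            }
        }
        x_new[i] = (b[i] - sigma) / diag;
        double diff = fabs(x_new[i] - x_old[i]);
        if (diff > max_diff) {
            max_diff = diff;
        }
    }
    return max_diff;
}

/* Serial driver that emulates nprocs workers by sweeping each block in turn.
 * An MPI version would instead let each rank sweep its own block, then share
 * the new values (e.g. with MPI_Allgatherv) and the global maximum change
 * (e.g. with MPI_Allreduce). x holds the initial guess on entry and the
 * solution on return; scratch must have room for n values.
 * Returns the number of iterations performed. */
int jacobi_solve(size_t n, const double *a, const double *b, double *x,
                 double *scratch, size_t nprocs, double tol, int max_iter)
{
    for (int iter = 1; iter <= max_iter; ++iter) {
        double max_diff = 0.0;
        for (size_t rank = 0; rank < nprocs; ++rank) {
            size_t begin, end;
            block_range(n, nprocs, rank, &begin, &end);
            double d = jacobi_sweep(n, a, b, x, scratch, begin, end);
            if (d > max_diff) {
                max_diff = d;
            }
        }
        /* All blocks together cover every row, so scratch is fully updated. */
        memcpy(x, scratch, n * sizeof *x);
        if (max_diff < tol) {
            return iter;
        }
    }
    return max_iter;
}
```

The sketch writes new values into a separate scratch array instead of updating x in place. That is what makes it Jacobi rather than Gauss-Seidel: with in-place updates, rows later in the same sweep would already see values from the current iteration, and the blocks would no longer be independent.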
Run and analyze the computations
Tip
You can check the load on the VM cores by running the htop command in a separate SSH session on each VM.
- Run the task on two cores using only the master-node VM resources:
    mpirun -np 2 ./task
  Once the task is complete, the program will display the time it took:
    JAC1 STARTED
    1: Time of task=45.104153
    0: Time of task=45.103931
- Run the task on four cores using only the master-node VM resources:
    mpirun -np 4 ./task
  Result:
    JAC1 STARTED
    1: Time of task=36.562328
    2: Time of task=36.562291
    3: Time of task=36.561989
    0: Time of task=36.561695
- Run the task on four cores using the resources of two VMs, with two cores per VM. To do this, run the task with the -host option, which accepts a list in the <VM IP address>:<number of cores>[,<ip>:<cores>[,...]] format. Here, <VM IP address> is the IP address of one of the compute-group VMs:
    mpirun -np 4 -host localhost:2,<VM IP address>:2 ./task
  Result:
    JAC1 STARTED
    0: Time of task=24.539981
    1: Time of task=24.540288
    3: Time of task=24.540619
    2: Time of task=24.540781
- Similarly, you can further increase the number of VMs and cores in use and see how distributed computing can significantly speed up solving the task.
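To put the timings from these runs into numbers, the C sketch below computes speedup and relative parallel efficiency. It assumes that a run's duration is the largest "Time of task" value among its ranks, since the run finishes only when every rank has, and it uses the two-process run on master-node as the baseline. The function names are illustrative, and the input figures are the sample output from this tutorial, so your own timings will differ.

```c
#include <assert.h>
#include <stddef.h>

/* Run time of an MPI job: the job ends when its slowest rank ends,
 * so take the maximum of the per-rank "Time of task" values. */
double run_time(const double *rank_times, size_t nranks)
{
    assert(nranks > 0);
    double max_t = rank_times[0];
    for (size_t i = 1; i < nranks; ++i) {
        if (rank_times[i] > max_t) {
            max_t = rank_times[i];
        }
    }
    return max_t;
}

/* Speedup of a run relative to a baseline run: how many times faster it is. */
double speedup(double baseline_time, double run_time_s)
{
    return baseline_time / run_time_s;
}

/* Relative parallel efficiency: speedup divided by the factor by which the
 * number of processes grew compared with the baseline (1.0 = ideal scaling). */
double relative_efficiency(double s, int baseline_procs, int procs)
{
    return s * baseline_procs / procs;
}
```

With the sample timings above, four processes on master-node alone run about 1.23 times faster than two processes (relative efficiency about 0.62), while the same four processes spread across two VMs give a speedup of about 1.84 (relative efficiency about 0.92).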
Delete the resources you created
To stop paying for the resources you deployed, delete the master-node VM and the compute-group instance group.
If you reserved a static public IP address specifically for this VM:
- Select Virtual Private Cloud in your folder.
- Go to the IP addresses tab.
- Find the required IP address, click ⋯, and select Delete.