High-performance computing (HPC) on preemptible VMs
Follow these instructions to create a cluster of preemptible VMs that perform a shared computational task, such as solving a system of linear equations using the Jacobi method.
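For reference, the Jacobi method solves a system Ax = b of n equations by repeatedly recomputing every unknown from the previous approximation x^{(k)}. One iteration is shown below; the notation (a_{ij}, b_i, x_i^{(k)}) is introduced here for illustration and is not taken from the tutorial's source code:

```latex
x_i^{(k+1)} = \frac{1}{a_{ii}} \Bigl( b_i - \sum_{j \ne i} a_{ij}\, x_j^{(k)} \Bigr), \qquad i = 1, \dots, n
```

Each new component depends only on values from the previous iteration, so the components can be computed independently of one another. This is what makes the method easy to split across processes and VMs. The iteration is guaranteed to converge when A is strictly diagonally dominant.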
To create a cluster and run a computational task:
- Prepare your cloud.
- Create a master VM in the cloud.
- Prepare a VM cluster.
- Create a cluster.
- Create a task for computations in the cluster.
- Run and analyze the computations.
- How to delete created resources.
Prepare your cloud
Sign up for Yandex Cloud and create a billing account:
- Go to the management console and log in to Yandex Cloud or create an account if you do not have one yet.
- On the Yandex Cloud Billing page, make sure you have a billing account linked and it has the ACTIVE or TRIAL_ACTIVE status. If you do not have a billing account, create one.
If you have an active billing account, you can go to the cloud page to create or select a folder to run your VMs in.
Learn more about clouds and folders.
Required paid resources
The cost of hosting the servers includes:
- A fee for running multiple VMs (see Yandex Compute Cloud pricing).
- A fee for using a dynamic or a static public IP (see Yandex Virtual Private Cloud pricing).
Create a master VM in the cloud
Create a VM
To create a VM:
- On the management console folder page, click Create resource and select Virtual machine.
- In the Name field, enter a name for the VM. For clarity, enter master-node; the following steps refer to the VM by this name.
- Select an availability zone to put your VM in.
- Under Image/boot disk selection, click the Cloud Marketplace tab and select Ubuntu as your image.
- Under Disks, select a 13 GB SSD. Use the SSD type because the other VMs will access files on this disk over the network.
- Under Computing resources, select the VM platform and specify the following configuration for these computational tasks:
  - Platform: Intel Ice Lake.
  - Guaranteed vCPU share: 100%.
  - vCPU: 4.
  - RAM: 4 GB.
  - Advanced: Preemptible.
- Under Network settings:
  - Select the Network and Subnet to connect the VM to. If you don't have a network or subnet, create them right on the VM creation page.
  - In the Public address field, leave the Auto value to assign the VM a random external IP address from the Yandex Cloud pool, or select a static address from the list if you reserved one in advance.
- Under Access, specify the information required to access the VM:
  - In the Login field, enter a username to be created on the VM.
  - In the SSH key field, paste your public SSH key. You need to create a key pair for the SSH connection yourself. See the section about how to connect to VMs via SSH.
- Click Create VM.
Configure the VM
- Connect to the VM over SSH and switch to administrator (root) mode:
sudo -i
- Update the package index and install the required utilities:
apt update
apt install -y net-tools htop libopenmpi-dev nfs-common
- Exit administrator mode and generate an SSH key pair for access between the VMs. When prompted for a passphrase, leave it empty so that the VMs can connect to each other non-interactively:
exit
ssh-keygen -t ed25519
- Add the generated public key to the list of authorized keys:
cd ~/.ssh
cat id_ed25519.pub >> authorized_keys
Prepare a VM cluster
Create a cluster
- In the management console, go to Disks and click Create snapshot for the master-node disk. Name the snapshot master-node-snapshot. After the snapshot is created, it appears in the list under Disk snapshots.
- Go to Instance groups and click Create group.
- Create an instance group:
  - In the Name field, enter a name for the future group, for example, compute-group.
  - In the Service account field, add a service account. If you don't have a service account, click Create new, enter a name, and click Create.
  - Choose the same Availability zone that master-node is in. Keeping all VMs in one availability zone minimizes network latency between them.
  - Under Instance template, click Define. This opens a screen for creating a template:
    - Under Disks, select Add disk. In the window that opens, specify:
      - Disk designation: Boot.
      - Disk type: SSD.
      - Contents: From master-node-snapshot.
    - Under Computing resources, specify the same configuration as for the master VM:
      - Platform: Intel Ice Lake.
      - Guaranteed vCPU share: 100%.
      - vCPU: 4.
      - RAM: 4 GB.
      - Advanced: Preemptible.
    - Under Network settings, specify the same network and subnet as for the master VM. Leave the address type as Auto.
    - Under Access, specify the information required to access the VM:
      - In the Login field, enter a username to be created on the VM. Use the same username as on master-node so that home directory paths match on all VMs.
      - In the SSH key field, paste your public SSH key. You need to create a key pair for the SSH connection yourself. See the section about how to connect to VMs via SSH.
    - Click Add. This returns you to the instance group creation screen.
  - Under Scalability, select the number of instances to create. Specify 3 instances.
  - Click Create.
Test the cluster
Log in via SSH to each VM in compute-group and make sure that master-node is reachable from it and accepts SSH connections:
ping master-node
ssh master-node
Configure NFS
To allow the VMs to use the same source files, create a shared network directory using NFS.
- Log in to the master-node VM via SSH and install the NFS server:
ssh <master-node VM public IP address>
sudo apt install nfs-kernel-server
- Create a shared directory for the VMs:
mkdir ~/shared
- Open the /etc/exports file in a text editor, such as nano:
sudo nano /etc/exports
- Add an entry to the file to enable access to the shared directory:
/home/<username>/shared *(rw,sync,no_root_squash,no_subtree_check)
Here, * allows any host to mount the directory. To restrict access, you can replace it with your subnet's CIDR range. Save the file.
- Apply the settings and restart the service:
sudo exportfs -a
sudo service nfs-kernel-server restart
Mount directories on group VMs
On each VM in compute-group, mount the directory you created:
- Create a shared directory and mount the master-node directory on it:
mkdir ~/shared
sudo mount -t nfs master-node:/home/<username>/shared ~/shared
- Make sure that the directory is mounted:
df -h
Result:
Filesystem                           Size  Used Avail Use% Mounted on
...
master-node:/home/<username>/shared   13G  1.8G   11G  15% /home/<username>/shared
Create a task for computations in the cluster
- Log in to the master-node VM via SSH, go to the shared directory, and download the task.c source file with a computational task:
cd ~/shared
wget https://raw.githubusercontent.com/cloud-docs-writer/examples/master/hpc-on-preemptible/task.c
This code solves a system of linear equations using the Jacobi method. The task is implemented as a distributed program using MPI.
- Compile the source file into an executable file:
mpicc task.c -o task
As a result, the task executable file should appear in the shared directory.
Run and analyze the computations
Tip
You can check the load on VM cores by running the htop command in a separate SSH session on each VM.
- Run the task on two cores using only master-node resources:
mpirun -np 2 task
When the task is completed, the program displays the time spent performing it:
JAC1 STARTED
1: Time of task=45.104153
0: Time of task=45.103931
- Run the task on four cores using only master-node resources:
mpirun -np 4 task
Result:
JAC1 STARTED
1: Time of task=36.562328
2: Time of task=36.562291
3: Time of task=36.561989
0: Time of task=36.561695
- Run the task on four cores using the resources of two VMs, with two cores per VM. To do this, run the task with the -host option. It accepts a list in the format <VM IP address>:<number of cores>[,<VM IP address>:<number of cores>[,...]]. In the command below, localhost refers to master-node, and <VM IP address> is the IP address of a VM in compute-group:
mpirun -np 4 -host localhost:2,<VM IP address>:2 task
After the task completes, the program displays the results:
JAC1 STARTED
0: Time of task=24.539981
1: Time of task=24.540288
3: Time of task=24.540619
2: Time of task=24.540781
- Similarly, you can keep adding VMs and cores to speed up the task. In the runs above, distributing four processes across two VMs reduced the run time from about 36.6 s (four processes on master-node alone) to about 24.5 s.
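As a worked example with the sample times above, take the two-process run on master-node (45.10 s) as the baseline. This baseline is chosen here for illustration; the tutorial does not define one. The speedup S = T_baseline / T of the two four-process runs is then:

```latex
S_{4,\ \text{1 VM}} = \frac{45.10\ \text{s}}{36.56\ \text{s}} \approx 1.23,
\qquad
S_{4,\ \text{2 VMs}} = \frac{45.10\ \text{s}}{24.54\ \text{s}} \approx 1.84
```

Doubling the number of processes on a single VM gave a gain of about 23%. Spreading the same four processes over two VMs nearly halved the run time.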
How to delete created resources
To stop paying for the resources you created, delete the master-node VM and the compute-group instance group. If you no longer need the master-node-snapshot disk snapshot, delete it as well.
If you reserved a static public IP address specifically for this VM:
- Select VPC in your folder.
- Go to the IP addresses tab.
- Find the required address, open the actions menu next to it, and select Delete.