Deploying the GlusterFS parallel file system in high performance mode
Use this tutorial to create an infrastructure made up of 30 segments sharing a common GlusterFS file system. Placing storage disks in a single availability zone will ensure high performance of your file system. In this use case, performance is limited by the speed of accessing physical disks, while network latency is less important.
To configure a high-performance file system:
- Prepare your cloud.
- Configure the CLI profile.
- Prepare an environment for deploying the resources.
- Deploy your resources.
- Install and configure GlusterFS.
- Test the solution availability.
- Test the solution performance.
If you no longer need the resources you created, delete them.
Prepare your cloud
Sign up for Yandex Cloud and create a billing account:
- Go to the management console and log in to Yandex Cloud or create an account if you do not have one yet.
- On the Yandex Cloud Billing page, make sure you have a billing account linked and it has the ACTIVE or TRIAL_ACTIVE status. If you do not have a billing account, create one.
If you have an active billing account, you can go to the cloud page.
Learn more about clouds and folders.
Required paid resources
The infrastructure support costs include:
- Fee for continuously running VMs and disks (see Yandex Compute Cloud pricing).
- Fee for using public IP addresses and outgoing traffic (see Yandex Virtual Private Cloud pricing).
Configure the CLI profile
- If you do not have the Yandex Cloud command line interface yet, install it and sign in as a user.
- Create a service account:
  Management console:
  - In the management console, select the folder where you want to create a service account.
  - In the list of services, select Identity and Access Management.
  - Click Create service account.
  - Enter a name for the service account, e.g., sa-glusterfs.
  - Click Create.

  CLI:
  The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name or --folder-id parameter.
  Run the command below to create a service account, specifying sa-glusterfs as its name:
    yc iam service-account create --name sa-glusterfs
  Where name is the service account name.
  Result:
    id: ajehr0to1g8b********
    folder_id: b1gv87ssvu49********
    created_at: "2023-06-20T09:03:11.665153755Z"
    name: sa-glusterfs

  API:
  To create a service account, use the ServiceAccountService/Create gRPC API call or the create REST API method for the ServiceAccount resource.
- Assign the service account the administrator role for the folder:
  Management console:
  - On the management console home page, select a folder.
  - Go to the Access bindings tab.
  - Find the sa-glusterfs account in the list and click the options icon next to it.
  - Click Edit roles.
  - Click Add role in the dialog that opens and select the admin role.

  CLI:
  Run this command:
    yc resource-manager folder add-access-binding <folder_ID> \
      --role admin \
      --subject serviceAccount:<service_account_ID>

  API:
  To assign the service account a role for the folder, use the setAccessBindings REST API method for the ServiceAccount resource or the ServiceAccountService/SetAccessBindings gRPC API call.
- Set up the CLI profile to run operations on behalf of the service account:
  CLI:
  - Create an authorized key for the service account and save it to the file:
      yc iam key create \
        --service-account-id <service_account_ID> \
        --folder-id <ID_of_folder_with_service_account> \
        --output key.json
    Where:
    - service-account-id: Service account ID.
    - folder-id: ID of the folder in which the service account was created.
    - output: Name of the file with the authorized key.
    Result:
      id: aje8nn871qo4********
      service_account_id: ajehr0to1g8b********
      created_at: "2023-06-20T09:16:43.479156798Z"
      key_algorithm: RSA_2048
  - Create a CLI profile to run operations on behalf of the service account:
      yc config profile create sa-glusterfs
    Result:
      Profile 'sa-glusterfs' created and activated
  - Set the profile configuration:
      yc config set service-account-key key.json
      yc config set cloud-id <cloud_ID>
      yc config set folder-id <folder_ID>
    Where:
    - service-account-key: File with the service account's authorized key.
    - cloud-id: Cloud ID.
    - folder-id: Folder ID.
  - Add the credentials to the environment variables:
      export YC_TOKEN=$(yc iam create-token)
      export YC_CLOUD_ID=$(yc config get cloud-id)
      export YC_FOLDER_ID=$(yc config get folder-id)
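Before moving on, it can help to confirm that the three exported variables are actually set, since an empty value usually shows up only later as a failed deployment step. The following is a minimal sketch; the helper name require_env is hypothetical and is not part of the yc CLI.

```shell
# Hypothetical helper (not part of yc): report every named environment
# variable that is unset or empty. Returns non-zero if any is missing.
require_env() {
  missing=0
  for name in "$@"; do
    # Look up the variable whose name is stored in $name.
    # eval is acceptable here because the names are fixed, trusted strings.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage, after running the export commands from this section:
#   require_env YC_TOKEN YC_CLOUD_ID YC_FOLDER_ID && echo "credentials are set"
```

The check only tests that the values are non-empty; it does not verify that the token is still valid.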
Prepare an environment for deploying the resources
- Create an SSH key pair:
    ssh-keygen -t ed25519
  We recommend leaving the key file name unchanged.
- Clone the yandex-cloud-examples/yc-distributed-ha-storage-with-glusterfs GitHub repository and go to the yc-distributed-ha-storage-with-glusterfs folder:
    git clone https://github.com/yandex-cloud-examples/yc-distributed-ha-storage-with-glusterfs.git
    cd ./yc-distributed-ha-storage-with-glusterfs
- Edit the variables.tf file, specifying the parameters of the resources you are deploying:
  Warning
  The values set in the file result in deploying a resource-intensive infrastructure.
  To deploy the resources within your available quotas, use the values below or change the values according to your specific needs.
  - In the is_ha section, change default to false.
  - In client_node_per_zone, change default to 30.
  - In storage_node_per_zone, change default to 30.
    Note
    In this use case, we will deploy 30 storage VMs (segments). You can change this number depending on the requirements for the final storage size or total bandwidth.
    The maximum aggregate bandwidth of the entire system is calculated as each segment's bandwidth (450 MB/s for network SSDs) multiplied by the number of segments (30), which is around 13.5 GB/s.
    The system capacity is calculated as the number of segments (30) multiplied by the size of each storage (1 TB), which amounts to 30 TB.
  - If you specified a non-default name when creating the SSH key pair, under local_pubkey_path, change default to <path_to_public_SSH_key>.
  - If you need enhanced performance without guaranteed data durability, you can use non-replicated SSDs. For this, in disk_type, change default to network-ssd-nonreplicated. In addition, make sure the default value under disk_size is a multiple of 93.
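The sizing arithmetic from the note, together with the multiple-of-93 rule for non-replicated SSDs, can be sketched in a few lines of shell. The function names below are hypothetical, and the requested size of 1000 is only an illustrative input, not a value from variables.tf.

```shell
# Aggregate bandwidth in MB/s: number of segments times per-segment bandwidth.
aggregate_bandwidth_mbps() {
  echo $(( $1 * $2 ))
}

# Round SIZE up to the nearest multiple of STEP (integer arithmetic).
round_up_to_multiple() {
  echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

segments=30            # storage VMs, from the note above
per_segment_mbps=450   # MB/s per network SSD segment, from the note above
disk_size_tb=1         # size of each storage, from the note above

echo "aggregate bandwidth: $(aggregate_bandwidth_mbps "$segments" "$per_segment_mbps") MB/s"  # 13500 MB/s
echo "capacity: $(( segments * disk_size_tb )) TB"                                            # 30 TB
# Illustrative only: a requested size of 1000 becomes 1023, the next multiple of 93.
echo "disk_size for non-replicated SSD: $(round_up_to_multiple 1000 93)"
```

Changing segments shows how capacity and peak bandwidth scale together when you add or remove storage VMs.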
Deploy your resources
- Initialize Terraform:
    terraform init
- Check the Terraform file configuration:
    terraform validate
- Check the list of cloud resources you are about to create:
    terraform plan
- Create resources:
    terraform apply -auto-approve
- Wait until a process completion message appears:
    Outputs:

    connect_line = "ssh storage@158.160.108.137"
    public_ip = "158.160.108.137"
This will create 30 VMs for hosting client code (client01, client02, etc.) in the folder and 30 VMs for distributed data storage (gluster01, gluster02, etc.) bound to the client VMs and placed in the same availability zone.
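If you saved the completion message to a file or want to reuse the address in a script, the value can be pulled out of the Outputs block instead of being copied by hand. This is a small sketch; output_value is a hypothetical helper, and it assumes the simple name = "value" layout shown above.

```shell
# Hypothetical helper: read Terraform's human-readable Outputs block on stdin
# and print the value of the output named in $1 (without the quotes).
output_value() {
  sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*\"\(.*\)\"[[:space:]]*\$/\1/p"
}

# Usage with the sample output from this section:
#   printf 'public_ip = "158.160.108.137"\n' | output_value public_ip
#
# Terraform can also print a single output directly, without any parsing:
#   terraform output -raw public_ip
```

The terraform output -raw form is usually the simpler choice when you are still in the directory that holds the Terraform state.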
Install and configure GlusterFS
- Connect to the client01 VM using the command from the process completion output:
    ssh storage@158.160.108.137
- Switch to the root superuser mode:
    sudo -i
- Install ClusterShell:
    dnf install epel-release -y
    dnf install clustershell -y
    echo 'ssh_options: -oStrictHostKeyChecking=no' >> /etc/clustershell/clush.conf
- Create the configuration files:
    cat > /etc/clustershell/groups.conf <<'EOF'
    [Main]
    default: cluster
    confdir: /etc/clustershell/groups.conf.d $CFGDIR/groups.conf.d
    autodir: /etc/clustershell/groups.d $CFGDIR/groups.d
    EOF

    cat > /etc/clustershell/groups.d/cluster.yaml <<EOF
    cluster:
        all: '@clients,@gluster'
        clients: 'client[01-30]'
        gluster: 'gluster[01-30]'
    EOF
- Install GlusterFS:
    clush -w @all hostname # check and auto add fingerprints
    clush -w @all dnf install centos-release-gluster -y
    clush -w @all dnf --enablerepo=powertools install glusterfs-server -y
    clush -w @gluster mkfs.xfs -f -i size=512 /dev/vdb
    clush -w @gluster mkdir -p /bricks/brick1
    clush -w @gluster "echo '/dev/vdb /bricks/brick1 xfs defaults 1 2' >> /etc/fstab"
    clush -w @gluster "mount -a && mount"
- Restart GlusterFS:
    clush -w @gluster systemctl enable glusterd
    clush -w @gluster systemctl restart glusterd
- Check the availability of the gluster02 through gluster30 VMs and add them to the trusted storage pool:
    clush -w gluster01 'for i in {2..9}; do gluster peer probe gluster0$i; done'
    clush -w gluster01 'for i in {10..30}; do gluster peer probe gluster$i; done'
- Create a vol0 folder on each data storage VM and combine these folders into the stripe-volume shared folder:
    clush -w @gluster mkdir -p /bricks/brick1/vol0
    export STRIPE_NODES=$(nodeset -S':/bricks/brick1/vol0 ' -e @gluster)
    clush -w gluster01 gluster volume create stripe-volume ${STRIPE_NODES}:/bricks/brick1/vol0
- Configure additional performance settings:
    clush -w gluster01 gluster volume set stripe-volume client.event-threads 8
    clush -w gluster01 gluster volume set stripe-volume server.event-threads 8
    clush -w gluster01 gluster volume set stripe-volume cluster.shd-max-threads 8
    clush -w gluster01 gluster volume set stripe-volume performance.read-ahead-page-count 16
    clush -w gluster01 gluster volume set stripe-volume performance.client-io-threads on
    clush -w gluster01 gluster volume set stripe-volume performance.quick-read off
    clush -w gluster01 gluster volume set stripe-volume performance.parallel-readdir on
    clush -w gluster01 gluster volume set stripe-volume performance.io-thread-count 32
    clush -w gluster01 gluster volume set stripe-volume performance.cache-size 1GB
    clush -w gluster01 gluster volume set stripe-volume performance.cache-invalidation on
    clush -w gluster01 gluster volume set stripe-volume performance.md-cache-timeout 600
    clush -w gluster01 gluster volume set stripe-volume performance.stat-prefetch on
    clush -w gluster01 gluster volume set stripe-volume server.allow-insecure on
    clush -w gluster01 gluster volume set stripe-volume network.inode-lru-limit 200000
    clush -w gluster01 gluster volume set stripe-volume features.shard-block-size 128MB
    clush -w gluster01 gluster volume set stripe-volume features.shard on
    clush -w gluster01 gluster volume set stripe-volume features.cache-invalidation-timeout 600
    clush -w gluster01 gluster volume set stripe-volume storage.fips-mode-rchecksum on
- Start the stripe-volume shared folder and mount it on the client VMs:
    clush -w gluster01 gluster volume start stripe-volume
    clush -w @clients mount -t glusterfs gluster01:/stripe-volume /mnt/
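Two of the steps above depend on how host names are expanded: the peer probe is split into two loops only because of the zero-padded names, and the nodeset call joins the storage hosts with the brick path as a separator. The sketch below reproduces both expansions in plain shell so you can see what ends up on the command line; host_names and brick_list are hypothetical helpers, not ClusterShell or GlusterFS commands.

```shell
# Print zero-padded host names PREFIXnn for nn in FIRST..LAST, one per line.
host_names() {
  prefix=$1
  i=$2
  last=$3
  while [ "$i" -le "$last" ]; do
    printf '%s%02d\n' "$prefix" "$i"
    i=$((i + 1))
  done
}

# Read host names on stdin and print a space-separated list of HOST:PATH
# bricks, the form that "gluster volume create" takes as arguments.
brick_list() {
  brick_path=$1
  out=""
  while read -r host; do
    out="${out}${out:+ }${host}:${brick_path}"
  done
  printf '%s\n' "$out"
}

# Show the first three bricks of the list built for stripe-volume.
host_names gluster 1 3 | brick_list /bricks/brick1/vol0

# A single probe loop covering gluster02 through gluster30 could then be:
#   for h in $(host_names gluster 2 30); do gluster peer probe "$h"; done
```

With 30 storage VMs, the full brick list contains 30 entries, one per gluster host, each pointing at /bricks/brick1/vol0.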
Test the solution availability
- Check the status of the stripe-volume shared folder:
    clush -w gluster01 gluster volume status
- Create a text file:
    cat > /mnt/test.txt <<EOF
    Hello, GlusterFS!
    EOF
- Make sure the file is available on all client VMs:
    clush -w @clients sha256sum /mnt/test.txt
  Result:
    client01: 5fd9c031531c39f2568a8af5512803fad053baf3fe9eef2a03ed2a6f0a884c85  /mnt/test.txt
    client02: 5fd9c031531c39f2568a8af5512803fad053baf3fe9eef2a03ed2a6f0a884c85  /mnt/test.txt
    client03: 5fd9c031531c39f2568a8af5512803fad053baf3fe9eef2a03ed2a6f0a884c85  /mnt/test.txt
    ...
    client30: 5fd9c031531c39f2568a8af5512803fad053baf3fe9eef2a03ed2a6f0a884c85  /mnt/test.txt
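Comparing 30 checksums by eye is error-prone, so the check can be reduced to a single number. The sketch below counts distinct checksums in the clush output; distinct_checksums is a hypothetical helper, and it assumes each line has the host: checksum path layout shown in the result above.

```shell
# Read "host: checksum  path" lines on stdin and print how many distinct
# checksums appear. A result of 1 means every client sees the same content.
distinct_checksums() {
  awk '{ print $2 }' | sort -u | wc -l | tr -d ' '
}

# Usage on client01:
#   clush -w @clients sha256sum /mnt/test.txt | distinct_checksums
```

A value greater than 1 would point to at least one client reading different data, and 0 means no client returned a checksum at all.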
Test the solution performance
IOR
- Install the dependencies:
    clush -w @clients dnf install -y autoconf automake pkg-config m4 libtool git mpich mpich-devel make fio
    cd /mnt/
    git clone https://github.com/hpc/ior.git
    cd ior
    mkdir prefix
- Close the shell and open it again:
    exit
    sudo -i
    module load mpi/mpich-x86_64
    cd /mnt/ior
- Install IOR:
    ./bootstrap
    ./configure --disable-dependency-tracking --prefix /mnt/ior/prefix
    make
    make install
    mkdir -p /mnt/benchmark/ior
- Run IOR:
    export NODES=$(nodeset -S',' -e @clients)
    mpirun -hosts $NODES -ppn 16 /mnt/ior/prefix/bin/ior -o /mnt/benchmark/ior/ior_file -t 1m -b 16m -s 16 -F
    mpirun -hosts $NODES -ppn 16 /mnt/ior/prefix/bin/ior -o /mnt/benchmark/ior/ior_file -t 1m -b 16m -s 16 -F -C
Result:
    IOR-4.1.0+dev: MPI Coordinated Test of Parallel I/O
    Options:
    api                 : POSIX
    apiVersion          :
    test filename       : /mnt/benchmark/ior/ior_file
    access              : file-per-process
    type                : independent
    segments            : 16
    ordering in a file  : sequential
    ordering inter file : no tasks offsets
    nodes               : 30
    tasks               : 480
    clients per node    : 16
    memoryBuffer        : CPU
    dataAccess          : CPU
    GPUDirect           : 0
    repetitions         : 1
    xfersize            : 1 MiB
    blocksize           : 16 MiB
    aggregate filesize  : 120 GiB

    Results:

    access  bw(MiB/s)  IOPS     Latency(s)  block(KiB)  xfer(KiB)  open(s)   wr/rd(s)  close(s)  total(s)  iter
    ------  ---------  ----     ----------  ----------  ---------  --------  --------  --------  --------  ----
    write   1223.48    1223.99  4.65        16384       1024.00    2.44      100.39    88.37     100.44    0
    read    1175.45    1175.65  4.83        16384       1024.00    0.643641  104.52    37.97     104.54    0
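The figures in this report can be cross-checked from the run parameters: 30 nodes with 16 processes each give 480 tasks, and each task writes 16 segments of 16 MiB. The sketch below redoes that arithmetic; the function names are hypothetical, and the bandwidth is approximated as total data divided by total time, so it differs slightly from the values IOR prints.

```shell
# Total data written in MiB: tasks x segments x block size (MiB).
aggregate_mib() {
  echo $(( $1 * $2 * $3 ))
}

# Approximate bandwidth in MiB/s: data volume divided by elapsed seconds.
# awk is used because POSIX shell arithmetic is integer-only.
bandwidth_mib_s() {
  awk -v size="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", size / secs }'
}

tasks=$(( 30 * 16 ))                        # nodes x processes per node (-ppn 16)
total_mib=$(aggregate_mib "$tasks" 16 16)   # -s 16 segments, -b 16m block size

echo "aggregate file size: $(( total_mib / 1024 )) GiB"           # 120 GiB
echo "write: $(bandwidth_mib_s "$total_mib" 100.44) MiB/s"        # 1223.4
echo "read:  $(bandwidth_mib_s "$total_mib" 104.54) MiB/s"        # 1175.4
```

Both results agree with the bw(MiB/s) column to within rounding of the reported total(s) values.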
How to delete the resources you created
To stop paying for the resources created, delete them:
terraform destroy -auto-approve