Creating a Greenplum® cluster
A Managed Service for Greenplum® cluster consists of master hosts that accept client queries and segment hosts that provide data processing and storage capability.
Available disk types depend on the selected host class.
For more information, see Resource relationships.
Creating a cluster
To create a Managed Service for Greenplum® cluster, you need the vpc.user role and the managed-greenplum.editor role or higher. For more information on assigning roles, see the Identity and Access Management documentation.
To create a Managed Service for Greenplum® cluster:
- In the management console, select the folder where you want to create a database cluster.
- Select Managed Service for Greenplum.
- Click Create cluster.
- Enter a name for the cluster. It must be unique within the folder.
- (Optional) Enter a cluster description.
- Select the environment where you want to create the cluster (you cannot change the environment once the cluster is created):
  - PRODUCTION: For stable versions of your apps.
  - PRESTABLE: For testing purposes. The prestable environment is similar to the production environment and likewise covered by the SLA, but it is the first to get new features, improvements, and bug fixes. In the prestable environment, you can test compatibility of new versions with your application.
- Select the Greenplum® version.
- (Optional) Select groups of dedicated hosts to host the cluster.
  Alert
  You cannot edit this setting after you create a cluster. The use of dedicated hosts significantly affects cluster pricing.
- Under Network settings:
  - Select the cloud network for the cluster.
  - In the Security groups parameter, specify the security group that contains the rules allowing all incoming and outgoing traffic over any protocol from any IP address.
    Alert
    For a Managed Service for Greenplum® cluster to work properly, at least one of its security groups must have rules allowing all incoming and outgoing traffic from any IP address.
  - Select the availability zone and subnet for the cluster. To create a new subnet, click Create new next to the availability zone you need.
  - Select Public access to enable connecting to the cluster from the internet.
- (Optional) For clusters with Greenplum® version 6.25 or higher, enable the Hybrid storage option.
  It activates the Yezzey extension from Yandex Cloud. This extension exports AO and AOCO tables from disks within the Managed Service for Greenplum® cluster to cold storage in Object Storage, where the data is kept in a service bucket in compressed and encrypted form. This is a more cost-efficient storage method.
  You cannot disable this option after you save your cluster settings.
  Note
  This feature is at the Preview stage and is free of charge.
- Specify the admin user settings. This special user is required for managing the cluster and cannot be deleted. For more information, see Users and roles.
  - Username may contain Latin letters, numbers, hyphens, and underscores, but cannot start with a hyphen. It must be from 1 to 32 characters long.
    Note
    Such names as admin, gpadmin, mdb_admin, mdb_replication, monitor, none, postgres, public, and repl are reserved for Managed Service for Greenplum®. You cannot create users with these names.
  - Password must be from 8 to 128 characters long.
- Configure additional cluster settings, if required:
  - Backup start time (UTC): Time interval during which the cluster backup starts. Time is specified in 24-hour UTC format. The default time is 22:00 - 23:00 UTC.
  - Maintenance window: Maintenance window settings:
    - To enable maintenance at any time, select arbitrary (default).
    - To specify the preferred maintenance start time, select by schedule and specify the desired day of the week and UTC hour. For example, you can choose a time when the cluster is least loaded.
    Maintenance operations are carried out both on enabled and disabled clusters. They may include updating the DBMS, applying patches, and so on.
  - DataLens access: Allows you to analyze cluster data in Yandex DataLens.
  - Yandex Query access: Enables you to run YQL queries from Yandex Query against a managed database in Managed Service for Greenplum®.
  - Deletion protection: Manages protection of the cluster, its databases, and users against accidental deletion.
    Enabling deletion protection does not prevent someone from connecting manually and deleting database contents.
- (Optional) Configure the operating mode and connection pooler parameters under Connection pooler:
  - Mode: SESSION (default) or TRANSACTION.
  - Size: Maximum number of client connections.
  - Client Idle Timeout: Client idle time (in ms), after which the connection will be terminated.
- (Optional) Under Managing background processes, edit the parameters of routine maintenance operations:
  - Start time (UTC): VACUUM start time. The default value is 19:00 UTC. Once the VACUUM operation is completed, the ANALYZE operation starts.
  - VACUUM timeout: Maximum VACUUM execution time, in seconds. Valid values: from 7,200 to 86,399, with 36,000 by default. As soon as this period expires, VACUUM will be forced to terminate.
  - ANALYZE timeout: Maximum ANALYZE execution time, in seconds. Valid values: from 7,200 to 86,399, with 36,000 by default. As soon as this period expires, the ANALYZE operation will be forced to terminate.
  The combined VACUUM and ANALYZE execution time may not exceed 24 hours.
- Specify the master host parameters on the Master tab. For the recommended configuration, see Calculating the cluster configuration.
  - Host class: Defines technical properties of the virtual machines on which the cluster master hosts will be deployed.
  - Under Storage:
    - Select the disk type.
      Warning
      You cannot change the disk type after you create a cluster.
      The selected type determines the increments in which you can change your storage size:
      - Non-replicated SSD storage: In increments of 93 GB.
      - Local SSD storage:
        - For Intel Cascade Lake: In increments of 100 GB.
        - For Intel Ice Lake: In increments of 368 GB.
      - Network SSD and HDD storage: In increments of 1 GB.
- Specify the parameters of segment hosts on the Segment tab. For the recommended configuration, see Calculating the cluster configuration.
  - Number of segment hosts.
  - Number of segments per host. The maximum value of this parameter depends on the host class.
  - Host class: Defines technical properties of the virtual machines on which the cluster segment hosts will be deployed.
  - Under Storage:
    - Select the disk type.
      The selected type determines the increments in which you can change your storage size:
      - Non-replicated SSD storage: In increments of 93 GB.
      - Local SSD storage:
        - For Intel Cascade Lake: In increments of 100 GB.
        - For Intel Ice Lake: In increments of 368 GB.
      - Network SSD and HDD storage: In increments of 1 GB.
- If required, configure DBMS cluster-level settings.
- Click Create.
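Several of the fields above have explicit limits (admin username and password, VACUUM and ANALYZE timeouts), so it can be handy to check values before typing them into the form. The following is a minimal Python sketch under stated assumptions: the function names `check_admin_user` and `check_timeouts` are hypothetical, the rules are copied from the steps above, and reserved names are compared case-sensitively because the text does not say otherwise. The service itself remains the authority on what it accepts.

```python
import re

# Reserved names listed in the note above; case-sensitive comparison is an assumption.
RESERVED_NAMES = {
    "admin", "gpadmin", "mdb_admin", "mdb_replication",
    "monitor", "none", "postgres", "public", "repl",
}

# 1 to 32 characters: Latin letters, digits, hyphens, underscores; no leading hyphen.
USERNAME_RE = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_-]{0,31}")


def check_admin_user(username, password):
    """Return a list of problems; an empty list means the values fit the stated rules."""
    problems = []
    if not USERNAME_RE.fullmatch(username):
        problems.append("username: 1-32 Latin letters, digits, '-' or '_', not starting with '-'")
    elif username in RESERVED_NAMES:
        problems.append(f"username: '{username}' is reserved")
    if not 8 <= len(password) <= 128:
        problems.append("password: must be from 8 to 128 characters long")
    return problems


def check_timeouts(vacuum_s=36000, analyze_s=36000):
    """Check VACUUM and ANALYZE timeouts (seconds); defaults are the documented 36,000."""
    problems = []
    for label, value in (("VACUUM", vacuum_s), ("ANALYZE", analyze_s)):
        if not 7200 <= value <= 86399:
            problems.append(f"{label} timeout: must be from 7200 to 86399 seconds")
    if vacuum_s + analyze_s > 24 * 3600:
        problems.append("combined VACUUM and ANALYZE time exceeds 24 hours")
    return problems
```

Both helpers return a list rather than raising, so all problems can be shown at once.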
If you do not have the Yandex Cloud command line interface yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name or --folder-id parameter.
To create a Managed Service for Greenplum® cluster:
- Check whether the folder has any subnets for the cluster hosts:
  yc vpc subnet list
  If there are no subnets in the folder, create the required subnets in VPC.
- View the description of the create cluster CLI command:
  yc managed-greenplum cluster create --help
- Specify cluster parameters in the create command (the list of supported parameters in the example is not exhaustive):
  yc managed-greenplum cluster create <cluster_name> \
     --greenplum-version=<Greenplum_version> \
     --environment=<environment> \
     --network-name=<network_name> \
     --user-name=<username> \
     --user-password=<user_password> \
     --master-config resource-id=<host_class>,`
                    `disk-size=<storage_size_in_GB>,`
                    `disk-type=<network-hdd|network-ssd|network-ssd-nonreplicated|local-ssd> \
     --segment-config resource-id=<host_class>,`
                     `disk-size=<storage_size_in_GB>,`
                     `disk-type=<network-ssd-nonreplicated|local-ssd> \
     --zone-id=<availability_zone> \
     --subnet-id=<subnet_ID> \
     --assign-public-ip=<public_access_to_hosts> \
     --security-group-ids=<list_of_security_group_IDs> \
     --deletion-protection
  Note
  The cluster name must be unique within a folder. It may contain Latin letters, numbers, hyphens, and underscores. The name may be up to 63 characters long.
  Where:
  - --greenplum-version: Greenplum® version, 6.19.
  - --environment: Environment:
    - PRODUCTION: For stable versions of your apps.
    - PRESTABLE: For testing purposes. The prestable environment is similar to the production environment and likewise covered by the SLA, but it is the first to get new features, improvements, and bug fixes. In the prestable environment, you can test compatibility of new versions with your application.
  - --network-name: Network name.
  - --user-name: Username. It may contain Latin letters, numbers, hyphens, and underscores, and must start with a letter, number, or underscore. It must be from 1 to 32 characters long.
  - --user-password: Password. It must be from 8 to 128 characters long.
  - --master-config and --segment-config: Master and segment host configuration:
    - resource-id: Host class.
    - disk-size: Storage size in GB.
    - disk-type: Disk type:
      - network-hdd (for master hosts only)
      - network-ssd (for master hosts only)
      - local-ssd
      - network-ssd-nonreplicated
  - --zone-id: Availability zone.
  - --subnet-id: Subnet ID. You need to specify the ID if the selected availability zone has two or more subnets.
  - --assign-public-ip: Flag indicating whether public access to the hosts is required, true or false.
  - --security-group-ids: List of security group IDs.
  - --deletion-protection: Cluster deletion protection.
    Enabling deletion protection does not prevent someone from connecting manually and deleting database contents.
- To set the start time for the backup, provide the required value in HH:MM:SS format in the --backup-window-start parameter:
  yc managed-greenplum cluster create <cluster_name> \
     ...
     --backup-window-start=<backup_start_time>
- To create a cluster based on dedicated host groups, specify their IDs as a comma-separated list in the --host-group-ids parameter:
  yc managed-greenplum cluster create <cluster_name> \
     ...
     --host-group-ids=<dedicated_host_group_IDs>
  Alert
  You cannot edit this setting after you create a cluster. The use of dedicated hosts significantly affects cluster pricing.
- To set up a maintenance window (including for disabled clusters), provide the relevant value in the --maintenance-window parameter when creating your cluster:
  yc managed-greenplum cluster create <cluster_name> \
     ...
     --maintenance-window type=<maintenance_type>,`
                         `day=<day_of_week>,`
                         `hour=<hour>
  Where type is the maintenance type:
  - anytime (default): Any time.
  - weekly: On a schedule. If setting this value, specify the day of week and the hour:
    - day: Day of week in DDD format: MON, TUE, WED, THU, FRI, SAT, or SUN.
    - hour: Hour (UTC) in HH format: 1 to 24.
- To allow accessing the cluster from different services, provide the true value in the relevant parameters when creating a cluster:
  yc managed-greenplum cluster create <cluster_name> \
     ...
     --datalens-access=<access_from_DataLens> \
     --yandexquery-access=<access_from_Yandex_Query>
  Available services:
  - --datalens-access: Yandex DataLens
  - --yandexquery-access: Yandex Query
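The --maintenance-window value described above is a small comma-separated key=value string, so it is easy to assemble and check in a script before passing it to the CLI. Below is a minimal Python sketch; the helper name `maintenance_window_value` is hypothetical, and the hour is passed through unpadded, which is an assumption because the text only says "HH format: 1 to 24".

```python
DAYS = ("MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN")


def maintenance_window_value(kind="anytime", day=None, hour=None):
    """Build the value for --maintenance-window from the options described above."""
    if kind == "anytime":
        # The default type needs no day or hour.
        return "type=anytime"
    if kind != "weekly":
        raise ValueError("type must be 'anytime' or 'weekly'")
    if day not in DAYS:
        raise ValueError("day must be one of " + ", ".join(DAYS))
    if hour is None or not 1 <= hour <= 24:
        raise ValueError("hour must be from 1 to 24 (UTC)")
    # Assumption: the hour is written without zero padding.
    return f"type=weekly,day={day},hour={hour}"
```

For instance, `maintenance_window_value("weekly", "MON", 3)` produces a string that can follow `--maintenance-window` in the create command.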
Terraform
For more information about the provider resources, see the Terraform provider documentation.
If you change the configuration files, Terraform automatically detects which part of your configuration is already deployed, and what should be added or removed.
To create a Managed Service for Greenplum® cluster:
- Using the command line, navigate to the folder that will contain the Terraform configuration files with an infrastructure plan. Create the directory if it does not exist.
- If you don't have Terraform, install it and configure the Yandex Cloud provider.
- Create a configuration file describing the cloud network and subnets.
  The cluster is hosted on a cloud network. If you already have a suitable network, you do not need to describe it again.
  Cluster hosts are located on subnets of the selected cloud network. If you already have suitable subnets, you do not need to describe them again.
  Example structure of a configuration file that describes a cloud network with a single subnet:
  resource "yandex_vpc_network" "<network_name_in_Terraform>" { name = "<network_name>" }

  resource "yandex_vpc_subnet" "<subnet_name_in_Terraform>" {
    name           = "<subnet_name>"
    zone           = "<availability_zone>"
    network_id     = yandex_vpc_network.<network_name_in_Terraform>.id
    v4_cidr_blocks = ["<subnet>"]
  }
- Create a configuration file with a description of the cluster and its hosts.
  Here is an example of the configuration file structure:
  resource "yandex_mdb_greenplum_cluster" "<cluster_name_in_Terraform>" {
    name                = "<cluster_name>"
    environment         = "<environment>"
    network_id          = yandex_vpc_network.<network_name_in_Terraform>.id
    zone                = "<availability_zone>"
    subnet_id           = yandex_vpc_subnet.<subnet_name_in_Terraform>.id
    assign_public_ip    = <public_access_to_cluster_hosts>
    deletion_protection = <cluster_deletion_protection>
    version             = "<Greenplum_version>"
    master_host_count   = <number_of_master_hosts>
    segment_host_count  = <number_of_segment_hosts>
    segment_in_host     = <number_of_segments_per_host>

    master_subcluster {
      resources {
        resource_preset_id = "<host_class>"
        disk_size          = <storage_size_in_GB>
        disk_type_id       = "<disk_type>"
      }
    }

    segment_subcluster {
      resources {
        resource_preset_id = "<host_class>"
        disk_size          = <storage_size_in_GB>
        disk_type_id       = "<disk_type>"
      }
    }

    access {
      data_lens    = <access_from_DataLens>
      yandex_query = <access_from_Yandex_Query>
    }

    user_name     = "<username>"
    user_password = "<password>"

    security_group_ids = ["<list_of_security_group_IDs>"]
  }
  Where:
  - assign_public_ip: Public access to cluster hosts, true or false.
  - deletion_protection: Cluster deletion protection, true or false.
    Enabling cluster deletion protection does not prevent someone from connecting manually and deleting database contents.
  - version: Greenplum® version.
  - master_host_count: Number of master hosts, one or two.
  - segment_host_count: Number of segment hosts, between 2 and 32.
  - segment_in_host: Number of segments per host. The maximum value of this parameter depends on the host class.
  - access.data_lens: Access to the cluster from Yandex DataLens, true or false.
  - access.yandex_query: Access to the cluster from Yandex Query, true or false.
  For more information about the resources you can create with Terraform, see the provider documentation.
- Check that the Terraform configuration files are correct:
  - Using the command line, navigate to the folder that contains the up-to-date Terraform configuration files with an infrastructure plan.
  - Run the command:
    terraform validate
    If there are errors in the configuration files, Terraform will point to them.
- Create a cluster:
  - Run the command to view planned changes:
    terraform plan
    If the resource configuration descriptions are correct, the terminal will display a list of the resources to modify and their parameters. This is a test step. No resources are updated.
  - If you are happy with the planned changes, apply them:
    - Run the command:
      terraform apply
    - Confirm the update of resources.
    - Wait for the operation to complete.
  All the required resources will be created in the specified folder. You can check resource availability and their settings in the management console.
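terraform validate checks syntax but not the service-specific limits mentioned in this article, so a small pre-check of the numeric values can catch mistakes before terraform plan. The Python sketch below is illustrative only. It takes the host count limits from the parameter list above and the storage increments from the management console steps. It assumes that a valid disk_size is a whole multiple of the increment and that the console disk type names map to the disk type IDs shown in the CLI section. The function names and the platform labels "cascade-lake" and "ice-lake" are hypothetical.

```python
def storage_increment_gb(disk_type, platform=None):
    """Return the storage size increment (GB) for a disk type ID."""
    if disk_type in ("network-hdd", "network-ssd"):
        return 1
    if disk_type == "network-ssd-nonreplicated":
        return 93
    if disk_type == "local-ssd":
        # Hypothetical platform labels; the increment depends on the host platform.
        steps = {"cascade-lake": 100, "ice-lake": 368}
        if platform not in steps:
            raise ValueError("local-ssd needs platform 'cascade-lake' or 'ice-lake'")
        return steps[platform]
    raise ValueError(f"unknown disk type: {disk_type}")


def check_plan_values(master_host_count, segment_host_count,
                      disk_size, disk_type, platform=None):
    """Return a list of problems; an empty list means the values fit the stated limits."""
    problems = []
    if master_host_count not in (1, 2):
        problems.append("master_host_count must be 1 or 2")
    if not 2 <= segment_host_count <= 32:
        problems.append("segment_host_count must be between 2 and 32")
    step = storage_increment_gb(disk_type, platform)
    # Assumption: sizes must be positive whole multiples of the increment.
    if disk_size <= 0 or disk_size % step != 0:
        problems.append(f"disk_size must be a positive multiple of {step} GB")
    return problems
```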
To create a Managed Service for Greenplum® cluster, use the create REST API method for the Cluster resource or the ClusterService/Create gRPC API call and provide the following in the request:
- ID of the folder to host the cluster, in the folderId parameter.
- Cluster name in the name parameter.
- Cluster environment in the environment parameter.
- Greenplum® version in the config.version parameter.
- Username in the userName parameter.
- User password in the userPassword parameter.
- Network ID in the networkId parameter.
- Security group IDs in the securityGroupIds parameter.
- Master host configuration in the masterConfig parameter.
- Segment host configuration in the segmentConfig parameter.
Provide additional cluster settings, if required:
- Public access in the assignPublicIp parameter.
- Backup window in the config.backupWindowStart parameter.
- Cluster access from Yandex DataLens in the config.access.dataLens parameter.
- Cluster access from Yandex Query in the config.access.yandexQuery parameter.
- Maintenance window (including for disabled clusters) in the maintenanceWindow parameter.
- DBMS settings in configSpec.greenplumConfig_<version>.
- Routine maintenance operations in the configSpec.backgroundActivities.analyzeAndVacuum parameter.
- Cluster deletion protection in the deletionProtection parameter.
  Enabling deletion protection does not prevent someone from connecting manually and deleting database contents.
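As a rough illustration of how the required parameters fit together, the Python sketch below assembles a request body as a dictionary that can be serialized to JSON. It is a sketch under stated assumptions, not the API's definitive schema. The helper name `build_create_request` is hypothetical. The nesting of `config.version` as `{"config": {"version": ...}}` is inferred from the dotted name. The contents of `masterConfig` and `segmentConfig` are left to the caller because their structure is not described here. The endpoint and authentication are out of scope.

```python
import json


def build_create_request(folder_id, name, environment, version,
                         user_name, user_password, network_id,
                         security_group_ids, master_config, segment_config,
                         **optional):
    """Assemble the required parameters listed above into a JSON-ready dict."""
    body = {
        "folderId": folder_id,
        "name": name,
        "environment": environment,
        "config": {"version": version},  # assumed nesting for config.version
        "userName": user_name,
        "userPassword": user_password,
        "networkId": network_id,
        "securityGroupIds": list(security_group_ids),
        "masterConfig": master_config,    # structure supplied by the caller
        "segmentConfig": segment_config,  # structure supplied by the caller
    }
    # Optional top-level settings, e.g. assignPublicIp=True, deletionProtection=True.
    body.update(optional)
    return body


def to_json(body):
    """Serialize the request body for sending."""
    return json.dumps(body, indent=2)
```

Keeping the body as a plain dictionary makes it easy to inspect or log (without the password) before it is sent.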
Creating a cluster copy
You can create a Greenplum® cluster with the settings of another one you previously created. To do so, you need to import the configuration of the source Greenplum® cluster to Terraform. This way, you can either create an identical copy or use the imported configuration as the baseline and modify it as needed. Importing a configuration is a good idea when the source Greenplum® cluster has a lot of settings and you need to create a similar one.
To create a Greenplum® cluster copy:
- If you do not have Terraform yet, install it.
- Get the authentication credentials. You can add them to environment variables or specify them later in the provider configuration file.
- Configure and initialize a provider. There is no need to create a provider configuration file manually: you can download it.
- Place the configuration file in a separate working directory and specify the parameter values. If you did not add the authentication credentials to environment variables, specify them in the configuration file.
- In the same working directory, place a .tf file with the following contents:
  resource "yandex_mdb_greenplum_cluster" "old" { }
- Write the ID of the initial Greenplum® cluster to the environment variable:
  export GREENPLUM_CLUSTER_ID=<cluster_ID>
  You can get the ID with the list of clusters in the folder.
- Import the settings of the initial Greenplum® cluster into the Terraform configuration:
  terraform import yandex_mdb_greenplum_cluster.old ${GREENPLUM_CLUSTER_ID}
- Get the imported configuration:
  terraform show
- Copy it from the terminal and paste it into the .tf file.
- Place the file in the new imported-cluster directory.
- Modify the copied configuration so that you can create a new cluster from it:
  - Specify the new cluster name in the resource string and the name parameter.
  - Delete the created_at, health, id, status, master_hosts, and segment_hosts parameters.
  - Add the user_password parameter.
  - If the maintenance_window section has type = "ANYTIME", delete the hour parameter.
  - Optionally, make further changes if you need to customize the configuration.
- Get the authentication credentials in the imported-cluster directory.
- In the same directory, configure and initialize a provider. There is no need to create a provider configuration file manually: you can download it.
- Place the configuration file in the imported-cluster directory and specify the parameter values. If you did not add the authentication credentials to environment variables, specify them in the configuration file.
- Check that the Terraform configuration files are correct:
  terraform validate
  If there are any errors in the configuration files, Terraform will point them out.
- Create the required infrastructure:
  - Run the command to view planned changes:
    terraform plan
    If the resource configuration descriptions are correct, the terminal will display a list of the resources to modify and their parameters. This is a test step. No resources are updated.
  - If you are happy with the planned changes, apply them:
    - Run the command:
      terraform apply
    - Confirm the update of resources.
    - Wait for the operation to complete.
  All the required resources will be created in the specified folder. You can check resource availability and their settings in the management console.
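The step that deletes created_at, health, id, status, master_hosts, and segment_hosts from the imported configuration can be tedious by hand when the source cluster has many hosts. The Python sketch below shows one possible way to do it on the text copied from terraform show. It is a rough, line-based filter, not an HCL parser: the function name `strip_read_only` is hypothetical, it assumes each attribute starts on its own line in `name = value` form, and it removes any attribute with one of these names at any nesting level. Review the result before using it.

```python
import re

# Attributes the procedure above says to delete from the imported configuration.
READ_ONLY = ("created_at", "health", "id", "status", "master_hosts", "segment_hosts")
ATTR_RE = re.compile(r"^\s*(\w+)\s*=")


def _bracket_balance(line):
    """Opening minus closing brackets on a line (brackets inside strings are not handled)."""
    return sum(line.count(c) for c in "[{") - sum(line.count(c) for c in "]}")


def strip_read_only(config_text):
    """Drop read-only attributes, including multi-line list or map values."""
    kept = []
    depth = 0  # > 0 while skipping the continuation lines of a dropped value
    for line in config_text.splitlines():
        if depth > 0:
            depth += _bracket_balance(line)
            continue
        match = ATTR_RE.match(line)
        if match and match.group(1) in READ_ONLY:
            depth = _bracket_balance(line)
            continue
        kept.append(line)
    return "\n".join(kept)
```

The remaining edits from that step (new cluster name, user_password, and the maintenance_window hour) still need to be made manually.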
Examples
Creating a cluster
Create a Managed Service for Greenplum® cluster with the following test specifications:
- Name: gp-cluster
- Version: 6.19
- Environment: PRODUCTION
- Network: default
- User: user1
- Password: user1user1
- Master and segment hosts:
  - Class: s2.medium
  - With 100 GB local SSD (local-ssd) storage
- Availability zone: ru-central1-a; subnet: b0rcctk2rvtr8efcch64
- With public access to hosts
- Security group: enp6saqnq4ie244g67sb
- With protection against accidental cluster deletion
Run the following command:
yc managed-greenplum cluster create \
--name=gp-cluster \
--greenplum-version=6.19 \
--environment=PRODUCTION \
--network-name=default \
--user-name=user1 \
--user-password=user1user1 \
--master-config resource-id=s2.medium,`
`disk-size=100,`
`disk-type=local-ssd \
--segment-config resource-id=s2.medium,`
`disk-size=100,`
`disk-type=local-ssd \
--zone-id=ru-central1-a \
--subnet-id=b0rcctk2rvtr8efcch64 \
--assign-public-ip=true \
--security-group-ids=enp6saqnq4ie244g67sb \
--deletion-protection
Greenplum® and Greenplum Database® are registered trademarks or trademarks of VMware, Inc. in the United States and/or other countries.