Getting started with Managed Service for Greenplum®
To get started with the service:
To connect to the cluster and work with databases, you will need the following tools:
- DBeaver
graphical IDE. psql
console client.
For more tools you can use to work with Greenplum®, see Connecting to a database.
Getting started
-
Go to the management console
and log in to Yandex Cloud or sign up if not signed up yet. -
If you do not have a folder yet, create one:
-
In the management console
, select the appropriate cloud from the list on the left. -
At the top right, click Create folder.
-
Enter the folder name. The naming requirements are as follows:
- It must be 2 to 63 characters long.
- It may contain lowercase Latin letters, numbers, and hyphens.
- It must start with a letter and cannot end with a hyphen.
-
(Optional) Enter a description of the folder.
-
Select Create a default network. This will create a network with subnets in each availability zone. Within this network, a default security group will be created, inside which all network traffic is allowed.
-
Click Create.
-
-
Assign the vpc.user role and the managed-greenplum.editor role or higher to your Yandex Cloud account. These roles allow you to create a cluster.
Note
If unable to manage roles, contact your cloud or organization administrator.
Create a cluster
Create a Greenplum® cluster with public access. You can connect to such a cluster with a Yandex Compute Cloud VM or over the internet.
To create a cluster:
-
In the management console, select the folder where you want to create a DB cluster.
-
Select Managed Service for Greenplum.
-
Click Create cluster.
-
Specify the following cluster parameters:
-
Basic parameters → Cluster name: Cluster name. It must be unique within the folder.
-
Network settings:
-
Network: Specify the network you want the cluster hosts to reside in.
-
Security groups: Specify security groups for the cluster network traffic.
Set up the security groups to be able to connect to the cluster over the internet.
-
Availability zone: Specify an availability zone for the cluster hosts.
-
Subnet: Specify a subnet for the cluster hosts.
-
Enable Public access.
-
-
User:
-
Username: Admin user name. It may contain Latin letters, numbers, hyphens, and underscores, but cannot start with a hyphen. It must be from 1 to 32 characters long.
Note
Such names as
admin
,gpadmin
, mdb_admin,mdb_replication
,monitor
,none
,postgres
,public
, andrepl
are reserved for Managed Service for Greenplum®. You cannot create users with these names. -
Password: Admin user password. It must be from 8 to 128 characters long.
Admin user is a special user required for managing the cluster. This user cannot be deleted. For more information, see Users and roles in Managed Service for Greenplum®.
-
-
Master and Segment: Master and segment host configuration in the Greenplum® cluster.
You can edit the settings on these tabs if the default configuration does not suit your needs. For more information, see Calculating the cluster configuration.
-
-
Click Create cluster.
-
Wait until the cluster is ready: its status on the Managed Service for Greenplum® dashboard will change to Running and its state, to Alive. This may take some time.
For more information about creating a cluster, see Creating a Greenplum® cluster.
Get the cluster ID
Get the cluster ID to use to connect to the cluster:
- In the management console, go to the folder page and select Managed Service for Greenplum.
- Click the name of the cluster you need and select the
Overview tab. - Copy the cluster ID specified under General information.
Connect to the cluster
Connect to the cluster you created:
-
Install DBeaver
to the host to use for the connection.Note
Access to the cluster was tested in DBeaver Community 24.2.3
-
Launch DBeaver.
-
In the Database menu, select New connection.
-
Select Greenplum® from the DB list.
-
Click Next.
-
Specify the main connection parameters on the Main tab:
-
Under Server:
-
Connect by:
Host
. -
Host: Special FQDN of the primary master,
c-<cluster_ID>.rw.mdb.yandexcloud.net
.Cluster ID you got earlier.
-
Database:
postgres
. -
Port:
6432
. -
Show all databases: Enabled.
-
-
Under Authentication:
-
Authentication:
Database Native
. -
User: Admin user name specified when creating the cluster.
-
Password: Admin user password.
You can disable the Save password option, which is enabled by default.
-
-
-
Click Test connection.
Click Download if DBeaver prompts you to download the driver files. You may not have these files if this is your first time using DBeaver to connect to the Greenplum®.
Enter the user password if DBeaver prompts you for one and click OK.
DBeaver will connect to the
postgres
database in the Greenplum® cluster.If the connection is successful, you will get general information about the Greenplum® server and the driver used for the connection. Click OK.
Warning
When connecting, you will use an encrypted connection (with SSL) but without authenticating the cluster hosts. This is the default behavior for all clients
that use thelibpq
library.In the production environment, we recommend connecting to the cluster via SSL with host authentication.
-
(Optional) On the Main tab, click Connection description (name, type, ...) and add a name and description for the connection.
By default, the connection name matches the specified database name, i.e.,
postgres
. You may want to select a different name to avoid confusion: you can use this connection to work with any databases, not justpostgres
. -
Click Finish.
You will see the new connection in the left-hand panel on the Databases tab.
-
Expand the created connection in the left-hand panel on the Databases tab.
Enter the user password if DBeaver prompts you for one and click OK.
DBeaver will connect to the
postgres
database in the Greenplum® cluster. -
Run a test query:
-
Open the Databases folder, expand the
postgres
database context menu, and select SQL editor → Open SQL console.This will open a console you can use to run SQL queries to the
postgres
database. -
Enter your query in the console:
SELECT version();
-
Run the query by clicking Run SQL script on the far left side of the console.
Alternatively, select Run SQL script in the SQL editor menu.
-
-
Install
thepsql
client to the host you will use to connect to the cluster.For example, to install
psql
on a Linux VM in Yandex Cloud with Ubuntu 24.04: -
Connect to the
postgres
database in the Greenplum® cluster. Use the special primary master FQDN:psql "host=c-<cluster_ID>.rw.mdb.yandexcloud.net \ port=6432 \ dbname=postgres \ user=<username>"
Cluster ID you got earlier.
After running the command, enter the user password to complete the connection process.
Warning
When connecting, you will use an encrypted connection (with SSL) but without authenticating the cluster hosts. This is the default behavior for all clients
that use thelibpq
library.In the production environment, we recommend connecting to the cluster via SSL with host authentication.
-
Run a test query:
SELECT version();
If the connection to the cluster and the test query are successful, the Greenplum® version will be shown.
Create a database
The previously created Greenplum® cluster has only one database, a service database named postgres
. This database is not meant to store user data. For example, you cannot create data schemas in it.
After you connect to the Greenplum® cluster, create a database named sample_db
for storing user data:
-
Expand the previously created connection in the left-hand panel on the Databases tab.
Enter the user password if DBeaver prompts you for one and click OK.
DBeaver will connect to the
postgres
database in the Greenplum® cluster. -
Create a database:
-
Open the Databases folder, expand the
postgres
database context menu, and select SQL editor → Open SQL console.This will open a console you can use to run SQL queries to the
postgres
database. -
Enter your query in the console:
CREATE DATABASE sample_db;
-
Run the query by clicking Run SQL script on the far left side of the console.
Alternatively, select Run SQL script in the SQL editor menu.
-
-
Update the list of databases for it to display the new database.
To do this, open the context menu of the Databases folder and select Refresh.
Tip
If you do not see the
sample_db
database in the list, make sure you enabled the Show all databases option for the connection.
-
Run this request:
CREATE DATABASE sample_db;
-
Output a list of Greenplum® cluster databases and make sure it includes
sample_db
:\list
Run a few queries to the database
To make sure the database was created correctly, run a few queries to it:
-
Expand the previously created connection in the left-hand panel on the Databases tab.
Enter the user password if DBeaver prompts you for one and click OK.
DBeaver will connect to the
postgres
database in the Greenplum® cluster. -
Open the Databases folder, expand the
sample_db
database context menu, and select SQL editor → Open SQL console.This will open a console you can use to run SQL queries to the
sample_db
database. -
Create a table and populate it with data:
-
Enter your queries in the console:
CREATE TABLE sample_table (a int) DISTRIBUTED BY (a); INSERT INTO sample_table SELECT * FROM GENERATE_SERIES(1, 10000);
-
Run the query by clicking Run SQL script on the far left side of the console.
Alternatively, select Run SQL script in the SQL editor menu.
The table will be created and populated with numbers from 1 to 10,000.
The
CREATE TABLE
andINSERT INTO
queries do not return any results. DBeaver will display statistics for the completed queries as follows:Queries 2 Updated Rows 10000 Execute time ... Fetch time ... Total time ... Start time ... Finish time ...
-
-
Get the sum of all numbers in the table. To do this, clear the console, then enter and run the following query:
SELECT sum(a) FROM sample_table;
Expected result:
sum | --------+ 50005000|
-
Get information on how the 10,000 table rows are distributed across Greenplum® segments. To do this, clear the console, then enter and run the following query:
SELECT gp_segment_id, count(*) FROM sample_table GROUP BY gp_segment_id;
The result will depend on the number of segments in the cluster. Distribution of strings by segments may vary as well.
Here is a possible result for a cluster with two segments:
gp_segment_id|count| -------------+-----+ 0| 5013| 1| 4987|
-
End the connection session. To do this, open the connection context menu and select Disconnect.
-
Make sure you are connected to the
sample_db
database:SELECT current_database();
If the query returns a different database name, e.g.,
postgres
, connect to the database you need:\connect sample_db;
-
Create a table and populate it with data:
CREATE TABLE sample_table (a int) DISTRIBUTED BY (a); INSERT INTO sample_table SELECT * FROM GENERATE_SERIES(1, 10000);
The table will be created and populated with numbers from 1 to 10,000.
The
CREATE TABLE
andINSERT INTO
queries do not return any results. -
Get the sum of all numbers in the table:
SELECT sum(a) FROM sample_table;
Expected result:
sum | --------+ 50005000|
-
Get information on how the 10,000 table rows are distributed across Greenplum® segments:
SELECT gp_segment_id, count(*) FROM sample_table GROUP BY gp_segment_id;
The result will depend on the number of segments in the cluster. Distribution of strings by segments may vary as well.
Here is a possible result for a cluster with two segments:
gp_segment_id|count| -------------+-----+ 0| 5013| 1| 4987|
-
End the
psql
session:\quit
What's next
- Read about service concepts.
- Learn more about creating a cluster and connecting to a cluster.
Greenplum® and Greenplum Database® are registered trademarks or trademarks of VMware, Inc. in the United States and/or other countries.