Expanding a cluster

Updated at May 5, 2025

You can expand a Managed Service for Greenplum® cluster by adding segment hosts to it. You need to add at least two hosts. You can also increase the number of segments per host.

When a cluster is being expanded, its data is automatically redistributed evenly across all hosts in the cluster (including added hosts). For more information, see Expanding a Greenplum® cluster.

Expanding a cluster

Management console

Navigate to the folder dashboard and select Managed Service for Greenplum.
Select a cluster and open the Hosts tab.
Click Expand cluster in the top-right corner.
Specify cluster expansion settings:
- Number of new segment hosts: Number of segment hosts to expand the cluster by.
  
  The minimum value (default) is 2; the maximum value cannot exceed the limit on the number of segment hosts in the cluster minus the number of existing hosts.
- Add segments to host: Number of segments added per host.
  
  Warning
  
  Segments will be added to all hosts, both new and existing ones.
  
  The minimum value (default) is 0 (number of segments per host remains unchanged); the maximum value depends on the host class.
  
  The segment host class and the number of segments per host affect the maximum amount of memory allocated to each Greenplum® server process. If you select a host class with a small amount of RAM and specify a large number of segments, an error may occur.
- Block cluster from load: Closing the cluster for access.
  
  If this setting is enabled, you cannot connect to the cluster and it receives no new user requests. This will accelerate cluster expansion.
  
  Warning
  
  If you close the cluster from load and disable background data redistribution, you will lose access to the cluster until its expansion is complete.
  
  The expansion process can be time-consuming.
- Background data redistribution: Using background data redistribution.
  - If background mode is disabled, the redistribution process will be run once as soon as the cluster expansion preparation is over.
    
    The cluster will remain in the Updating status until all cluster expansion stages are completed.
    
    Warning
    
    If data redistribution terminates upon reaching the timeout with some tables remaining unprocessed, you will need to redistribute those tables manually.
  - If background mode is enabled, the redistribution process will be delayed. The process will run on a schedule as a routine maintenance operation.
    
    The cluster will remain in the Updating status only during preparation for cluster expansion.
- Redistribution timeout: Timeout (in seconds), after which the data redistribution process will be interrupted.
  
  Reaching the timeout does not stop the process immediately: data redistribution will be completed for the tables currently being processed (IN PROGRESS status). You can request the status of the tables if the cluster is not closed from load.
  
  The minimum value is 0, in which case the timeout is calculated automatically depending on the cluster configuration and data size.
  
  The maximum value depends on whether background data redistribution is enabled:
  - If enabled, the maximum value is 28800 (eight hours).
  - If disabled, the maximum value is not limited.
- Number of redistricting streams: Number of threads that will be started during the data redistribution process.
  
  Using more threads will speed up data redistribution but it will also increase the cluster load.
  
  The minimum value (default) is 0 (the number will be calculated automatically depending on the cluster configuration and data size), and the maximum one is 25.
Click Expand.

CLI

If you do not have the Yandex Cloud CLI yet, install and initialize it.

The folder specified when creating the CLI profile is used by default. To change the default folder, use the yc config set folder-id <folder_ID> command. You can specify a different folder using the --folder-name or --folder-id parameter.

To expand a cluster:

View the description of the CLI command for expanding a cluster:
```
yc managed-greenplum cluster expand --help
```
To expand a cluster, run this command:
```
yc managed-greenplum cluster expand <cluster_name_or_ID> \
   --segment-host-count <number_of_added_segment_hosts> \
   --add-segments-per-host-count <number_of_added_segments_per_host> \
   --close-cluster <temporary_ban_on_connection_to_cluster> \
   --delay-redistribution <background_data_redistribution> \
   --duration-seconds <data_redistribution_timeout> \
   --parallel <number_of_data_redistribution_threads>
```
Where:
- --segment-host-count: Number of segment hosts to expand the cluster by.
  
  The minimum value (default) is 2; the maximum value cannot exceed the limit on the number of segment hosts in the cluster minus the number of existing hosts.
- --add-segments-per-host-count: Number of segments added per host.
  
  Warning
  
  Segments will be added to all hosts, both new and existing ones.
  
  The minimum value (default) is 0 (number of segments per host remains unchanged); the maximum value depends on the host class.
  
  The segment host class and the number of segments per host affect the maximum amount of memory allocated to each Greenplum® server process. If you select a host class with a small amount of RAM and specify a large number of segments, an error may occur.
- --close-cluster: Closing the cluster for access: true or false.
  
  If this setting is enabled (true), you cannot connect to the cluster and it does not receive new user requests. This will accelerate cluster expansion.
  
  Warning
  
  If you close the cluster from load and disable background data redistribution, you will lose access to the cluster until its expansion is complete.
  
  The expansion process can be time-consuming.
- --delay-redistribution: Using background data redistribution: true or false.
  - If background mode is disabled (false), the redistribution process will be run once as soon as the cluster expansion preparation is over.
    
    The cluster will remain in the Updating status until all cluster expansion stages are completed.
    
    Warning
    
    If data redistribution terminates upon reaching the timeout with some tables remaining unprocessed, you will need to redistribute those tables manually.
  - If background mode is enabled (true), the redistribution process will be delayed. The process will run on a schedule as a routine maintenance operation.
    
    The cluster will remain in the Updating status only during preparation for cluster expansion.
- --duration-seconds: Timeout (in seconds), after which the data redistribution process will be interrupted.
  
  Reaching the timeout does not stop the process immediately: data redistribution will be completed for the tables currently being processed (IN PROGRESS status). You can request the status of the tables if the cluster is not closed from load.
  
  The minimum value is 0, in which case the timeout is calculated automatically depending on the cluster configuration and data size.
  
  The maximum value depends on whether background data redistribution is enabled:
  - If enabled, the maximum value is 28800 (eight hours).
  - If disabled, the maximum value is not limited.
- --parallel: Number of threads that will be started during the data redistribution process.
  
  Using more threads will speed up data redistribution but it will also increase the cluster load.
  
  The minimum value (default) is 0 (the number will be calculated automatically depending on the cluster configuration and data size), and the maximum one is 25.
You can get the cluster ID and name with the list of clusters in the folder.
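
The numeric limits listed for the CLI flags can be checked before the command is run. The following is a minimal sketch, not part of the Yandex Cloud CLI: the check_expand_params function and its argument order are hypothetical, and it only covers the limits stated above (it does not know the host limit of your cluster or the per-host-class segment limit).

```shell
# Sketch: check expansion settings against the documented limits.
# Arguments (hypothetical order):
#   <segment_host_count> <parallel> <duration_seconds> <delay_redistribution>
check_expand_params() {
    hosts=$1; parallel=$2; duration=$3; delay=$4
    # At least two segment hosts must be added.
    if [ "$hosts" -lt 2 ]; then
        echo "segment-host-count must be at least 2"; return 1
    fi
    # 0 means "calculate automatically"; 25 is the documented maximum.
    if [ "$parallel" -lt 0 ] || [ "$parallel" -gt 25 ]; then
        echo "parallel must be between 0 and 25"; return 1
    fi
    if [ "$duration" -lt 0 ]; then
        echo "duration-seconds must not be negative"; return 1
    fi
    # The 28800-second cap applies only to background redistribution.
    if [ "$delay" = "true" ] && [ "$duration" -gt 28800 ]; then
        echo "duration-seconds must not exceed 28800 when background redistribution is enabled"; return 1
    fi
    echo "ok"
}

check_expand_params 4 8 28800 true            # within the limits
check_expand_params 4 8 36000 true || true    # timeout too long for background mode
check_expand_params 4 8 36000 false           # allowed: no cap without background mode
```

Running such a check first avoids submitting an expansion request that would be rejected for an out-of-range value.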

REST API

Get an IAM token for API authentication and put it into the environment variable:
```
export IAM_TOKEN="<IAM_token>"
```
Use the Cluster.Expand method and send the following request, e.g., via cURL:
```
curl \
    --request POST \
    --header "Authorization: Bearer $IAM_TOKEN" \
    --header "Content-Type: application/json" \
    --url 'https://mdb.api.cloud.yandex.net/managed-greenplum/v1/clusters/<cluster_ID>/expand' \
    --data '{
              "segmentHostCount": "<number_of_added_segment_hosts>",
              "addSegmentsPerHostCount": "<number_of_added_segments_per_host>",
              "duration": "<data_redistribution_timeout>",
              "parallel": "<number_of_data_redistribution_threads>",
              "closeCluster": "<temporary_ban_on_connection_to_cluster>",
              "delayRedistribution": "<background_data_redistribution>"
            }'
```
You can request the cluster ID with the list of clusters in the folder.

Request body parameters:
- segmentHostCount: Number of segment hosts to expand the cluster by.
  
  The minimum value (default) is 2; the maximum value cannot exceed the limit on the number of segment hosts in the cluster minus the number of existing hosts.
- addSegmentsPerHostCount: Number of segments added per host.
  
  Warning
  
  Segments will be added to all hosts, both new and existing ones.
  
  The minimum value (default) is 0 (number of segments per host remains unchanged); the maximum value depends on the host class.
  
  The segment host class and the number of segments per host affect the maximum amount of memory allocated to each Greenplum® server process. If you select a host class with a small amount of RAM and specify a large number of segments, an error may occur.
- duration: Timeout (in seconds), after which the data redistribution process will be interrupted.
  
  Reaching the timeout does not stop the process immediately: data redistribution will be completed for the tables currently being processed (IN PROGRESS status). You can request the status of the tables if the cluster is not closed from load.
  
  The minimum value is 0, in which case the timeout is calculated automatically depending on the cluster configuration and data size.
  
  The maximum value depends on whether background data redistribution is enabled:
  - If enabled, the maximum value is 28800 (eight hours).
  - If disabled, the maximum value is not limited.
- parallel: Number of threads that will be started during the data redistribution process.
  
  Using more threads will speed up data redistribution but it will also increase the cluster load.
  
  The minimum value (default) is 0 (the number will be calculated automatically depending on the cluster configuration and data size), and the maximum one is 25.
- closeCluster: Closing the cluster for access, true or false.
  
  If this setting is enabled (true), you cannot connect to the cluster and it does not receive new user requests. This will accelerate cluster expansion.
  
  Warning
  
  If you close the cluster from load and disable background data redistribution, you will lose access to the cluster until its expansion is complete.
  
  The expansion process can be time-consuming.
- delayRedistribution: Using background data redistribution, true or false.
  - If background mode is disabled (false), the redistribution process will be run once as soon as the cluster expansion preparation is over.
    
    The cluster will remain in the Updating status until all cluster expansion stages are completed.
    
    Warning
    
    If data redistribution terminates upon reaching the timeout with some tables remaining unprocessed, you will need to redistribute those tables manually.
  - If background mode is enabled (true), the redistribution process will be delayed. The process will run on a schedule as a routine maintenance operation.
    
    The cluster will remain in the Updating status only during preparation for cluster expansion.
View the server response to make sure the request was successful.

gRPC API

Get an IAM token for API authentication and put it into the environment variable:
```
export IAM_TOKEN="<IAM_token>"
```
Clone the cloudapi repository:
```
cd ~/ && git clone --depth=1 https://github.com/yandex-cloud/cloudapi
```
Below, we assume the repository contents are stored in the ~/cloudapi/ directory.
Use the ClusterService.Expand call and send the following request, e.g., via gRPCurl:
```
grpcurl \
    -format json \
    -import-path ~/cloudapi/ \
    -import-path ~/cloudapi/third_party/googleapis/ \
    -proto ~/cloudapi/yandex/cloud/mdb/greenplum/v1/cluster_service.proto \
    -rpc-header "Authorization: Bearer $IAM_TOKEN" \
    -d '{
          "cluster_id": "<cluster_ID>",
          "segment_host_count": "<number_of_added_segment_hosts>",
          "add_segments_per_host_count": "<number_of_added_segments_per_host>",
          "duration": "<data_redistribution_timeout>",
          "parallel": "<number_of_data_redistribution_threads>",
          "close_cluster": "<temporary_ban_on_connection_to_cluster>",
          "delay_redistribution": "<background_data_redistribution>"
        }' \
    mdb.api.cloud.yandex.net:443 \
    yandex.cloud.mdb.greenplum.v1.ClusterService.Expand
```
You can request the cluster ID with the list of clusters in the folder.

Request body parameters:
- segment_host_count: Number of segment hosts to expand the cluster by.
  
  The minimum value (default) is 2; the maximum value cannot exceed the limit on the number of segment hosts in the cluster minus the number of existing hosts.
- add_segments_per_host_count: Number of segments added per host.
  
  Warning
  
  Segments will be added to all hosts, both new and existing ones.
  
  The minimum value (default) is 0 (number of segments per host remains unchanged); the maximum value depends on the host class.
  
  The segment host class and the number of segments per host affect the maximum amount of memory allocated to each Greenplum® server process. If you select a host class with a small amount of RAM and specify a large number of segments, an error may occur.
- duration: Timeout (in seconds), after which the data redistribution process will be interrupted.
  
  Reaching the timeout does not stop the process immediately: data redistribution will be completed for the tables currently being processed (IN PROGRESS status). You can request the status of the tables if the cluster is not closed from load.
  
  The minimum value is 0, in which case the timeout is calculated automatically depending on the cluster configuration and data size.
  
  The maximum value depends on whether background data redistribution is enabled:
  - If enabled, the maximum value is 28800 (eight hours).
  - If disabled, the maximum value is not limited.
- parallel: Number of threads that will be started during the data redistribution process.
  
  Using more threads will speed up data redistribution but it will also increase the cluster load.
  
  The minimum value (default) is 0 (the number will be calculated automatically depending on the cluster configuration and data size), and the maximum one is 25.
- close_cluster: Closing the cluster for access, true or false.
  
  If this setting is enabled (true), you cannot connect to the cluster and it does not receive new user requests. This will accelerate cluster expansion.
  
  Warning
  
  If you close the cluster from load and disable background data redistribution, you will lose access to the cluster until its expansion is complete.
  
  The expansion process can be time-consuming.
- delay_redistribution: Using background data redistribution, true or false.
  - If background mode is disabled (false), the redistribution process will be run once as soon as the cluster expansion preparation is over.
    
    The cluster will remain in the Updating status until all cluster expansion stages are completed.
    
    Warning
    
    If data redistribution terminates upon reaching the timeout with some tables remaining unprocessed, you will need to redistribute those tables manually.
  - If background mode is enabled (true), the redistribution process will be delayed. The process will run on a schedule as a routine maintenance operation.
    
    The cluster will remain in the Updating status only during preparation for cluster expansion.
View the server response to make sure the request was successful.

Data redistribution monitoring

There are several ways to monitor the data redistribution process:

- If the cluster is closed from load and background data redistribution is disabled, you can only track progress using Yandex Cloud monitoring tools.
  
  Follow the Expansion progress chart on the Monitoring tab of the cluster management page or look up the gp_expand.* metrics using Yandex Monitoring.
- If the cluster is not closed from load, you can also run SQL queries against the gpexpand data schema objects.

Note

This schema is available only during cluster expansion, then it gets deleted.

To track the progress of the data redistribution process using SQL queries:
1. Connect to the postgres database as a user with the mdb_admin role.
2. Run a query to get the current status of the process:
```
SELECT dbname, fq_name, rank, status, expansion_started, expansion_finished
FROM gpexpand.status_detail
ORDER BY rank;
```
The query result will contain various information about the progress of the data redistribution process. The status column will indicate the current processing status of specific tables:
- NOT STARTED: Table data redistribution is scheduled, the table is in the redistribution queue.
  
  If the cluster is not closed from load, you can influence the table priority to change its position in the queue if needed.
- IN PROGRESS: Table data redistribution is in progress.
- COMPLETED: Table data redistribution is completed.
- NO LONGER EXISTS: Table no longer exists.
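
To get a quick overview instead of reading the full table list, the statuses can be counted. The following is a minimal sketch under stated assumptions: summarize_statuses is a hypothetical helper, and it expects one status per line on standard input, for example from a psql call such as the one shown in the comment (connection options are omitted and depend on your setup).

```shell
# Count tables per redistribution status.
# Reads one status per line on stdin, e.g. the output of (assumed invocation):
#   psql -At -c "SELECT status FROM gpexpand.status_detail" postgres
summarize_statuses() {
    awk '{ count[$0]++ } END { for (s in count) printf "%s: %d\n", s, count[s] }' | sort
}

# Sample input standing in for real query output.
printf '%s\n' "COMPLETED" "COMPLETED" "IN PROGRESS" "NOT STARTED" | summarize_statuses
```

With the sample input, the helper reports two completed tables, one in progress, and one not started.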

Additional actions

Changing table redistribution priority

During the cluster expansion preparation phase, each table is assigned a rank that affects its redistribution queue priority. The higher its rank, the higher the table's queue priority and, therefore, the faster its data redistribution will be completed.

If you need to, you can change the rank for one or more tables provided the cluster is not closed from load. To do this, you must modify the table in the gpexpand data schema.

Note

This schema is available only during cluster expansion, then it gets deleted.

To change the rank of redistributed tables:

Connect to the postgres database as a user with the mdb_admin role.
Run a query to get the current status of the data redistribution process:
```
SELECT dbname, fq_name, rank, status, expansion_started, expansion_finished
FROM gpexpand.status_detail
ORDER BY rank;
```
This will include full table names (fq_name) and their ranks (rank).

Change the rank for one or more tables:

```
UPDATE gpexpand.status_detail
SET rank=<required_rank>
WHERE fq_name IN (<list_of_full_table_names>);
```

For more information, see the relevant Greenplum® documentation.
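
When several tables need the same rank, the UPDATE statement can be generated from a list of names. The following is a minimal sketch: build_rank_update is a hypothetical helper, the rank value and the table names public.orders and public.lineitem are made up for illustration, and names containing single quotes are not escaped.

```shell
# Build the UPDATE statement for a given rank and a list of full table names.
# Usage: build_rank_update <rank> <fq_name>...
build_rank_update() {
    rank=$1; shift
    list=""
    for name in "$@"; do
        # Quote each name as an SQL string literal and join with commas.
        list="${list:+$list, }'$name'"
    done
    printf 'UPDATE gpexpand.status_detail\nSET rank=%s\nWHERE fq_name IN (%s);\n' "$rank" "$list"
}

# Hypothetical rank and table names.
build_rank_update 1 public.orders public.lineitem
```

The printed statement can then be run in the same session used to query gpexpand.status_detail.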

Running data redistribution manually

If background data redistribution was not used during cluster expansion and the data redistribution process timeout setting was not long enough, the process may end before data redistribution is over for all tables.

This will leave some of the data distributed unevenly. As a result, the "Unknown error: Partially Distributed Data" error will appear in the cluster logs. To fix the error, run data redistribution manually:

Learn the total number of segments in the cluster:
1. Get detailed information about a cluster.
2. Multiply the number of segments per host by the number of segment hosts.
Connect to the database as a user with the mdb_admin role.

Find tables with unevenly distributed data:

```
SELECT *
    FROM
        (SELECT oid, relname FROM pg_class) pgc
        JOIN
        (SELECT localoid AS oid, numsegments FROM gp_distribution_policy) gpdp
        USING (oid)
    WHERE numsegments != <total_number_of_segments>;
```

A list of tables will be displayed stating the following for each table:

- oid: Table ID.
- relname: Table name.
- numsegments: Number of segments storing the data.

For more information, see the Greenplum® documentation on the gp_distribution_policy table.
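
The segment count calculation and the query above can be combined as follows. This is a minimal sketch: the values of segments_per_host and segment_hosts are hypothetical and should be taken from the detailed information about your cluster.

```shell
segments_per_host=4   # hypothetical value from the cluster details
segment_hosts=6       # hypothetical value from the cluster details

# Total number of segments = segments per host x segment hosts.
total_segments=$((segments_per_host * segment_hosts))

# Print the query with the computed total substituted in.
cat <<EOF
SELECT *
    FROM
        (SELECT oid, relname FROM pg_class) pgc
        JOIN
        (SELECT localoid AS oid, numsegments FROM gp_distribution_policy) gpdp
        USING (oid)
    WHERE numsegments != ${total_segments};
EOF
```

With these sample values the total is 24, so the query lists every table whose data is stored on a different number of segments.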

Run data redistribution using the information you got earlier.

To redistribute the data of an individual table, run the following query:
- For an ordinary table:
```
ALTER TABLE ONLY <table_name> EXPAND TABLE;
```
- For a partitioned table:
```
ALTER TABLE <table_name>
SET WITH (REORGANIZE=true) <distribution_policy>;
```
  To learn the Greenplum® distribution policy value for an individual table partition, run the following query:
```
SELECT pg_get_table_distributedby(<partition_OID>) AS distribution_policy;
```
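
If the earlier query returned several ordinary tables, the statements can be generated in one pass. The following is a minimal sketch: build_expand_statements is a hypothetical helper, the table names are made up, and it only covers ordinary tables (partitioned tables need the REORGANIZE form shown above).

```shell
# Emit one EXPAND TABLE statement per ordinary (non-partitioned) table name.
build_expand_statements() {
    for table in "$@"; do
        printf 'ALTER TABLE ONLY %s EXPAND TABLE;\n' "$table"
    done
}

# Hypothetical table names taken from the query result.
build_expand_statements public.orders public.customers
```

Review the generated statements before running them, since each one triggers redistribution of the whole table.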

Greenplum® and Greenplum Database® are registered trademarks or trademarks of VMware, Inc. in the United States and/or other countries.
