Greenplum® settings
For Yandex MPP Analytics for PostgreSQL clusters, you can configure Greenplum® settings. Some settings are configured at the cluster level, while others, at the level of external data sources, such as S3, JDBC, HDFS, Hive.
The label next to the setting name helps determine which interface is used to set the value of this setting: the management console, CLI, API, SQL, or Terraform. The All interfaces label means that all of the above interfaces are supported.
Depending on the interface you select, the same setting will be represented differently. For example, Max connections in the management console is the same as:
max_connectionsin the gRPC APImaxConnectionsin the REST API
Settings depending on the storage size
The values of some Greenplum® settings may be automatically adjusted when you change the storage size:
- If the values were not specified or are not suitable for the new size, the default settings for this size will apply.
- If the settings you specified manually are suitable for the new size, they will be preserved.
The settings that depend on the storage size are:
Cluster-level DBMS settings
The following settings are available:
-
gp_add_column_inherits_table_setting
Management consoleTerraformAPIThis setting controls whether to apply the data compression parameters (
compresstype,compresslevel, andblocksize) specified for the AOCO table when adding a column.By default, the setting is disabled, i.e., the table’s data compression parameters are ignored.
For more information, see the relevant Greenplum® documentation
. -
gp_enable_global_deadlock_detector
Management consoleTerraformAPIEnables or disables the global deadlock detector. This feature checks for locks when performing
UPDATEandDELETEoperations with heap tables in parallel.The setting is disabled by default. In which case the
UPDATEandDELETEoperations are performed one after the other instead of in parallel.For more information, see the Greenplum® documentation
. -
gp_enable_zstd_memory_accounting
Management consoleTerraformAPIEnables Greenplum Database memory accounting for workfiles compressed with the Zstandard (zstd) algorithm. When this parameter is enabled, Greenplum uses its own memory allocator for zstd instead of the allocator built into the zstd library, which allows memory overcommit or exhaustion to be detected by standard mechanisms without causing the server to crash.
By default, the setting is enabled, i.e., Greenplum uses its own memory allocator.
-
gp_global_deadlock_detector_period
Management consoleTerraformAPISets the global deadlock detector's trigger frequency (in seconds).
The minimum value is
5; the default value is120.For more information, see the Greenplum® documentation
. -
gp_max_slices
Management consoleTerraformAPISpecifies the maximum number of slices (portions of a query plan that are run on segment instances) that can be generated by a query. If the query generates more than the specified number of slices, Greenplum Database returns an error and does not run the query. The default value is
0, no maximum value.Running a query that generates a large number of slices might affect Greenplum Database performance. For example, a query that contains
UNIONorUNION ALLoperators over several complex views can generate a large number of slices. You can runEXPLAINon the query to view slice statistics for the query.For more information, see the Greenplum® documentation
. -
gp_workfile_compression
Management consoleTerraformAPIThis setting determines whether temporary files created on the disk during a hash connection or hash aggregation will be compressed.
By default, it is disabled, i.e., temporary files are not compressed.
For more information, see the relevant Greenplum® documentation
.Warning
Changing this setting will cause the cluster hosts to restart one at a time.
-
gp_workfile_limit_per_query
Management consoleTerraformAPIThe maximum amount of disk space (in bytes) the temporary files of an active query can occupy in every segment.
The maximum value is
1099511627776(1 TB), the minimum value is0(unlimited amount), and the default value is0.For more information, see the relevant Greenplum® documentation
. -
gp_workfile_limit_files_per_query
Management consoleTerraformAPIThe maximum number of temporary files the service creates in a segment to process a single query. If the limit is exceeded, the query will be canceled.
The maximum value is
100000, the minimum value is0(unlimited number of temporary files), and the default value is10000.For more information, see the relevant Greenplum® documentation
. -
gp_workfile_limit_per_segment
Management consoleTerraformAPIThe maximum amount of disk space (in bytes) the temporary files of all active queries can occupy in every segment.
The maximum value is
1099511627776(1 TB), the minimum value is0(unlimited amount). The default value depends on the segment host storage size and is calculated by the formula:0.1 × <segment_host_storage_size> / <number_of_segments_per_host>For more information, see the relevant Greenplum® documentation
. -
log_connections
Management consoleThis setting controls whether to log a string detailing each successful connection to the Greenplum® server.
The setting is disabled by default (no logging).
For more information, see the Greenplum® documentation
. -
log_disconnections
Management consoleThis setting controls whether to log session completion. If the setting is enabled, after each completed client session, a string with the session duration is output to the log.
The setting is disabled by default (no logging).
For more information, see the Greenplum® documentation
. -
log_error_verbosity
Management consoleThis setting controls the amount of detail written to the Greenplum® log for each message. Log detail levels in ascending order of verbosity:
terse.default(default value).verbose.
For more information, see the Greenplum® documentation
. -
log_hostname
Management consoleThis setting controls whether to output the host name of the Greenplum® database master server to the connection log. If the setting is enabled, the IP address and host name are logged. If the setting is disabled, only the IP address is logged.
The setting is disabled by default.
For more information, see the Greenplum® documentation
. -
log_min_duration_statement
Management consoleThis setting specifies the minimum command duration required to log the command (in milliseconds).
If set to
0, the runtime of all statements is logged.The minimum value is
-1(disables runtime logging); the maximum value is2147483647; the default value is-1.For more information, see the Greenplum® documentation
. -
log_min_messages
Management consoleThis setting defines the logging level in Greenplum®. All messages of the selected severity level (or higher) are logged. Possible values (in ascending order of severity):
DEBUG5,DEBUG4,DEBUG3,DEBUG2,DEBUG1,INFO,NOTICE,WARNING,ERROR,LOG,FATAL, andPANIC.The default value is
WARNING. This means all the messages with the following severity levels will be logged:WARNING,ERROR,LOG,FATAL, andPANIC.To disable logging of most messages, select
PANIC.For more information, see the relevant Greenplum® documentation
. -
log_statement
Management consoleTerraformAPIFilter for SQL commands that will be written to the Greenplum® log:
NONE: Filter is disabled, no SQL commands are logged.DDL: Logs SQL commands used to change data structure definitions (such asCREATE,ALTER,DROPetc.).MOD: Logs theDDLcommands and commands allowing you to modify data (INSERT,UPDATE,DELETE,TRUNCATE, andCOPY FROM).ALL: Logs all SQL commands.
The default value is
DDL.The
PREPAREandEXPLAIN ANALYZEexpressions are also logged if they contain the relevant types of commands.For more information, see the Greenplum® documentation
. -
log_statement_stats
Management consoleThis setting controls whether to log query statistics (parsing, scheduling, execution).
The setting is disabled by default (no logging).
For more information, see the Greenplum® documentation
. -
master_shared_buffers
Management consoleThe amount of memory the Greenplum® master host uses for shared memory buffers (in bytes).
The minimum value is
1048576(1 MB). The default value is134217728(128 MB).The maximum value is calculated using the following formula:
min(<master_host_storage_size> / 4, 8 * <size_of_DB_data>)For more information, see the relevant Greenplum® documentation
.Warning
Changing this setting will cause the cluster hosts to restart one at a time.
-
max_connections
Management consoleTerraformAPIThe maximum number of concurrent connections to the master host.
The maximum value is
1000, the minimum value is50, and the default value is350. For segment hosts, this value is automatically multiplied by five.If you increase this value, we recommend increasing Max prepared transactions as well.
If you update this setting, both the master and segment hosts will be checked to have at least 20 MB of available RAM per connection. If this condition is not met, this error occurs.
For more information, see the relevant Greenplum® documentation
. -
max_prepared_transactions
Management consoleTerraformAPIThe maximum number of transactions that can be in the prepared state
at the same time.The maximum value is
10000, the minimum value is350, and the default value is350. The values for master hosts and segment hosts are the same.We recommend choosing a value higher than Max connections.
For more information, see the relevant Greenplum® documentation
. -
max_slot_wal_keep_size
Management consoleTerraformAPIThe maximum write-ahead log (WAL)
file size in bytes allowed for replication.The minimum value is
0(no logging), and the maximum value is214748364800(200 GB). The default value depends on the segment host storage size and is calculated by the formula:0.1 × <segment_host_storage_size> / <number_of_segments_per_host>For more information, see the relevant Greenplum® documentation
. -
max_statement_mem
Management consoleTerraformAPIThe maximum amount of memory (in bytes) allocated for query processing.
The minimum value is
134217728(128 MB), the maximum value is1099511627776(1 TB), and the default value is2097152000(2,000 MB).For more information, see the Greenplum® documentation
. -
segment_shared_buffers
Management consoleThe amount of memory the Greenplum® segment hosts use for shared memory buffers (in bytes).
The minimum value is
1048576(1 MB). The default value is134217728(128 MB).The maximum value is calculated using the following formula:
min(<segment_host_storage_size> / (4 * <number_of_segments_per_host>), 8 * <size_of_DB_data>)For more information, see the relevant Greenplum® documentation
.Warning
Changing this setting will cause the cluster hosts to restart one at a time.
External S3 data source settings
The following settings are available:
-
Access Key
Management consoleCLIAPIS3 storage public access key.
For more information, see this Greenplum® guide
. -
Secret Key
Management consoleCLIAPIS3 storage secret access key.
For more information, see this Greenplum® guide
. -
Fast Upload
Management consoleCLIAPIThis setting controls fast uploading of large files to S3 storage. If disabled, PXF generates files on the disk before sending them to S3 storage. If enabled, PXF generates files in RAM (if RAM capacity is reached, it writes them to disk).
Fast upload is enabled by default.
For more information, see this Greenplum® guide
. -
Endpoint
Management consoleCLIAPIS3 storage address. The value for Yandex Object Storage is
storage.yandexcloud.net. This is the default value.For more information, see this Greenplum® guide
.
External JDBC data source settings
The following settings are available:
-
Driver
Management consoleCLIAPIJDBC driver class in Java. The possible values are:
com.simba.athena.jdbc.Drivercom.clickhouse.jdbc.ClickHouseDrivercom.ibm.as400.access.AS400JDBCDrivercom.microsoft.sqlserver.jdbc.SQLServerDrivercom.mysql.cj.jdbc.Driverorg.postgresql.Driveroracle.jdbc.driver.OracleDrivernet.snowflake.client.jdbc.SnowflakeDriverio.trino.jdbc.TrinoDriver
For more information, see this Greenplum® guide
. -
Url
Management consoleCLIAPIDatabase URL. Examples:
jdbc:mysql://mysqlhost:3306/testdb: For a local MySQL® DB.jdbc:postgresql://c-<cluster_ID>.rw.mdb.yandexcloud.net:6432/db1: For a Yandex Managed Service for PostgreSQL cluster. The address contains a special FQDN of the master host in the cluster.jdbc:oracle:thin:@host.example:1521:orcl: For an Oracle DB.
For more information, see this Greenplum® guide
. -
User
Management consoleCLIAPIDB owner username.
For more information, see this Greenplum® guide
. -
Password
Management consoleCLIAPIDB user password.
For more information, see this Greenplum® guide
. -
Statement Batch Size
Management consoleCLIAPINumber of rows in a batch for reading from an external table.
The default value is
100.For more information, see this Greenplum® guide
. -
Statement Fetch Size
Management consoleCLIAPINumber of rows to buffer when reading from an external table.
The default value is
1000.For more information, see this Greenplum® guide
. -
Statement Query Timeout
Management consoleCLIAPITime (in seconds) the JDBC driver waits for a read or write operation to complete.
The default value is
60.For more information, see this Greenplum® guide
. -
Pool Enabled
Management consoleCLIAPIThis setting determines whether the JDBC connection pool is used. It is enabled by default.
For more information, see this Greenplum® guide
. -
Pool Maximum Size
Management consoleCLIAPIMaximum number of database server connections.
The default value is
5.For more information, see this Greenplum® guide
. -
Pool Connection Timeout
Management consoleCLIAPIMaximum time (in milliseconds) to wait for a connection from the pool.
The default value is
30000.For more information, see this Greenplum® guide
. -
Pool Idle Timeout
Management consoleCLIAPIMaximum time (in milliseconds) before an inactive connection is considered idle.
The default value is
30000.For more information, see this Greenplum® guide
. -
Pool Minimum Idle
Management consoleCLIAPIMinimum number of idle connections in the pool.
The default value is
0.For more information, see this Greenplum® guide
.
External HDFS data source settings
The following settings are available:
-
Core
Management consoleAPISettings of the file system and security rules.
For more information, see the Apache Hadoop documentation
.-
Default Fs
URI that defines the HDFS file system.
-
Security Auth To Local
Rules for mapping Kerberos principals to user accounts of the operating system.
-
-
Kerberos
Management consoleAPISettings of the Kerberos network authentication protocol.
For more information, see the relevant Greenplum® documentation
.-
Enable
It defines the use of the Kerberos authentication server. By default, it is not used.
-
Primary
Host of the KDC (Key Distribution Center) main server.
-
Realm
Kerberos realm for a Greenplum® database.
-
Kdc Servers
Hosts of KDC servers.
-
Admin server
Host of the administration server. This is usually the main Kerberos server.
-
Default domain
Domain that is used to expand host names when translating Kerberos 4 service principals to Kerberos 5 service principals (e.g., when converting
rcmd.hostnametohost/hostname.domain). -
Keytab Base64
Base64-encoded keytab file contents.
-
-
User Impersonation
Management consoleAPIIt decides whether you can authenticate in an external file storage or DBMS on behalf of a Greenplum® user.
By default, such authentication is prohibited.
For more information, see the relevant Greenplum® documentation
. -
Username
Management consoleAPIUsername that is used to connect to an external file storage or DBMS if user impersonation is disabled.
For more information, see the relevant Greenplum® documentation
. -
Sasl Connection Retries
Management consoleAPIMaximum number of retry attempts by PXF to request a SASL connection if the
GSS initiate failederror occurs.The default value is
5.For more information, see the relevant Greenplum® documentation
. -
ZK Hosts
Management consoleAPIHosts of ZooKeeper servers. The values are specified in
<address>:<port>format.For more information, see the Apache Hadoop documentation
.
-
Dfs
Management consoleAPIDistributed file system settings.
For more information, see the Apache Hadoop documentation
.-
Ha Automatic Failover Enabled
This setting determines whether automatic fault tolerance for high availability of the file system is enabled. It is enabled by default.
-
Block Access Token Enabled
This setting determines whether access tokens are used. By default, tokens are verified when connecting to datanodes.
-
Use Datanode Hostname
This setting determines whether datanode names are used when connecting to the relevant nodes. These are used by default.
-
Nameservices
List of logical names of HDFS services. You can specify any names separating them by commas.
-
-
Yarn
Management consoleAPISettings for the ResourceManager service, which tracks resources within a cluster and schedules running apps, such as MapReduce jobs.
For more information, see the Apache Hadoop documentation
.-
Resourcemanager Ha Enabled
This setting determines whether high availability for ResourceManager is enabled. It is enabled by default.
-
Resourcemanager Ha Auto Failover Enabled
This setting determines whether automatic failover to a different resource is enabled if the active service fails or becomes unresponsive. Automatic failover is enabled by default only if Resourcemanager Ha Enabled is enabled.
-
Resourcemanager Ha Auto Failover Embedded
This setting determines whether to use the embedded ActiveStandbyElector method for selecting the active service. If the current active service fails or becomes unresponsive, ActiveStandbyElector designates another ResourceManager service as active, assuming the managing role.
It is enabled by default only if the Resourcemanager Ha Enabled and Resourcemanager Ha Auto Failover Enabled settings are enabled.
-
Resourcemanager Cluster Id
Cluster ID. It is used to prevent the ResourceManager service from becoming active for another cluster.
-
External Hive data source settings
The following settings are available:
-
Core
Management consoleAPISettings of the file system and security rules.
For more information, see the Apache Hadoop documentation
.-
Default Fs
URI that defines the HDFS file system.
-
Security Auth To Local
Rules for mapping Kerberos principals to user accounts of the operating system.
-
-
Kerberos
Management consoleAPISettings of the Kerberos network authentication protocol.
For more information, see the relevant Greenplum® documentation
.-
Enable
It defines the use of the Kerberos authentication server. By default, it is not used.
-
Primary
Host of the KDC (Key Distribution Center) main server.
-
Realm
Kerberos realm for a Greenplum® database.
-
Kdc Servers
Hosts of KDC servers.
-
Admin server
Host of the administration server. This is usually the main Kerberos server.
-
Default domain
Domain that is used to expand host names when translating Kerberos 4 service principals to Kerberos 5 service principals (e.g., when converting
rcmd.hostnametohost/hostname.domain). -
Keytab Base64
Base64-encoded keytab file contents.
-
-
User Impersonation
Management consoleAPIIt decides whether you can authenticate in an external file storage or DBMS on behalf of a Greenplum® user.
By default, such authentication is prohibited.
For more information, see the relevant Greenplum® documentation
. -
Username
Management consoleAPIUsername that is used to connect to an external file storage or DBMS if user impersonation is disabled.
For more information, see the relevant Greenplum® documentation
. -
Sasl Connection Retries
Management consoleAPIMaximum number of retry attempts by PXF to request a SASL connection if the
GSS initiate failederror occurs.The default value is
5.For more information, see the relevant Greenplum® documentation
. -
ZK Hosts
Management consoleAPIHosts of ZooKeeper servers. The values are specified in
<address>:<port>format.For more information, see the Apache Hadoop documentation
.
-
Ppd
Management consoleAPIThis setting determines whether predicate pushdown is enabled for external table queries. Enabled by default.
For more information, see this Greenplum® guide
. -
Metastore Uris
Management consoleAPIList of comma-separated URIs. To request metadata, the external DBMS connects to Metastore using one of these URIs.
-
Metastore Kerberos Principal
Management consoleAPIService principal for the Metastore Thrift server.
-
Auth Kerberos Principal
Management consoleAPIKerberos server principal.