Greenplum® settings
For Managed Service for Greenplum® clusters, you can configure Greenplum® settings. Some settings are configured at the cluster level, while others, at the level of external data sources, such as S3, JDBC, HDFS, and Hive.
The label next to the setting name shows which interface you can use to set its value: the management console, CLI, API, SQL, or Terraform. The All interfaces label means that all of the above interfaces are supported.
Depending on the selected interface, the same setting may be represented in a different way. For example, Max connections in the management console is:
- max_connections in the gRPC API
- maxConnections in the REST API
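The naming pattern in this example can be sketched in Python. This is an illustration only: the helper names `to_grpc_name` and `to_rest_name` are hypothetical, and the sketch assumes other settings follow the same snake_case and camelCase pattern as Max connections, which should be checked against the API reference for each setting.

```python
def to_grpc_name(console_name: str) -> str:
    """Convert a console label such as "Max connections" to snake_case."""
    return "_".join(word.lower() for word in console_name.split())


def to_rest_name(console_name: str) -> str:
    """Convert a console label such as "Max connections" to camelCase."""
    words = [word.lower() for word in console_name.split()]
    if not words:
        return ""
    # The first word stays lowercase; each following word is capitalized.
    return words[0] + "".join(word.capitalize() for word in words[1:])


print(to_grpc_name("Max connections"))  # max_connections
print(to_rest_name("Max connections"))  # maxConnections
```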
Settings depending on the storage size
The values of some Greenplum® settings may be automatically adjusted when you change the storage size:
- If the values were not specified or are not suitable for the new size, the default settings for this size will apply.
- If the settings you specified manually are suitable for the new size, they will be preserved.
The settings that depend on the storage size are:
Cluster-level DBMS settings
You can use the following settings:
-
Gp add column inherits table setting
Management console
API
This setting controls whether to apply the data compression parameters (compresstype, compresslevel, and blocksize) specified for the AOCO table when adding a column.
By default, the setting is disabled, i.e., the table’s data compression parameters are ignored.
For more information, see the Greenplum® documentation.
-
Gp workfile compression
Management console
API
This setting determines whether temporary files created on the disk during a hash join or hash aggregation will be compressed.
By default, it is disabled, i.e., temporary files are not compressed.
For more information, see the Greenplum® documentation.
-
Gp workfile limits per query
Management console
API
The maximum amount of disk space (in bytes) the temporary files of an active query can occupy in every segment.
The maximum value is 1099511627776 (1 TB), the minimum value is 0 (unlimited amount), and the default value is 0.
For more information, see the Greenplum® documentation.
-
Gp workfile limit files per query
Management console
API
The maximum number of temporary files the service creates in a segment to process a single query. If the limit is exceeded, the query will be canceled.
The maximum value is 100000, the minimum value is 0 (unlimited number of temporary files), and the default value is 10000.
For more information, see the Greenplum® documentation.
-
Gp workfile limit per segment
Management console
API
The maximum amount of disk space (in bytes) the temporary files of all active queries can occupy in every segment.
The maximum value is 1099511627776 (1 TB), the minimum value is 0 (unlimited amount). The default value depends on the segment host storage size and is calculated by the formula:
0.1 × <segment_host_storage_size> / <number_of_segments_per_host>
For more information, see the Greenplum® documentation.
-
Log connections
Management console
This setting controls whether to log a line detailing each successful connection to the Greenplum® server.
The setting is disabled by default (no logging).
For more information, see the Greenplum® documentation.
-
Log disconnections
Management console
This setting controls whether to log session completion. If the setting is enabled, a line with the session duration is written to the log after each completed client session.
The setting is disabled by default (no logging).
For more information, see the Greenplum® documentation.
-
Log error verbosity
Management console
This setting controls the amount of detail written to the Greenplum® log for each message. Log detail levels in ascending order of verbosity:
- terse
- default (default value)
- verbose
For more information, see the Greenplum® documentation.
-
Log hostname
Management console
This setting controls whether to output the host name of the Greenplum® database master server to the connection log. If the setting is enabled, the IP address and host name are logged. If the setting is disabled, only the IP address is logged.
This setting is disabled by default.
For more information, see the Greenplum® documentation.
-
Log min duration statement
Management console
This setting specifies the minimum command duration required to log the command (in milliseconds).
If the value is 0, the runtime of all commands is logged.
The minimum value is -1 (disables runtime logging), the maximum value is 2147483647. The default value is -1.
For more information, see the Greenplum® documentation.
-
Log min messages
Management console
This setting defines the logging level in Greenplum®. All messages of the selected severity level (or higher) are logged. Possible values (in ascending order of severity): DEBUG5, DEBUG4, DEBUG3, DEBUG2, DEBUG1, INFO, NOTICE, WARNING, ERROR, LOG, FATAL, and PANIC.
The default value is WARNING. This means all the messages with the following severity levels will be logged: WARNING, ERROR, LOG, FATAL, and PANIC.
To disable logging of most messages, select PANIC.
For more information, see the Greenplum® documentation.
-
Log statement
Management console
API
Filter for SQL commands that will be written to the Greenplum® log:
- NONE: Filter is disabled, no SQL commands are logged.
- DDL: Logs SQL commands used to change data structure definitions (such as CREATE, ALTER, DROP, etc.).
- MOD: Logs the DDL commands and commands allowing you to modify data (INSERT, UPDATE, DELETE, TRUNCATE, and COPY FROM).
- ALL: Logs all SQL commands.
The default value is ALL.
The PREPARE and EXPLAIN ANALYZE expressions are also logged if they contain the relevant types of commands.
For more information, see the Greenplum® documentation.
-
Log statement stats
Management console
This setting controls whether to log query statistics (parsing, planning, execution).
The setting is disabled by default (no logging).
For more information, see the Greenplum® documentation.
-
Max connections
Management console
API
The maximum number of concurrent connections to the master host.
The maximum value is 1000, the minimum value is 250, and the default value is 350. For segment hosts, this value is automatically multiplied by five.
If you increase this value, we recommend increasing Max prepared transactions as well.
For more information, see the Greenplum® documentation.
-
Max prepared transactions
Management console
API
The maximum number of transactions that can be in the prepared state at the same time.
The maximum value is 10000, the minimum value is 350, and the default value is 350. The values for master hosts and segment hosts are the same.
We recommend choosing a value higher than Max connections.
For more information, see the Greenplum® documentation.
-
Max slot wal keep size
Management console
API
The maximum write-ahead log (WAL) file size in bytes allowed for replication.
The minimum value is 0 (no logging), and the maximum value is 214748364800 (200 GB). The default value depends on the segment host storage size and is calculated by the formula:
0.1 × <segment_host_storage_size> / <number_of_segments_per_host>
For more information, see the Greenplum® documentation.
-
Max statement mem
Management console
API
The maximum amount of memory (in bytes) allocated for query processing.
The minimum value is 134217728 (128 MB), the maximum value is 1099511627776 (1 TB), and the default value is 2097152000 (2,000 MB).
For more information, see the Greenplum® documentation.
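Two rules from the cluster-level settings can be sketched in Python: the storage-based default formula given for Gp workfile limit per segment and Max slot wal keep size, and the severity threshold used by Log min messages. The function names are hypothetical, and rounding the formula result down to whole bytes is an assumption; the service may round differently.

```python
# Severity levels in ascending order, as listed for Log min messages.
SEVERITY_ORDER = [
    "DEBUG5", "DEBUG4", "DEBUG3", "DEBUG2", "DEBUG1", "INFO",
    "NOTICE", "WARNING", "ERROR", "LOG", "FATAL", "PANIC",
]


def storage_based_default(segment_host_storage_bytes: int, segments_per_host: int) -> int:
    """Apply 0.1 x <segment_host_storage_size> / <number_of_segments_per_host>."""
    if segments_per_host <= 0:
        raise ValueError("segments_per_host must be positive")
    # Integer division keeps the result in whole bytes (rounding down is an assumption).
    return segment_host_storage_bytes // (10 * segments_per_host)


def logged_levels(log_min_messages: str) -> list[str]:
    """Return the severity levels written to the log for a given threshold."""
    return SEVERITY_ORDER[SEVERITY_ORDER.index(log_min_messages):]


# A host with 400 GiB of storage and 4 segments gets a default of 10 GiB.
print(storage_based_default(400 * 1024**3, 4))  # 10737418240
print(logged_levels("WARNING"))  # ['WARNING', 'ERROR', 'LOG', 'FATAL', 'PANIC']
```

Note that LOG sits between ERROR and FATAL in the severity order, so a threshold of ERROR still keeps LOG messages.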
External S3 data source settings
You can use the following settings:
-
Access Key
Management console
CLI
API
S3 storage public access key.
For more information, see the Greenplum® documentation.
-
Secret Key
Management console
CLI
API
S3 storage secret access key.
For more information, see the Greenplum® documentation.
-
Fast Upload
Management console
CLI
API
This setting controls fast uploading of large files to S3 storage. If disabled, PXF generates files on the disk before sending them to S3 storage. If enabled, PXF generates files in RAM (if RAM capacity is reached, it writes them to disk).
Fast upload is enabled by default.
For more information, see the Greenplum® documentation.
-
Endpoint
Management console
CLI
API
S3 storage address. For Yandex Object Storage, the address is storage.yandexcloud.net. This is the default value.
For more information, see the Greenplum® documentation.
External JDBC data source settings
You can use the following settings:
-
Driver
Management console
CLI
API
JDBC driver class in Java. The possible values include:
com.simba.athena.jdbc.Driver
com.clickhouse.jdbc.ClickHouseDriver
com.ibm.as400.access.AS400JDBCDriver
com.microsoft.sqlserver.jdbc.SQLServerDriver
com.mysql.cj.jdbc.Driver
org.postgresql.Driver
oracle.jdbc.driver.OracleDriver
net.snowflake.client.jdbc.SnowflakeDriver
io.trino.jdbc.TrinoDriver
For more information, see the Greenplum® documentation.
-
Url
Management console
CLI
API
Database URL. Examples:
- jdbc:mysql://mysqlhost:3306/testdb: For a local MySQL® DB.
- jdbc:postgresql://c-<cluster_ID>.rw.mdb.yandexcloud.net:6432/db1: For a Yandex Managed Service for PostgreSQL cluster. The address contains a special FQDN of the master host in the cluster.
- jdbc:oracle:thin:@host.example:1521:orcl: For an Oracle DB.
For more information, see the Greenplum® documentation.
-
User
Management console
CLI
API
DB owner username.
For more information, see the Greenplum® documentation.
-
Password
Management console
CLI
API
DB user password.
For more information, see the Greenplum® documentation.
-
Statement Batch Size
Management console
CLI
API
Number of rows in a batch for reading from an external table.
The default value is 100.
For more information, see the Greenplum® documentation.
-
Statement Fetch Size
Management console
CLI
API
Number of rows to buffer when reading from an external table.
The default value is 1000.
For more information, see the Greenplum® documentation.
-
Statement Query Timeout
Management console
CLI
API
Time (in seconds) the JDBC driver waits for a read or write operation to complete.
The default value is 60.
For more information, see the Greenplum® documentation.
-
Pool Enabled
Management console
CLI
API
This setting determines whether the JDBC connection pool is used. It is enabled by default.
For more information, see the Greenplum® documentation.
-
Pool Maximum Size
Management console
CLI
API
Maximum number of database server connections.
The default value is 5.
For more information, see the Greenplum® documentation.
-
Pool Connection Timeout
Management console
CLI
API
Maximum time (in milliseconds) to wait for a connection from the pool.
The default value is 30000.
For more information, see the Greenplum® documentation.
-
Pool Idle Timeout
Management console
CLI
API
Maximum time (in milliseconds) before an inactive connection is considered idle.
The default value is 30000.
For more information, see the Greenplum® documentation.
-
Pool Minimum Idle
Management console
CLI
API
Minimum number of idle connections in the pool.
The default value is 0.
For more information, see the Greenplum® documentation.
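The JDBC settings and their documented defaults can be gathered into one structure, sketched here in Python. The class name, field names, and the user name `user1` are hypothetical and do not reflect the service's CLI or API naming. The check that the minimum idle count does not exceed the pool size is an assumption, not a rule stated in this section.

```python
from dataclasses import dataclass


@dataclass
class JdbcSourceSettings:
    """Illustrative container for the JDBC data source settings."""
    driver: str
    url: str
    user: str
    password: str
    statement_batch_size: int = 100
    statement_fetch_size: int = 1000
    statement_query_timeout: int = 60      # seconds
    pool_enabled: bool = True
    pool_maximum_size: int = 5
    pool_connection_timeout: int = 30000   # milliseconds
    pool_idle_timeout: int = 30000         # milliseconds
    pool_minimum_idle: int = 0

    def validate(self) -> None:
        """Run basic sanity checks before the settings are submitted."""
        if not self.url.startswith("jdbc:"):
            raise ValueError("JDBC URL must start with 'jdbc:'")
        # Assumption: idle connections cannot outnumber the pool itself.
        if self.pool_minimum_idle > self.pool_maximum_size:
            raise ValueError("pool_minimum_idle exceeds pool_maximum_size")


settings = JdbcSourceSettings(
    driver="org.postgresql.Driver",
    url="jdbc:postgresql://c-<cluster_ID>.rw.mdb.yandexcloud.net:6432/db1",
    user="user1",
    password="<password>",
)
settings.validate()
```

Only the four connection fields are required here; every other field falls back to the default listed above.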
External HDFS data source settings
You can use the following settings:
-
Core
Management console
API
Settings of the file system and security rules.
For more information, see the Apache Hadoop documentation.
-
Default Fs
URI that defines the HDFS file system.
-
Security Auth To Local
Rules for mapping Kerberos principals to user accounts of the operating system.
-
Kerberos
Management console
API
Settings of the Kerberos network authentication protocol.
For more information, see the Greenplum® documentation.
-
Enable
It defines the use of the Kerberos authentication server. By default, it is not used.
-
Primary
Host of the KDC (Key Distribution Center) main server.
-
Realm
Kerberos realm for a Greenplum® database.
-
Kdc Servers
Hosts of KDC servers.
-
Admin server
Host of the administration server. This is usually the main Kerberos server.
-
Default domain
Domain that is used to expand host names when translating Kerberos 4 service principals to Kerberos 5 service principals (e.g., when converting rcmd.hostname to host/hostname.domain).
-
Keytab Base64
Base64-encoded keytab file contents.
-
User Impersonation
Management console
API
It defines whether you can authenticate in an external file storage or DBMS on behalf of a Greenplum® user.
By default, such authentication is prohibited.
For more information, see the Greenplum® documentation.
-
Username
Management console
API
Username that is used to connect to an external file storage or DBMS if user impersonation is disabled.
For more information, see the Greenplum® documentation.
-
Sasl Connection Retries
Management console
API
Maximum number of retry attempts by PXF to request a SASL connection if the GSS initiate failed error occurs.
The default value is 5.
For more information, see the Greenplum® documentation.
-
ZK Hosts
Management console
API
Hosts of ZooKeeper servers. The values are specified in <address>:<port> format.
For more information, see the Apache Hadoop documentation.
-
Dfs
Management console
API
Distributed file system settings.
For more information, see the Apache Hadoop documentation.
-
Ha Automatic Failover Enabled
This setting determines whether automatic failover is enabled for high availability of the file system. It is enabled by default.
-
Block Access Token Enabled
This setting determines whether access tokens are used. By default, tokens are verified when connecting to datanodes.
-
Use Datanode Hostname
This setting determines whether datanode names are used when connecting to the relevant nodes. These are used by default.
-
Nameservices
List of logical names of HDFS services. You can specify any names separating them by commas.
-
Yarn
Management console
API
Settings for the ResourceManager service, which tracks resources within a cluster and schedules running apps, such as MapReduce jobs.
For more information, see the Apache Hadoop documentation.
-
Resourcemanager Ha Enabled
This setting determines whether high availability for ResourceManager is enabled. It is enabled by default.
-
Resourcemanager Ha Auto Failover Enabled
This setting determines whether automatic failover to a different resource is enabled if the active service fails or becomes unresponsive. Automatic failover is enabled by default only if Resourcemanager Ha Enabled is enabled.
-
Resourcemanager Ha Auto Failover Embedded
This setting determines whether to use the embedded ActiveStandbyElector method for selecting the active service. If the current active service fails or becomes unresponsive, ActiveStandbyElector designates another ResourceManager service as active, assuming the managing role.
It is enabled by default only if the Resourcemanager Ha Enabled and Resourcemanager Ha Auto Failover Enabled settings are enabled.
-
Resourcemanager Cluster Id
Cluster ID. It is used to prevent the ResourceManager service from becoming active for another cluster.
External Hive data source settings
You can use the following settings:
-
Core
Management console
API
Settings of the file system and security rules.
For more information, see the Apache Hadoop documentation.
-
Default Fs
URI that defines the HDFS file system.
-
Security Auth To Local
Rules for mapping Kerberos principals to user accounts of the operating system.
-
Kerberos
Management console
API
Settings of the Kerberos network authentication protocol.
For more information, see the Greenplum® documentation.
-
Enable
It defines the use of the Kerberos authentication server. By default, it is not used.
-
Primary
Host of the KDC (Key Distribution Center) main server.
-
Realm
Kerberos realm for a Greenplum® database.
-
Kdc Servers
Hosts of KDC servers.
-
Admin server
Host of the administration server. This is usually the main Kerberos server.
-
Default domain
Domain that is used to expand host names when translating Kerberos 4 service principals to Kerberos 5 service principals (e.g., when converting rcmd.hostname to host/hostname.domain).
-
Keytab Base64
Base64-encoded keytab file contents.
-
User Impersonation
Management console
API
It defines whether you can authenticate in an external file storage or DBMS on behalf of a Greenplum® user.
By default, such authentication is prohibited.
For more information, see the Greenplum® documentation.
-
Username
Management console
API
Username that is used to connect to an external file storage or DBMS if user impersonation is disabled.
For more information, see the Greenplum® documentation.
-
Sasl Connection Retries
Management console
API
Maximum number of retry attempts by PXF to request a SASL connection if the GSS initiate failed error occurs.
The default value is 5.
For more information, see the Greenplum® documentation.
-
ZK Hosts
Management console
API
Hosts of ZooKeeper servers. The values are specified in <address>:<port> format.
For more information, see the Apache Hadoop documentation.
-
Ppd
Management console
API
This setting determines whether predicate pushdown is enabled for external table queries. It is enabled by default.
For more information, see the Greenplum® documentation.
-
Metastore Uris
Management console
API
List of comma-separated URIs. To request metadata, the external DBMS connects to Metastore using one of these URIs.
-
Metastore Kerberos Principal
Management console
API
Service principal for the Metastore Thrift server.
-
Auth Kerberos Principal
Management console
API
Kerberos server principal.
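Several HDFS and Hive settings expect values in a particular shape: Keytab Base64 takes Base64-encoded keytab contents, ZK Hosts takes entries in <address>:<port> format, and Metastore Uris takes a comma-separated list. Below is a minimal Python sketch for preparing and checking such values. The function names and the sample host names are hypothetical, and the service's own validation may differ.

```python
import base64


def encode_keytab(keytab_bytes: bytes) -> str:
    """Base64-encode raw keytab file contents for the Keytab Base64 setting."""
    return base64.b64encode(keytab_bytes).decode("ascii")


def parse_host_port(entry: str) -> tuple[str, int]:
    """Split an <address>:<port> entry, as used for ZK Hosts."""
    address, sep, port = entry.rpartition(":")
    if not sep or not address or not port.isdigit():
        raise ValueError(f"expected <address>:<port>, got {entry!r}")
    return address, int(port)


def split_uris(value: str) -> list[str]:
    """Split a comma-separated list, as used for Metastore Uris."""
    return [item.strip() for item in value.split(",") if item.strip()]


# Sample values; the host names and ports are made up for illustration.
print(parse_host_port("zk1.example.net:2181"))
print(split_uris("thrift://ms1.example.net:9083, thrift://ms2.example.net:9083"))
```

To encode a real keytab, read the file in binary mode and pass the bytes to `encode_keytab`.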