Greenplum®/Cloudberry connector

Updated at July 6, 2026

The Greenplum®/Cloudberry connector, developed by Yandex on the basis of the PostgreSQL connector, allows Managed Service for Trino to read data from and write data to a Greenplum®/Cloudberry cluster.

The connector supports parallel data reading from several Greenplum® segments at the same time, as well as direct segment reading over the GPFDIST protocol, which greatly improves query performance for large-scale data reads. You can use both reading methods together to optimize resource usage in the Trino and Greenplum® clusters.

The Greenplum®/Cloudberry connector is available in Trino 476 or higher.

Parallel data reading

During parallel reading from a table, the connector divides the data among parallel reads based on the value of the gp_segment_id metadata column.

The level of parallelism depends on the number of segments in the Greenplum® cluster. The maximum level of parallelism is limited by the greenplum.max-read-parallelism connector setting and the corresponding max_read_parallelism session property.
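As a rough illustration of how such reads might be divided, the Python sketch below caps the parallelism at the configured maximum and spreads segment IDs across reads round-robin. The function names, the round-robin assignment, and the generated `gp_segment_id IN (...)` filters are assumptions made for illustration only; the connector's actual split logic may differ.

```python
def effective_parallelism(segment_count, max_read_parallelism):
    """Parallelism cannot exceed the segment count or the configured limit."""
    if segment_count < 1 or max_read_parallelism < 1:
        raise ValueError("both values must be positive")
    return min(segment_count, max_read_parallelism)


def segment_groups(segment_count, max_read_parallelism):
    """Assign segment IDs 0..segment_count-1 to parallel reads round-robin.

    Round-robin is an assumption of this sketch, not documented behavior.
    """
    parallelism = effective_parallelism(segment_count, max_read_parallelism)
    groups = [[] for _ in range(parallelism)]
    for segment_id in range(segment_count):
        groups[segment_id % parallelism].append(segment_id)
    return groups


def split_filters(segment_count, max_read_parallelism):
    """Build one hypothetical gp_segment_id filter per parallel read."""
    return [
        "gp_segment_id IN ({})".format(", ".join(str(s) for s in group))
        for group in segment_groups(segment_count, max_read_parallelism)
    ]


if __name__ == "__main__":
    for condition in split_filters(segment_count=4, max_read_parallelism=2):
        print(condition)
```

In this sketch, four segments with a limit of 2 produce two reads, one covering segments 0 and 2 and the other covering segments 1 and 3.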

[Diagram: parallel data reading]

When parallel reading is used, the connector performs only partial row filtering during LIMIT pushdown. This does not affect the validity of the query results.

Reading data over the GPFDIST protocol

The connector allows reading data directly from Greenplum® segments via GPFDIST servers created on Trino workers. GPFDIST server activation is controlled by the greenplum.gpfdist.server.enabled connector setting.

In a Trino cluster, you can create no more than eight catalogs with active GPFDIST servers.

Direct reading from Greenplum® segments follows these steps:

The connector creates a writable external table whose location is the address of the Trino worker that will receive the data:

CREATE WRITABLE EXTERNAL TEMPORARY TABLE <external_table_name>
       ...
       LOCATION('gpfdist://<Trino_worker_address>');

The connector runs the following query:

INSERT INTO <external_table_name>
SELECT ... FROM <table_name_in_Greenplum®>;

The Greenplum® segments send data to the Trino worker at the specified address.
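The push-style data flow in these steps can be pictured with a toy producer-consumer model in Python: several "segments" send rows concurrently to one "worker" through a bounded buffer, and a full buffer makes the senders wait, loosely mirroring the suspension of data reception described for `greenplum.gpfdist.read.buffer-size`. This is only a conceptual sketch. The function name, the in-memory queue, and counting the buffer in rows rather than bytes are all assumptions, and no real GPFDIST traffic is involved.

```python
import queue
import threading


def read_via_push(segment_rows, buffer_size=2):
    """Toy model: every 'segment' pushes its rows into one 'worker' buffer."""
    # Bounded buffer: put() blocks while the buffer is full (backpressure).
    buffer = queue.Queue(maxsize=buffer_size)
    done = object()  # sentinel marking the end of one segment's stream

    def segment_sender(rows):
        for row in rows:
            buffer.put(row)
        buffer.put(done)

    senders = [
        threading.Thread(target=segment_sender, args=(rows,))
        for rows in segment_rows
    ]
    for sender in senders:
        sender.start()

    # The worker drains the buffer until every segment has signaled completion.
    received = []
    finished = 0
    while finished < len(senders):
        item = buffer.get()
        if item is done:
            finished += 1
        else:
            received.append(item)

    for sender in senders:
        sender.join()
    return received


if __name__ == "__main__":
    rows = read_via_push([[1, 2, 3], [4, 5], [6]])
    print(sorted(rows))
```

Rows from different segments arrive interleaved in no fixed order, which is why the usage example sorts them before printing.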

[Diagram: reading data from Greenplum® segments over the GPFDIST protocol]

The use of the GPFDIST protocol for data reads introduces the following limitations for the connector:

No support for reading multidimensional arrays.
No support for reading string type arrays.
No support for AS_JSON array processing mode.
When LIMIT and ORDER BY are pushed down at the same time (Top-N pushdown), the connector sorts data only partially. This does not affect the validity of the query results.

Connector settings

The connector's basic settings and their corresponding session properties match those of the PostgreSQL connector of the same version. In addition, the following settings are available:

Configuration	Description	Default value
`greenplum.gpfdist.server.enabled`	Enables GPFDIST servers on Trino workers	`false`
`greenplum.gpfdist.max-processing-threads`	Maximum size of the thread pool for asynchronous GPFDIST query processing	`32`
`greenplum.gpfdist.max-query-threads`	Maximum size of the thread pool that creates external Greenplum® tables and initiates data writes to them	`32`
`greenplum.gpfdist.read.enabled`	Enables reading data directly from Greenplum® segments over the GPFDIST protocol	`false`
`greenplum.gpfdist.read.buffer-size`	Buffer size for GPFDIST data reads in data size format. If the buffer overflows, the connector suspends data reception from Greenplum® segments. Matches the `gpfdist_read_buffer_size` session property	`32MB`
`greenplum.gpfdist.retry-timeout`	Maximum time a Greenplum® segment will wait for a response to a GPFDIST query, in duration format. If the value is other than `null`, this setting overrides the Greenplum® gpfdist_retry_timeout property (300 seconds by default)	`null`
`greenplum.max-read-parallelism`	Maximum parallelism for data reads from Greenplum®. Matches the `max_read_parallelism` session property	`1` (no parallelism)
`greenplum.segment-fetch-required`	Determines the connector's behavior if it fails to obtain the number of Greenplum® segments: if `true`, the Trino query fails; if `false`, the level of parallelism equals the `max_read_parallelism` session property value. Matches the `segment_fetch_required` session property	`true`
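The interaction between `greenplum.max-read-parallelism` and `greenplum.segment-fetch-required` can be summarized in a short Python sketch. The function name and the use of `None` to represent a failed segment count lookup are hypothetical, and taking the minimum of the segment count and the limit is one reading of "limited by"; the connector's internal implementation is not shown in this article.

```python
from typing import Optional


def resolve_parallelism(
    segment_count: Optional[int],
    max_read_parallelism: int = 1,
    segment_fetch_required: bool = True,
) -> int:
    """Return the read parallelism for a query under the documented settings.

    segment_count is None when the number of segments could not be obtained
    (a convention of this sketch).
    """
    if segment_count is None:
        if segment_fetch_required:
            # Documented behavior for `true`: the Trino query fails.
            raise RuntimeError("failed to obtain the number of Greenplum segments")
        # Documented behavior for `false`: fall back to max_read_parallelism.
        return max_read_parallelism
    # Assumed: parallelism is capped by both the segment count and the limit.
    return min(segment_count, max_read_parallelism)


if __name__ == "__main__":
    print(resolve_parallelism(16, max_read_parallelism=8))
    print(resolve_parallelism(None, max_read_parallelism=8, segment_fetch_required=False))
```

With the default values (`1` and `true`), the sketch returns 1 whenever the segment count is known, which corresponds to no parallelism.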

Useful links

Creating a Trino catalog
