© 2026 Direct Cursus Technology L.L.C.
Yandex Managed Service for Trino
Greenplum®/Cloudberry connector

Written by
Yandex Cloud
Updated at June 23, 2026

The Greenplum®/Cloudberry connector, developed by Yandex on the basis of the PostgreSQL connector, allows Managed Service for Trino to read data from and write data to a Greenplum®/Cloudberry cluster.

The connector supports parallel data reading from several Greenplum® segments at the same time and direct segment reading over the GPFDIST protocol, which greatly improves query performance for large-scale data reads. You can use both data reading methods at the same time to optimize the use of resources in Trino and Greenplum® clusters.

The Greenplum®/Cloudberry connector is available in Trino 476 or higher.

Parallel data reading

During parallel reading from a table, data is parallelized based on the gp_segment_id metadata column value.
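Because parallel reads are split by gp_segment_id, it can help to check how a table's rows are spread across segments before tuning parallelism. The queries below are a minimal sketch meant to be run directly in Greenplum® (for example, from psql), not through Trino; the table name public.orders is a hypothetical placeholder.

```sql
-- Count primary segments in the Greenplum cluster
-- (content >= 0 excludes the coordinator/master entry).
SELECT count(*) AS primary_segments
FROM gp_segment_configuration
WHERE role = 'p' AND content >= 0;

-- Show how many rows of a (hypothetical) table live on each segment.
-- gp_segment_id is a system column available on every Greenplum table.
SELECT gp_segment_id, count(*) AS row_count
FROM public.orders
GROUP BY gp_segment_id
ORDER BY gp_segment_id;
```

A heavily skewed row_count suggests that some parallel readers will have far more work than others, whatever parallelism level is configured.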

The level of parallelism depends on the number of segments in the Greenplum® cluster. The maximum parallelism level is limited by the greenplum.max-read-parallelism connector setting and the relevant max_read_parallelism session property.
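As an illustration, the session property can be set per session with the standard Trino SET SESSION statement. This is a sketch that assumes a catalog named gp (a hypothetical name); the value 8 is arbitrary.

```sql
-- Raise the parallelism cap for the current session only.
-- "gp" is a hypothetical catalog name; replace it with your own catalog.
SET SESSION gp.max_read_parallelism = 8;

-- Return to the value configured for the catalog.
RESET SESSION gp.max_read_parallelism;
```

The catalog-level greenplum.max-read-parallelism setting remains the default for sessions that do not override it.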

Parallel reading is illustrated in the following diagram:

When parallel reading is used, the connector performs only partial row filtering during LIMIT pushdown. This does not affect the validity of the query results.

Reading data over the GPFDIST protocol

The connector allows reading data directly from Greenplum® segments via GPFDIST servers created on Trino workers. GPFDIST server activation is controlled by the greenplum.gpfdist.server.enabled connector setting.

In a Trino cluster, you can create no more than eight catalogs with active GPFDIST servers.

Direct reading from Greenplum® segments follows these steps:

  1. The connector creates a writable external table whose location is the address of the Trino worker that will receive the data:

    CREATE WRITABLE EXTERNAL TEMPORARY TABLE <external_table_name>
           ...
           LOCATION('gpfdist://<Trino_worker_address>');
    
  2. The connector runs the following query:

    INSERT INTO <external_table_name>
    SELECT ... FROM <table_name_in_Greenplum®>;
    
  3. The Greenplum® segments send data to the Trino worker at the specified address.
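The steps above can be sketched with concrete Greenplum® statements. This is only an illustration of the general pattern: the table names, the worker host, port and path, the LIKE clause, and the CSV format are all assumptions, since the source does not show the exact statement the connector generates.

```sql
-- Step 1 (illustrative): a writable external table whose location points
-- at a GPFDIST endpoint. Host, port, and path are hypothetical.
CREATE WRITABLE EXTERNAL TEMPORARY TABLE trino_export_orders
    (LIKE public.orders)
LOCATION ('gpfdist://trino-worker-1.example.internal:8080/orders')
FORMAT 'CSV';  -- the format the connector really uses is not documented here

-- Step 2 (illustrative): each segment streams its rows to that endpoint.
INSERT INTO trino_export_orders
SELECT * FROM public.orders;
```

Because the external table is temporary, it disappears when the Greenplum® session that created it ends.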

Data reading from segments is illustrated in the following diagram:

The use of the GPFDIST protocol for data reads introduces the following limitations for the connector:

  • No support for reading multidimensional arrays.
  • No support for reading string type arrays.
  • No support for AS_JSON array processing mode.
  • When LIMIT and ORDER BY are pushed down at the same time (Top-N pushdown), the connector sorts data only partially. This does not affect the validity of the query results.
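To see how a Top-N query is planned, you can inspect it with Trino's EXPLAIN statement. The sketch below assumes a catalog named gp and a table public.orders with columns order_id and total, all of which are hypothetical. The exact plan text depends on the Trino version and is not reproduced here.

```sql
-- Inspect the plan of a query that combines ORDER BY and LIMIT.
-- With partial pushdown, Trino is still expected to perform the final
-- sort itself, so the query result stays correct.
EXPLAIN
SELECT order_id, total
FROM gp.public.orders
ORDER BY total DESC
LIMIT 10;
```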

Connector settings

The connector's basic settings and their corresponding session properties match those of the PostgreSQL connector of the same version. In addition, the following settings are available:

| Configuration | Description | Default value |
| --- | --- | --- |
| greenplum.gpfdist.server.enabled | Enables GPFDIST servers on Trino workers. | false |
| greenplum.gpfdist.max-processing-threads | Maximum size of the thread pool for asynchronous GPFDIST query processing. | 32 |
| greenplum.gpfdist.max-query-threads | Maximum size of the thread pool that creates external Greenplum® tables and initiates data writes to an external table. | 32 |
| greenplum.gpfdist.read.enabled | Enables reading data directly from Greenplum® segments over the GPFDIST protocol. | false |
| greenplum.gpfdist.read.buffer-size | Buffer size for GPFDIST data reads, in data size format. If the buffer overflows, the connector suspends data reception from Greenplum® segments. Matches the gpfdist_read_buffer_size session property. | 32MB |
| greenplum.gpfdist.retry-timeout | Maximum time a Greenplum® segment will wait for a response to a GPFDIST query, in duration format. If the value is other than null, this setting overrides the Greenplum® gpfdist_retry_timeout property (300 seconds by default). | null |
| greenplum.max-read-parallelism | Maximum parallelism for data reads from Greenplum®. Matches the max_read_parallelism session property. | 1 (no parallelism) |
| greenplum.segment-fetch-required | Determines the connector's behavior if it fails to get the number of Greenplum® segments: if true, the Trino query fails; if false, the level of parallelism equals the max_read_parallelism session property value. Matches the segment_fetch_required session property. | true |
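The settings that have matching session properties can be overridden for a single session. The following is a minimal sketch, assuming a catalog named gp (a hypothetical name); the values are arbitrary examples rather than recommendations.

```sql
-- Use a larger GPFDIST read buffer for this session (data size format).
SET SESSION gp.gpfdist_read_buffer_size = '64MB';

-- Do not fail the query if the segment count cannot be fetched;
-- parallelism then equals the max_read_parallelism value.
SET SESSION gp.segment_fetch_required = false;
SET SESSION gp.max_read_parallelism = 4;

-- List the session properties of the (hypothetical) gp catalog with their current values.
SHOW SESSION LIKE 'gp.%';
```

Settings without a session property, such as greenplum.gpfdist.server.enabled, can only be changed in the catalog configuration.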

See also

  • Creating a Trino catalog
