© 2025 Direct Cursus Technology L.L.C.
Yandex Managed Service for Greenplum®

In this article:

  • Running gpfdist
  • Creating an external table using gpfdist
  • Examples for creating external tables

Connecting to an external file server

Written by
Yandex Cloud
Updated at March 20, 2024

The Greenplum® Parallel File Server (gpfdist) is a utility for reading data from and writing data to files on remote servers. It is installed on each segment host of a Managed Service for Greenplum® cluster and loads data in parallel, distributing it across segments either evenly or according to the specified distribution key. This improves performance when handling large amounts of external data.

gpfdist works with any delimited text files, including files compressed with gzip or bzip2.

To read or write files on an external server:

  1. On the remote server hosting your target files, install gpfdist as part of the Greenplum® Loader or the Greenplum® Database package and run it.
  2. Create an external table in the Greenplum® database to reference these files.

Running gpfdist

Note

Downloading and using software from the VMware website is not part of the Yandex Managed Service for Greenplum® Terms of Use and is governed by a separate arrangement between the client and VMware. Yandex is not responsible for the relationship between VMware and the client arising in connection with the client's use of VMware products or services.

  1. Download and install the Greenplum® Loader package from the VMware website or the Greenplum® Database package from the Yandex Object Storage bucket by following this guide.

  2. Run gpfdist:

    gpfdist -d <directory_with_data_files> -p <connection_port> -l <path_to_log_file>
    

    Where:

    • <directory_with_data_files>: Local path to the directory containing the files that the external table will read data from or write data to.
    • <connection_port>: Port the utility will run on. The default value is 8080.
    • <path_to_log_file>: (Optional) Path to the file that gpfdist will write its operation logs to.

    To distribute network load, you can run multiple gpfdist instances on the same server, specifying different directories and connection ports, e.g.:

    gpfdist -d /var/load_files1 -p 8081 -l /home/gpadmin/log1 & \
    gpfdist -d /var/load_files2 -p 8082 -l /home/gpadmin/log2 &
    
  3. Make sure the files in the specified directory are accessible from Yandex Cloud on the specified port. To do this, run the following command from a VM in Yandex Cloud:

    wget http://hostname:port/filename
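
The run and check steps above can be sketched as a small set of shell helpers. This is only an illustration: the function names gpfdist_url, start_gpfdist, and check_gpfdist_file are hypothetical, and the 5-second timeout is an arbitrary choice. The only things it takes from the guide are the -d, -p, and -l flags and the wget check.

```shell
#!/bin/sh
# Hypothetical helpers; only the gpfdist flags (-d, -p, -l) and the
# wget availability check come from the guide above.

# Print the HTTP URL at which gpfdist serves a file.
# Usage: gpfdist_url <hostname> <port> <filename>
# An empty port falls back to the gpfdist default, 8080.
gpfdist_url() {
    host=$1
    port=${2:-8080}
    file=$3
    printf 'http://%s:%s/%s\n' "$host" "$port" "$file"
}

# Start one gpfdist instance in the background.
# Usage: start_gpfdist <directory_with_data_files> <connection_port> <path_to_log_file>
start_gpfdist() {
    gpfdist -d "$1" -p "$2" -l "$3" &
}

# Return 0 if the file can be downloaded, non-zero otherwise.
# The downloaded content is discarded; one attempt, 5-second timeout.
check_gpfdist_file() {
    wget -q -O /dev/null --tries=1 --timeout=5 "$(gpfdist_url "$1" "$2" "$3")"
}
```

For example, after running start_gpfdist /var/load_files1 8081 /home/gpadmin/log1 on the file server, running check_gpfdist_file hostname 8081 file.csv on a VM in Yandex Cloud should exit with status 0 if the file is reachable.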
    

Creating an external table using gpfdist

SQL query syntax to create an external table:

CREATE [READABLE | WRITABLE] EXTERNAL TABLE <table_name>
       (<column_name> <data_type> [, ...])
       LOCATION('gpfdist://<path_to_file_on_remote_server>' [, ...])
       FORMAT {'TEXT' | 'CSV' | 'CUSTOM'};

Where:

  • <table_name>: Name of the external table to be created in the Greenplum® database.
  • <column_name>: Table column name.
  • <data_type>: Table column data type.
  • <path_to_file_on_remote_server>: Address of the server where gpfdist is running, the connection port, and the file path, e.g., hostname:8080/file.csv. You can specify a single file or use an asterisk (*) as a wildcard to match several files.

The WRITABLE option allows you to write data to an external object. To read data from an external object, create the table with the READABLE option, which is the default when WRITABLE is omitted.
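
As a rough illustration of the WRITABLE option, the helper below prints SQL that creates a writable external table and unloads rows into it through gpfdist. Everything named here is an assumption made for the sketch: the function writable_table_sql, the tables sales_export and sales, and the single id column.

```shell
#!/bin/sh
# Hypothetical helper: prints SQL for unloading data through gpfdist.
# Table names (sales_export, sales) and the id column are illustrative only.
# Usage: writable_table_sql <hostname> <port> <filename>
writable_table_sql() {
    host=$1
    port=${2:-8080}   # gpfdist default port
    file=$3
    cat <<EOF
-- Rows inserted into this table are written by gpfdist to the target file.
CREATE WRITABLE EXTERNAL TABLE sales_export (id int)
       LOCATION('gpfdist://${host}:${port}/${file}')
       FORMAT 'CSV' (DELIMITER ',');

-- Unload data from a regular table into the external file.
INSERT INTO sales_export SELECT id FROM sales;
EOF
}
```

The output can be reviewed first and then piped to psql using your cluster's own connection parameters, e.g., writable_table_sql hostname 8080 export.csv | psql <connection_options>.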

Examples for creating external tables

  • Creating an external table with data from file.csv on the hostname server:

    CREATE EXTERNAL TABLE tableName (id int)
           LOCATION('gpfdist://hostname:8080/file.csv')
           FORMAT 'CSV' (DELIMITER ',');
    
  • Creating an external table with data from all txt files on the hostname1 and hostname2 servers, where | is the delimiter and a space represents NULL values:

    CREATE EXTERNAL TABLE tableName (...)
           LOCATION('gpfdist://hostname1:8081/*.txt',
                    'gpfdist://hostname2:8081/*.txt')
           FORMAT 'TEXT' (DELIMITER '|' NULL ' ');
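
Once a readable external table exists, a common next step is to copy its data into a regular table in the cluster. The sketch below prints such a statement. The function name load_table_sql and the target table local_copy are hypothetical, and distributing by id assumes the external table has an id column, as in the first example.

```shell
#!/bin/sh
# Hypothetical helper: prints SQL that copies data from an external table
# into a regular table. Assumes the external table has an id column.
# Usage: load_table_sql <external_table> <target_table>
load_table_sql() {
    ext_table=$1
    target=$2
    cat <<EOF
-- Reads the remote files through gpfdist and stores the rows in the cluster.
CREATE TABLE ${target} AS
       SELECT * FROM ${ext_table}
       DISTRIBUTED BY (id);
EOF
}
```

For example, load_table_sql tableName local_copy prints a statement that copies everything served for tableName into local_copy. As with any generated SQL, review it before piping it to psql.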
    

Greenplum® and Greenplum Database® are registered trademarks or trademarks of VMware, Inc. in the United States and/or other countries.
