Yandex Cloud
Search
Contact UsGet started
  • Pricing
  • Customer Stories
  • Documentation
  • Blog
  • All Services
  • System Status
    • Featured
    • Infrastructure & Network
    • Data Platform
    • Containers
    • Developer tools
    • Serverless
    • Security
    • Monitoring & Resources
    • AI Studio
    • Business tools
  • All Solutions
    • By industry
    • By use case
    • Economics and Pricing
    • Security
    • Technical Support
    • Start testing with double trial credits
    • Cloud credits to scale your IT product
    • Gateway to Russia
    • Cloud for Startups
    • Center for Technologies and Society
    • Yandex Cloud Partner program
  • Pricing
  • Customer Stories
  • Documentation
  • Blog
© 2025 Direct Cursus Technology L.L.C.
Yandex Managed Service for Greenplum®
  • Getting started
    • All guides
    • Connecting to a database
    • Queries in Yandex WebSQL
      • Overview
      • Creating an external table
      • Changing PXF settings
    • Connecting to an external file server (gpfdist)
    • Auxiliary utilities
  • Access management
  • Pricing policy
  • Terraform reference
  • Monitoring metrics
  • Audit Trails events
  • Public materials
  • Release notes

In this article:

  • Getting started
  • Creating an external table using an SQL query
  • Examples for creating external tables
  1. Step-by-step guides
  2. Working with PXF
  3. Creating an external table

Creating an external table using PXF

Written by
Yandex Cloud
Updated at May 22, 2025
  • Getting started
  • Creating an external table using an SQL query
  • Examples for creating external tables

Getting startedGetting started

  1. In the Managed Service for Greenplum® cluster subnet, set up a NAT gateway and link a routing table.

  2. In the same subnet, create a security group allowing all incoming and outgoing traffic from all addresses.

  3. Create an external data source. The steps needed to create a source depend on the source connection type:

    • S3
    • JDBC
    • HDFS
    • Hive

Creating an external table using an SQL queryCreating an external table using an SQL query

The SQL query syntax to create an external table is as follows:

CREATE [WRITABLE] EXTERNAL TABLE <table_name>
       (<column_name> <data_type> [, ...])
       LOCATION('pxf://<data_path_or_table_name>?PROFILE=<profile_name>&SERVER=<source_name>')
       FORMAT '[TEXT|CSV|CUSTOM]';

Where:

  • <table_name>: Name of the external table you are about to create in the Greenplum® cluster.

  • <column_name>: Column name.

  • <data_type>: Column data type. It must match the column data type in the external DBMS table.

  • <data_path_or_table_name>: External object name, see examples of external tables.

  • PROFILE: Standard interface to an external DBMS (profile), e.g., JDBC. The list of possible values depends on the connection type:

    • S3
    • JDBC
    • HDFS and Hive
  • SERVER: Name of the external PXF data source.

    Instead of SERVER, you can provide the parameters defining the external data source configuration. These depend on the source connection type. For more information, see the Greenplum® PFX documentation and examples of creating external tables.

The WRITABLE option allows writing data to an external object. To read data from an external object, create an external table with the READABLE option.

Examples for creating external tablesExamples for creating external tables

ClickHouse®
MySQL®
PostgreSQL
Object Storage
  1. Create a Yandex Managed Service for ClickHouse® cluster with the chuser username.

  2. In the cluster subnet, set up a NAT gateway and create a security group allowing all incoming and outgoing traffic from all addresses.

  3. (Optional) Create an external JDBC data source with the following parameters:

    • Name: chserver
    • Driver: com.clickhouse.jdbc.ClickHouseDriver
    • Url: jdbc:clickhouse:http://c-<cluster_ID>.rw.mdb.yandexcloud.net:8123/db1
    • User: chuser

    You can get the cluster ID with the list of clusters in the folder.

    With no data source created, you need to provide the source connection parameters in the SQL query for creating an external table.

  4. Connect to the ClickHouse® DB using clickhouse-client.

  5. Create a test table and populate it with data:

    CREATE TABLE test (id int) ENGINE = Memory;
    
    INSERT INTO test VALUES (1);
    
  6. Connect to the Greenplum® DB.

  7. Create an external table named pxf_ch to reference the test table in the ClickHouse® cluster. The SQL query depends on whether you previously created an external data source or not:

    • With a data source created:

      CREATE READABLE EXTERNAL TABLE pxf_ch(id int)
      LOCATION ('pxf://test?PROFILE=JDBC&SERVER=chserver')
      FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
      
    • With no data source created:

      CREATE READABLE EXTERNAL TABLE pxf_ch(id int)
      LOCATION ('pxf://test?PROFILE=JDBC&JDBC_DRIVER=com.clickhouse.jdbc.ClickHouseDriver&DB_URL=jdbc:clickhouse:http://c-<cluster_ID>.rw.mdb.yandexcloud.net:8123/db1&USER=chuser')
      FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
      

    If public access is enabled for ClickHouse® hosts, use an encrypted connection when creating an external table. To do this, specify SSL parameters in your query:

    • With a data source created:

      CREATE READABLE EXTERNAL TABLE pxf_ch(id int)
      LOCATION ('pxf://test?PROFILE=JDBC&SERVER=chserver&ssl=true&sslmode=strict&sslrootcert=/etc/greenplum/ssl/allCAs.pem')
      FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
      
    • With no data source created:

      CREATE READABLE EXTERNAL TABLE pxf_ch(id int)
      LOCATION ('pxf://test?PROFILE=JDBC&JDBC_DRIVER=com.clickhouse.jdbc.ClickHouseDriver&DB_URL=jdbc:clickhouse:https://c-<cluster_ID>.rw.mdb.yandexcloud.net:8443/db1&USER=chuser&ssl=true&sslmode=strict&sslrootcert=/etc/greenplum/ssl/allCAs.pem')
      FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
      
  8. Query the external table for data:

    SELECT * FROM pxf_ch;
    

    Result:

    test_pxf=> SELECT * FROM pxf_ch;
     id
    ----
      1
    (1 row)
    
  1. Create a Yandex Managed Service for MySQL® cluster with the following settings:

    • Username: mysqluser.
    • In the host settings, select the Public access option.
  2. In the cluster subnet, set up a NAT gateway and create a security group allowing all incoming and outgoing traffic from all addresses.

  3. (Optional) Create an external JDBC data source with the following parameters:

    • Name: mysqlserver
    • Driver: com.mysql.jdbc.Driver
    • Url: jdbc:mysql://c-<cluster_ID>.rw.mdb.yandexcloud.net:3306/db1
    • User: mysqluser

    You can get the cluster ID with the list of clusters in the folder.

    With no data source created, you need to provide the source connection parameters in the SQL query for creating an external table.

  4. Connect to the MySQL® DB using mysql.

  5. Create a test table and populate it with data:

    CREATE TABLE test (a INT, b INT);
    
    INSERT INTO test VALUES (1, '11'), (2, '22');
    
  6. Connect to the Greenplum® DB.

  7. Create an external table named pxf_mysql to reference the test table in the MySQL® cluster. The SQL query depends on whether you previously created an external data source or not:

    • With a data source created:

      CREATE READABLE EXTERNAL TABLE pxf_mysql(a int, b int)
      LOCATION ('pxf://test?PROFILE=JDBC&SERVER=mysqlserver')
      FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
      
    • With no data source created:

      CREATE READABLE EXTERNAL TABLE pxf_mysql(a int, b int)
      LOCATION ('pxf://test?PROFILE=JDBC&JDBC_DRIVER=com.mysql.jdbc.Driver&DB_URL=jdbc:mysql://c-<cluster_ID>.rw.mdb.yandexcloud.net:3306/db1&USER=mysqluser')
      FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
      
  8. Query the external table for data:

    SELECT * FROM pxf_mysql;
    

    Result:

    test_pxf=> SELECT * FROM pxf_mysql;
    a | b
    --+----
    1 | 11
    2 | 22
    (2 rows)
    
  1. Create a Yandex Managed Service for PostgreSQL cluster with the following settings:

    • Username: pguser
    • In the host settings, select the Public access option.
  2. In the cluster subnet, set up a NAT gateway and create a security group allowing all incoming and outgoing traffic from all addresses.

  3. (Optional) Create an external JDBC data source with the following parameters:

    • Name: pgserver
    • Driver: org.postgresql.Driver
    • Url: jdbc:postgresql://c-<cluster_ID>.rw.mdb.yandexcloud.net:6432/db1
    • User: pguser

    You can get the cluster ID with the list of clusters in the folder.

    With no data source created, you need to provide the source connection parameters in the SQL query for creating an external table.

  4. Connect to the PostgreSQL DB using psql.

  5. Create a test table and populate it with data:

    CREATE TABLE public.test ("a" INT,"b" INT);
    
    INSERT INTO public.test VALUES (1, '11'), (2, '22');
    
  6. Connect to the Greenplum® DB.

  7. Create an external table named pxf_pg to reference the public.test table in the PostgreSQL cluster. The SQL query depends on whether you previously created an external data source or not:

    • With a data source created:

      CREATE READABLE EXTERNAL TABLE pxf_pg(a int, b int)
      LOCATION ('pxf://public.test?PROFILE=JDBC&SERVER=pgserver')
      FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
      
    • With no data source created:

      CREATE READABLE EXTERNAL TABLE pxf_pg(a int, b int)
      LOCATION ('pxf://public.test?PROFILE=JDBC&JDBC_DRIVER=org.postgresql.Driver&DB_URL=jdbc:postgresql://c-<cluster_ID>.rw.mdb.yandexcloud.net:6432/db1&USER=pguser')
      FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
      
  8. Query the external table for data:

    SELECT * FROM pxf_pg;
    

    Result:

    test_pxf=> SELECT * FROM pxf_pg;
    a | b
    --+----
    1 | 11
    2 | 22
    (2 rows)
    
  1. In the cluster subnet, set up a NAT gateway and create a security group allowing all incoming and outgoing traffic from all addresses.

  2. Create an Object Storage bucket with restricted access.

  3. Create a static access key.

  4. (Optional) Create an external S3 data source with the following parameters:

    • Name: objserver.
    • Access Key: ID of the static access key you previously created.
    • Secret Key: Secret key you previously created along with the static access key.
    • Endpoint: storage.yandexcloud.net.

    With no data source created, you need to provide the source connection parameters in the SQL query for creating an external table.

  5. Create a test file named test.csv on the local machine:

    1,111
    2,222
    
  6. Upload your test file to the bucket.

    Files you are uploading to the bucket must not start with . and _. Such files are considered hidden, and PXF does not read data from them.

  7. Connect to the Greenplum® DB.

  8. To read data from the Object Storage bucket:

    1. Create an external table named pxf_s3_read to reference the bucket. The SQL query depends on whether you previously created an external data source or not:

      • With a data source created:

        CREATE READABLE EXTERNAL TABLE pxf_s3_read(a int, b int)
        LOCATION ('pxf://<bucket_name>/test.csv?PROFILE=s3:text&SERVER=objserver')
        FORMAT 'CSV';
        
      • With no data source created:

        CREATE READABLE EXTERNAL TABLE pxf_s3_read(a int, b int)
        LOCATION ('pxf://<bucket_name>/test.csv?PROFILE=s3:text&accesskey=<key_ID>&secretkey=<secret_key>&endpoint=storage.yandexcloud.net')
        FORMAT 'CSV';
        
    2. Query the external table for data:

      SELECT * FROM pxf_s3_read;
      

      Result:

      test_pxf=> SELECT * FROM pxf_s3_read;
      a | b
      ---+----
      1 | 111
      2 | 222
      (2 rows)
      
  9. To write data to a Object Storage bucket:

    1. Create an external table named pxf_s3_write with the WRITABLE option. You do not need to specify a file name when creating the table:

      • With a data source created:

        CREATE WRITABLE EXTERNAL TABLE pxf_s3_write(a int, b int)
        LOCATION ('pxf://<bucket_name>/?PROFILE=s3:text&SERVER=objserver')
        FORMAT 'CSV';
        
      • With no data source created:

        CREATE WRITABLE EXTERNAL TABLE pxf_s3_write(a int, b int)
        LOCATION ('pxf://<bucket_name>/?PROFILE=s3:text&accesskey=<key_ID>&secretkey=<secret_key>&endpoint=storage.yandexcloud.net')
        FORMAT 'CSV';
        
    2. Add data to the table:

      INSERT INTO pxf_s3_write VALUES (3,333);
      
      INSERT 0 1
      
    3. Make sure that a new object has been created in the bucket.

Note

To create an external table from Object Storage, you can use the S3 protocol and transmit your static key parameters using a file located on a web server. For more information, see this tutorial.

Greenplum® and Greenplum Database® are registered trademarks or trademarks of Broadcom Inc. in the United States and/or other countries.

ClickHouse® is a registered trademark of ClickHouse, Inc.

Was the article helpful?

Previous
Hive
Next
Changing PXF settings
© 2025 Direct Cursus Technology L.L.C.