Creating an external table using PXF
Yandex MPP Analytics for PostgreSQL allows creating external tables to access data in external databases. In clusters with the Apache Cloudberry™
The FDW mechanism supports the following external data types:
s3_pxf_fdwjdbc_pxf_fdwhdfs_pxf_fdwhive_pxf_fdw
External tables are created using the CREATE EXTERNAL TABLE SQL query, whereas foreign tables using CREATE FOREIGN TABLE.
Getting started
-
In the Yandex MPP Analytics for PostgreSQL cluster's subnet, set up a NAT gateway and link a routing table.
-
In the Yandex MPP Analytics for PostgreSQL cluster network, create a security group that allows all incoming and outgoing traffic from all addresses.
-
Create an external data source. The steps you need to follow depend on the source connection type:
Creating a foreign table
Syntax of an SQL query to create a foreign table:
CREATE FOREIGN TABLE <table_name>
(<column_name> <data_type> [, ...])
SERVER "<local_source_name>"
OPTIONS (
resource '<data_path_or_table_name>'
);
Where:
<table_name>: Name of the external table in the Yandex MPP Analytics for PostgreSQL cluster.<column_name>: Column name.<data_type>: Column data type. It must match the type of the corresponding column in the external DBMS.<local_source_name>: Local data source name.<path_to_data_or_table_name>: Path to data or table name on the external source.
Examples of creating foreign tables
-
Create a Yandex Managed Service for ClickHouse® cluster with the
chuserusername. -
Add rules allowing all incoming and outgoing traffic to the security group of the Managed Service for ClickHouse® cluster.
-
Create a table in ClickHouse® and populate it with data:
-
Connect to the ClickHouse® DB using
clickhouse-client. -
Create a table named
test:CREATE TABLE test ( a INT, b INT ) ENGINE = Memory; -
Add data:
INSERT INTO test (a, b) VALUES (1, 11), (2, 22);
-
-
Get access to external data in Yandex MPP Analytics for PostgreSQL:
-
Connect to the database in Yandex MPP Analytics for PostgreSQL.
-
Create an external data source:
CREATE SERVER "chserver" FOREIGN DATA WRAPPER jdbc_pxf_fdw OPTIONS ( jdbc_driver 'com.clickhouse.jdbc.ClickHouseDriver', db_url 'jdbc:clickhouse:http://c-<cluster_ID>.rw.mdb.yandexcloud.net:8123/<DB_name>', user 'chuser', pass '<password>' );Where
db_urlis a special FQDN always pointing to an available Managed Service for ClickHouse® cluster host.You can get the cluster ID with the list of clusters in the folder.
If public access is enabled for ClickHouse® hosts, use an encrypted connection when creating an external table. Do it by specifying SSL parameters; use the
httpsprotocol and port8443.CREATE SERVER "chserver" FOREIGN DATA WRAPPER jdbc_pxf_fdw OPTIONS ( ssl 'true', sslmode 'strict', sslrootcert '/etc/greenplum/ssl/allCAs.pem', jdbc_driver 'com.clickhouse.jdbc.ClickHouseDriver', db_url 'jdbc:clickhouse:https://c-<cluster_ID>.rw.mdb.yandexcloud.net:8443/<DB_name>', user 'chuser', pass '<password>' ); -
Create a mapping between a local user and a user in the external data source:
CREATE USER MAPPING FOR CURRENT_USER SERVER "chserver"; -
Create an external table named
fdw_chto reference thetesttable in the ClickHouse® cluster:CREATE FOREIGN TABLE fdw_ch ( a INT, b INT ) SERVER "chserver" OPTIONS ( resource 'test' ); -
Query the external table for data:
SELECT * FROM fdw_ch; -
Populate the external table with data:
INSERT INTO fdw_ch (a, b) VALUES (3, 33);
-
-
Create a Yandex Managed Service for MySQL® cluster with public access to its hosts and
mysqluserfor username. -
Add rules allowing all incoming and outgoing traffic to the security group of the Managed Service for MySQL® cluster.
-
Create a table in MySQL® and populate it with data:
-
Connect to the MySQL® DB using
mysql. -
Create a table named
test:CREATE TABLE test ( a INT, b INT ); -
Add data:
INSERT INTO test (a, b) VALUES (1, 11), (2, 22);
-
-
Get access to external data in Yandex MPP Analytics for PostgreSQL:
-
Connect to the database in Yandex MPP Analytics for PostgreSQL.
-
Create an external data source:
CREATE SERVER "myserver" FOREIGN DATA WRAPPER jdbc_pxf_fdw OPTIONS ( jdbc_driver 'com.mysql.jdbc.Driver', db_url 'jdbc:mysql://c-<cluster_ID>.rw.mdb.yandexcloud.net:3306/<DB_name>', user 'mysqluser', pass '<password>' );db_urlis a special FQDN always pointing to the current master host in the Managed Service for MySQL® cluster.You can get the cluster ID with the list of clusters in the folder.
-
Create a mapping between a local user and a user in the external data source:
CREATE USER MAPPING FOR CURRENT_USER SERVER "myserver"; -
Create an external table named
fdw_mysqlto reference thetesttable in the MySQL® cluster:CREATE FOREIGN TABLE fdw_mysql ( a INT, b INT ) SERVER "myserver" OPTIONS ( resource 'test' ); -
Query the external table for data:
SELECT * FROM fdw_mysql; -
Populate the external table with data:
INSERT INTO fdw_mysql (a, b) VALUES (3, 33);
-
-
Create a Yandex Managed Service for PostgreSQL cluster with public access to its hosts and
pguserfor username. -
Add rules allowing all incoming and outgoing traffic to the security group of the Managed Service for PostgreSQL cluster.
-
Create a table in PostgreSQL and populate it with data:
-
Connect to the PostgreSQL DB using
psql. -
Create a table named
test:CREATE TABLE public.test ( a INT, b INT ); -
Add data:
INSERT INTO public.test (a, b) VALUES (1, 11), (2, 22);
-
-
Get access to external data in Yandex MPP Analytics for PostgreSQL:
-
Connect to the database in Yandex MPP Analytics for PostgreSQL.
-
Create an external data source:
CREATE SERVER "pgserver" FOREIGN DATA WRAPPER jdbc_pxf_fdw OPTIONS ( jdbc_driver 'org.postgresql.Driver', db_url 'jdbc:postgresql://c-<cluster_ID>.rw.mdb.yandexcloud.net:6432/<DB_name>', user 'pguser', pass '<password>' );db_urlis a special FQDN always pointing to the current master host in the Managed Service for PostgreSQL cluster.You can get the cluster ID with the list of clusters in the folder.
-
Create a mapping between a local user and a user in the external data source:
CREATE USER MAPPING FOR CURRENT_USER SERVER "pgserver"; -
Create an external table named
fdw_pgto reference thepublic.testtable in the PostgreSQL cluster:CREATE FOREIGN TABLE fdw_pg ( a INT, b INT ) SERVER "pgserver" OPTIONS ( resource 'public.test' ); -
Query the external table for data:
SELECT * FROM fdw_pg; -
Populate the external table with data:
INSERT INTO fdw_pg (a, b) VALUES (3, 33);
-
-
Create a service account named
fdw-agentand assign it thestorage.editorrole to access the Object Storage bucket. -
Create an Object Storage bucket with restricted access.
-
Grant
fdw-agenttheREAD and WRITEpermissions for the new bucket. -
Prepare a test file and upload it to the bucket.
-
Create a test file named
test.csvon your local machine:1,11 2,22 -
Upload the
test.csvfile to the bucket.
-
-
Get access to external data in Yandex MPP Analytics for PostgreSQL:
-
Connect to the database in Yandex MPP Analytics for PostgreSQL.
-
Create an external data source:
CREATE SERVER "objserver" FOREIGN DATA WRAPPER s3_pxf_fdw OPTIONS ( accesskey '<static_access_key_ID>', secretkey '<secret_access_key>', endpoint 'storage.yandexcloud.net' ); -
Create a mapping between a local user and a user in the external data source:
CREATE USER MAPPING FOR CURRENT_USER SERVER "objserver"; -
Create an external table named
fdw_s3to reference thetest.csvtable in the bucket:CREATE FOREIGN TABLE fdw_s3 ( a INT, b INT ) SERVER "objserver" OPTIONS ( resource '<bucket_name>/test.csv', format 'csv' ); -
Query the external table for data:
SELECT * FROM fdw_s3;
-
Tip
You can provide connection parameters either when creating the data source and when creating the external table.
Providing connection parameters when creating the external table
CREATE SERVER "<local_source_name>"
FOREIGN DATA WRAPPER jdbc_pxf_fdw;
CREATE USER MAPPING FOR CURRENT_USER
SERVER "<local_source_name>";
CREATE FOREIGN TABLE <table_name>
(<column_name> <data_type> [, ...])
SERVER "<local_source_name>"
OPTIONS (
resource '<data_path_or_table_name>',
jdbc_driver '<JDBC_driver_class_name>',
db_url 'jdbc:<DBMS_type>://<cluster_FQDN>:<port>/<database_name>',
user '<username>',
pass '<password>'
);
Creating an external table
Syntax of an SQL query to create an external table:
CREATE [WRITABLE] EXTERNAL TABLE <table_name>
(<column_name> <data_type> [, ...])
LOCATION('pxf://<data_path_or_table_name>?PROFILE=<profile_name>&SERVER=<source_name>')
FORMAT '[TEXT|CSV|CUSTOM]';
Where:
-
<table_name>: Name of the external table you are creating in the Yandex MPP Analytics for PostgreSQL cluster. -
<column_name>: Column name. -
<data_type>: Column data type. It must match the column data type in the external DBMS table. -
<data_path_or_table_name>: External object name, see examples of external tables. -
PROFILE: Standard interface to an external DBMS (profile), e.g.,JDBC. The list of possible values depends on the connection type: -
SERVER: Name of the external PXF data source.Instead of
SERVER, you can provide parameters defining the external data source configuration. These depend on the source connection type. For more information, see the Greenplum® PFX guide and examples of creating external tables.
The WRITABLE option allows writing data to an external object. To read data from an external object, create an external table with the READABLE option.
Examples of creating external tables
-
Create a Yandex Managed Service for ClickHouse® cluster with the
chuserusername. -
In the cluster subnet, set up a NAT gateway and create a security group allowing all incoming and outgoing traffic from all addresses.
-
Optionally, create an external JDBC data source with the following properties:
-
Name:
chserver -
Driver:
com.clickhouse.jdbc.ClickHouseDriver -
Url:
jdbc:clickhouse:http://c-<cluster_ID>.rw.mdb.yandexcloud.net:8123/db1, where:c-<cluster_id>.rw.mdb.yandexcloud.netis a special FQDN always pointing to an available Managed Service for ClickHouse® cluster host.8123: Port for connection to the Managed Service for ClickHouse® cluster.db1: DB name in the Managed Service for ClickHouse® cluster.
-
User:
chuser
You can get the cluster ID with the list of clusters in the folder.
With no data source created, you will need to provide the source connection properties in the SQL query for creating an external table.
-
-
Connect to the ClickHouse® DB using
clickhouse-client. -
Create a test table and populate it with data:
CREATE TABLE test (id int) ENGINE = Memory;INSERT INTO test VALUES (1); -
Create an external table named
pxf_chto reference thetesttable in the ClickHouse® cluster. The SQL query depends on whether you previously created an external data source:-
With a data source created:
CREATE READABLE EXTERNAL TABLE pxf_ch(id int) LOCATION ('pxf://test?PROFILE=JDBC&SERVER=chserver') FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import'); -
With no data source created:
CREATE READABLE EXTERNAL TABLE pxf_ch(id int) LOCATION ('pxf://test?PROFILE=JDBC&JDBC_DRIVER=com.clickhouse.jdbc.ClickHouseDriver&DB_URL=jdbc:clickhouse:http://c-<cluster_ID>.rw.mdb.yandexcloud.net:8123/db1&USER=chuser') FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
If public access is enabled for ClickHouse® hosts, use an encrypted connection when creating an external table. To do this, specify SSL parameters in your query:
-
With a data source created:
CREATE READABLE EXTERNAL TABLE pxf_ch(id int) LOCATION ('pxf://test?PROFILE=JDBC&SERVER=chserver&ssl=true&sslmode=strict&sslrootcert=/etc/greenplum/ssl/allCAs.pem') FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import'); -
With no data source created:
CREATE READABLE EXTERNAL TABLE pxf_ch(id int) LOCATION ('pxf://test?PROFILE=JDBC&JDBC_DRIVER=com.clickhouse.jdbc.ClickHouseDriver&DB_URL=jdbc:clickhouse:https://c-<cluster_ID>.rw.mdb.yandexcloud.net:8443/db1&USER=chuser&ssl=true&sslmode=strict&sslrootcert=/etc/greenplum/ssl/allCAs.pem') FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
-
-
Query the external table for data:
SELECT * FROM pxf_ch;Result:
test_pxf=> SELECT * FROM pxf_ch; id ---- 1 (1 row)
-
Create a Yandex Managed Service for MySQL® cluster with the following settings:
- Username:
mysqluser. - In the host settings, select the Public access option.
- Username:
-
In the cluster subnet, set up a NAT gateway and create a security group allowing all incoming and outgoing traffic from all addresses.
-
Optionally, create an external JDBC data source with the following properties:
-
Name:
mysqlserver -
Driver:
com.mysql.jdbc.Driver -
Url:
jdbc:mysql://c-<cluster_ID>.rw.mdb.yandexcloud.net:3306/db1, where:c-<cluster_ID>.rw.mdb.yandexcloud.netis a special FQDN always pointing to the current master host in the Managed Service for MySQL® cluster.3306: Port for connection to the Managed Service for MySQL® cluster.db1: DB name in the Managed Service for MySQL® cluster.
-
User:
mysqluser
You can get the cluster ID with the list of clusters in the folder.
With no data source created, you will need to provide the source connection properties in the SQL query for creating an external table.
-
-
Connect to the MySQL® DB using
mysql. -
Create a test table and populate it with data:
CREATE TABLE test (a INT, b INT);INSERT INTO test VALUES (1, 11), (2, 22); -
Create an external table named
pxf_mysqlto reference thetesttable in the MySQL® cluster. The SQL query depends on whether you previously created an external data source:-
With a data source created:
CREATE READABLE EXTERNAL TABLE pxf_mysql(a int, b int) LOCATION ('pxf://test?PROFILE=JDBC&SERVER=mysqlserver') FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import'); -
With no data source created:
CREATE READABLE EXTERNAL TABLE pxf_mysql(a int, b int) LOCATION ('pxf://test?PROFILE=JDBC&JDBC_DRIVER=com.mysql.jdbc.Driver&DB_URL=jdbc:mysql://c-<cluster_ID>.rw.mdb.yandexcloud.net:3306/db1&USER=mysqluser') FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
-
-
Query the external table for data:
SELECT * FROM pxf_mysql;Result:
test_pxf=> SELECT * FROM pxf_mysql; a | b --+---- 1 | 11 2 | 22 (2 rows)
-
Create a Yandex Managed Service for PostgreSQL cluster with the following settings:
- Username:
pguser - In the host settings, select the Public access option.
- Username:
-
In the cluster subnet, set up a NAT gateway and create a security group allowing all incoming and outgoing traffic from all addresses.
-
Optionally, create an external JDBC data source with the following properties:
-
Name:
pgserver -
Driver:
org.postgresql.Driver -
Url:
jdbc:postgresql://c-<cluster_ID>.rw.mdb.yandexcloud.net:6432/db1, where:c-<cluster_ID>.rw.mdb.yandexcloud.netis a special FQDN always pointing to the current master host in the Managed Service for PostgreSQL cluster.6432: Port for connection to the Managed Service for PostgreSQL cluster.db1: DB name in the Managed Service for PostgreSQL cluster.
-
User:
pguser
You can get the cluster ID with the list of clusters in the folder.
With no data source created, you will need to provide the source connection properties in the SQL query for creating an external table.
-
-
Connect to the PostgreSQL DB using
psql. -
Create a test table and populate it with data:
CREATE TABLE public.test ("a" INT,"b" INT);INSERT INTO public.test VALUES (1, 11), (2, 22); -
Create an external table named
pxf_pgto reference thepublic.testtable in the PostgreSQL cluster. The SQL query depends on whether you previously created an external data source:-
With a data source created:
CREATE READABLE EXTERNAL TABLE pxf_pg(a int, b int) LOCATION ('pxf://public.test?PROFILE=JDBC&SERVER=pgserver') FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import'); -
With no data source created:
CREATE READABLE EXTERNAL TABLE pxf_pg(a int, b int) LOCATION ('pxf://public.test?PROFILE=JDBC&JDBC_DRIVER=org.postgresql.Driver&DB_URL=jdbc:postgresql://c-<cluster_ID>.rw.mdb.yandexcloud.net:6432/db1&USER=pguser') FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
-
-
Query the external table for data:
SELECT * FROM pxf_pg;Result:
test_pxf=> SELECT * FROM pxf_pg; a | b --+---- 1 | 11 2 | 22 (2 rows)
-
In the cluster subnet, set up a NAT gateway and create a security group allowing all incoming and outgoing traffic from all addresses.
-
Create an Object Storage bucket with restricted access.
-
Optionally, create an external S3 data source with the following properties:
- Name:
objserver. - Access Key: ID of the static access key created earlier.
- Secret Key: Secret key created earlier together with the static access key.
- Endpoint:
storage.yandexcloud.net.
With no data source created, you will need to provide the source connection properties in the SQL query for creating an external table.
- Name:
-
Create a test file named
test.csvon your local machine:1,111 2,222 -
Upload your test file to the bucket.
Files you are uploading to the bucket must not start with
.or_. Such files are considered hidden, and PXF does not read data from them. -
To read data from an Object Storage bucket:
-
Create an external table named
pxf_s3_readto reference the bucket. The SQL query depends on whether you previously created an external data source:-
With a data source created:
CREATE READABLE EXTERNAL TABLE pxf_s3_read(a int, b int) LOCATION ('pxf://<bucket_name>/test.csv?PROFILE=s3:text&SERVER=objserver') FORMAT 'CSV'; -
With no data source created:
CREATE READABLE EXTERNAL TABLE pxf_s3_read(a int, b int) LOCATION ('pxf://<bucket_name>/test.csv?PROFILE=s3:text&accesskey=<key_ID>&secretkey=<secret_key>&endpoint=storage.yandexcloud.net') FORMAT 'CSV';
-
-
Query the external table for data:
SELECT * FROM pxf_s3_read;Result:
test_pxf=> SELECT * FROM pxf_s3_read; a | b ---+---- 1 | 111 2 | 222 (2 rows)
-
-
To write data to an Object Storage bucket:
-
Create an external table named
pxf_s3_writewith theWRITABLEoption. You do not need to specify a file name when creating the table:-
With a data source created:
CREATE WRITABLE EXTERNAL TABLE pxf_s3_write(a int, b int) LOCATION ('pxf://<bucket_name>/?PROFILE=s3:text&SERVER=objserver') FORMAT 'CSV'; -
With no data source created:
CREATE WRITABLE EXTERNAL TABLE pxf_s3_write(a int, b int) LOCATION ('pxf://<bucket_name>/?PROFILE=s3:text&accesskey=<key_ID>&secretkey=<secret_key>&endpoint=storage.yandexcloud.net') FORMAT 'CSV';
-
-
Populate the table with data:
INSERT INTO pxf_s3_write VALUES (3,333);INSERT 0 1 -
Make sure that a new object has been created in the bucket.
-
Note
To create an external table from Object Storage, you can use the S3 protocol and transmit your static key properties using a file located on a web server. For more information, see this tutorial.
Greenplum® and Greenplum Database® are registered trademarks or trademarks of Broadcom Inc. in the United States and/or other countries.
ClickHouse® is a registered trademark of ClickHouse, Inc