Creating an external table using PXF
Getting started
-
In the Managed Service for Greenplum® cluster subnet, set up a NAT gateway and link a routing table.
-
In the same subnet, create a security group allowing all incoming and outgoing traffic from all addresses.
-
Create an external data source. The steps needed to create a source depend on the source connection type:
Creating an external table using an SQL query
The SQL query syntax to create an external table is as follows:
CREATE [WRITABLE] EXTERNAL TABLE <table_name>
(<column_name> <data_type> [, ...])
LOCATION('pxf://<data_path_or_table_name>?PROFILE=<profile_name>&SERVER=<source_name>')
FORMAT '[TEXT|CSV|CUSTOM]';
Where:
-
<table_name>
: Name of the external table you are about to create in the Greenplum® cluster. -
<column_name>
: Column name. -
<data_type>
: Column data type. It must match the column data type in the external DBMS table. -
<data_path_or_table_name>
: External object name, see examples of external tables. -
PROFILE
: Standard interface to an external DBMS (profile), e.g.,JDBC
. The list of possible values depends on the connection type: -
SERVER
: Name of the external PXF data source.Instead of
SERVER
, you can provide the parameters defining the external data source configuration. These depend on the source connection type. For more information, see the Greenplum® PFX documentation and examples of creating external tables.
The WRITABLE
option allows writing data to an external object. To read data from an external object, create an external table with the READABLE
option.
Examples for creating external tables
-
Create a Yandex Managed Service for ClickHouse® cluster with the
chuser
username. -
In the cluster subnet, set up a NAT gateway and create a security group allowing all incoming and outgoing traffic from all addresses.
-
(Optional) Create an external JDBC data source with the following parameters:
- Name:
chserver
- Driver:
com.clickhouse.jdbc.ClickHouseDriver
- Url:
jdbc:clickhouse:http://c-<cluster_ID>.rw.mdb.yandexcloud.net:8123/db1
- User:
chuser
You can get the cluster ID with the list of clusters in the folder.
With no data source created, you need to provide the source connection parameters in the SQL query for creating an external table.
- Name:
-
Connect to the ClickHouse® DB using
clickhouse-client
. -
Create a test table and populate it with data:
CREATE TABLE test (id int) ENGINE = Memory;
INSERT INTO test VALUES (1);
-
Create an external table named
pxf_ch
to reference thetest
table in the ClickHouse® cluster. The SQL query depends on whether you previously created an external data source or not:-
With a data source created:
CREATE READABLE EXTERNAL TABLE pxf_ch(id int) LOCATION ('pxf://test?PROFILE=JDBC&SERVER=chserver') FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
-
With no data source created:
CREATE READABLE EXTERNAL TABLE pxf_ch(id int) LOCATION ('pxf://test?PROFILE=JDBC&JDBC_DRIVER=com.clickhouse.jdbc.ClickHouseDriver&DB_URL=jdbc:clickhouse:http://c-<cluster_ID>.rw.mdb.yandexcloud.net:8123/db1&USER=chuser') FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
If public access is enabled for ClickHouse® hosts, use an encrypted connection when creating an external table. To do this, specify SSL parameters in your query:
-
With a data source created:
CREATE READABLE EXTERNAL TABLE pxf_ch(id int) LOCATION ('pxf://test?PROFILE=JDBC&SERVER=chserver&ssl=true&sslmode=strict&sslrootcert=/etc/greenplum/ssl/allCAs.pem') FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
-
With no data source created:
CREATE READABLE EXTERNAL TABLE pxf_ch(id int) LOCATION ('pxf://test?PROFILE=JDBC&JDBC_DRIVER=com.clickhouse.jdbc.ClickHouseDriver&DB_URL=jdbc:clickhouse:https://c-<cluster_ID>.rw.mdb.yandexcloud.net:8443/db1&USER=chuser&ssl=true&sslmode=strict&sslrootcert=/etc/greenplum/ssl/allCAs.pem') FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
-
-
Query the external table for data:
SELECT * FROM pxf_ch;
Result:
test_pxf=> SELECT * FROM pxf_ch; id ---- 1 (1 row)
-
Create a Yandex Managed Service for MySQL® cluster with the following settings:
- Username:
mysqluser
. - In the host settings, select the Public access option.
- Username:
-
In the cluster subnet, set up a NAT gateway and create a security group allowing all incoming and outgoing traffic from all addresses.
-
(Optional) Create an external JDBC data source with the following parameters:
- Name:
mysqlserver
- Driver:
com.mysql.jdbc.Driver
- Url:
jdbc:mysql://c-<cluster_ID>.rw.mdb.yandexcloud.net:3306/db1
- User:
mysqluser
You can get the cluster ID with the list of clusters in the folder.
With no data source created, you need to provide the source connection parameters in the SQL query for creating an external table.
- Name:
-
Connect to the MySQL® DB using
mysql
. -
Create a test table and populate it with data:
CREATE TABLE test (a INT, b INT);
INSERT INTO test VALUES (1, '11'), (2, '22');
-
Create an external table named
pxf_mysql
to reference thetest
table in the MySQL® cluster. The SQL query depends on whether you previously created an external data source or not:-
With a data source created:
CREATE READABLE EXTERNAL TABLE pxf_mysql(a int, b int) LOCATION ('pxf://test?PROFILE=JDBC&SERVER=mysqlserver') FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
-
With no data source created:
CREATE READABLE EXTERNAL TABLE pxf_mysql(a int, b int) LOCATION ('pxf://test?PROFILE=JDBC&JDBC_DRIVER=com.mysql.jdbc.Driver&DB_URL=jdbc:mysql://c-<cluster_ID>.rw.mdb.yandexcloud.net:3306/db1&USER=mysqluser') FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
-
-
Query the external table for data:
SELECT * FROM pxf_mysql;
Result:
test_pxf=> SELECT * FROM pxf_mysql; a | b --+---- 1 | 11 2 | 22 (2 rows)
-
Create a Yandex Managed Service for PostgreSQL cluster with the following settings:
- Username:
pguser
- In the host settings, select the Public access option.
- Username:
-
In the cluster subnet, set up a NAT gateway and create a security group allowing all incoming and outgoing traffic from all addresses.
-
(Optional) Create an external JDBC data source with the following parameters:
- Name:
pgserver
- Driver:
org.postgresql.Driver
- Url:
jdbc:postgresql://c-<cluster_ID>.rw.mdb.yandexcloud.net:6432/db1
- User:
pguser
You can get the cluster ID with the list of clusters in the folder.
With no data source created, you need to provide the source connection parameters in the SQL query for creating an external table.
- Name:
-
Connect to the PostgreSQL DB using
psql
. -
Create a test table and populate it with data:
CREATE TABLE public.test ("a" INT,"b" INT);
INSERT INTO public.test VALUES (1, '11'), (2, '22');
-
Create an external table named
pxf_pg
to reference thepublic.test
table in the PostgreSQL cluster. The SQL query depends on whether you previously created an external data source or not:-
With a data source created:
CREATE READABLE EXTERNAL TABLE pxf_pg(a int, b int) LOCATION ('pxf://public.test?PROFILE=JDBC&SERVER=pgserver') FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
-
With no data source created:
CREATE READABLE EXTERNAL TABLE pxf_pg(a int, b int) LOCATION ('pxf://public.test?PROFILE=JDBC&JDBC_DRIVER=org.postgresql.Driver&DB_URL=jdbc:postgresql://c-<cluster_ID>.rw.mdb.yandexcloud.net:6432/db1&USER=pguser') FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
-
-
Query the external table for data:
SELECT * FROM pxf_pg;
Result:
test_pxf=> SELECT * FROM pxf_pg; a | b --+---- 1 | 11 2 | 22 (2 rows)
-
In the cluster subnet, set up a NAT gateway and create a security group allowing all incoming and outgoing traffic from all addresses.
-
Create an Object Storage bucket with restricted access.
-
(Optional) Create an external S3 data source with the following parameters:
- Name:
objserver
- Access Key: ID of the static access key you previously created.
- Secret Key: Secret key you previously created along with the static access key.
- Endpoint:
storage.yandexcloud.net
.
With no data source created, you need to provide the source connection parameters in the SQL query for creating an external table.
- Name:
-
Create a test file named
test.csv
on the local machine:1,111 2,222
-
Upload your test file to the bucket.
Files you are uploading to the bucket must not start with
.
and_
. Such files are considered hidden, and PXF does not read data from them. -
To read data from the Object Storage bucket:
-
Create an external table named
pxf_s3_read
to reference the bucket. The SQL query depends on whether you previously created an external data source or not:-
With a data source created:
CREATE READABLE EXTERNAL TABLE pxf_s3_read(a int, b int) LOCATION ('pxf://<bucket_name>/test.csv?PROFILE=s3:text&SERVER=objserver') FORMAT 'CSV';
-
With no data source created:
CREATE READABLE EXTERNAL TABLE pxf_s3_read(a int, b int) LOCATION ('pxf://<bucket_name>/test.csv?PROFILE=s3:text&accesskey=<key_ID>&secretkey=<secret_key>&endpoint=storage.yandexcloud.net') FORMAT 'CSV';
-
-
Query the external table for data:
SELECT * FROM pxf_s3_read;
Result:
test_pxf=> SELECT * FROM pxf_s3_read; a | b ---+---- 1 | 111 2 | 222 (2 rows)
-
-
To write data to a Object Storage bucket:
-
Create an external table named
pxf_s3_write
with theWRITABLE
option. You do not need to specify a file name when creating the table:-
With a data source created:
CREATE WRITABLE EXTERNAL TABLE pxf_s3_write(a int, b int) LOCATION ('pxf://<bucket_name>/?PROFILE=s3:text&SERVER=objserver') FORMAT 'CSV';
-
With no data source created:
CREATE WRITABLE EXTERNAL TABLE pxf_s3_write(a int, b int) LOCATION ('pxf://<bucket_name>/?PROFILE=s3:text&accesskey=<key_ID>&secretkey=<secret_key>&endpoint=storage.yandexcloud.net') FORMAT 'CSV';
-
-
Add data to the table:
INSERT INTO pxf_s3_write VALUES (3,333);
INSERT 0 1
-
Make sure that a new object has been created in the bucket.
-
Note
To create an external table from Object Storage, you can use the S3 protocol and transmit your static key parameters using a file located on a web server. For more information, see this tutorial.
Greenplum® and Greenplum Database® are registered trademarks or trademarks of VMware, Inc. in the United States and/or other countries.
ClickHouse® is a registered trademark of ClickHouse, Inc