Creating an external table using PXF
Getting started
-
In the Managed Service for Greenplum® cluster subnet, set up a NAT gateway and link a routing table.
-
In the same subnet, create a security group allowing all incoming and outgoing traffic from all addresses.
-
Create an external data source. The steps needed to create a source depend on the source connection type:
Creating an external table using an SQL query
The SQL query syntax to create an external table is as follows:
CREATE [WRITABLE] EXTERNAL TABLE <table_name>
(<column_name> <data_type> [, ...])
LOCATION('pxf://<data_path_or_table_name>?PROFILE=<profile_name>&SERVER=<source_name>')
FORMAT '[TEXT|CSV|CUSTOM]';
Where:
-
<table_name>
: Name of the external table to be created in the Greenplum® cluster. -
<column_name>
: Column name. -
<data_type>
: Column data type. It must match the column data type in the external DBMS table. -
<data path or table name>
: External object name, see examples of external tables. -
PROFILE
: Standard interface to an external DBMS (profile), e.g.,JDBC
. The list of possible values depends on the connection type: -
SERVER
: Name of the external PXF data source.Instead of
SERVER
, you can provide the parameters defining the external data source configuration. They depend on the source connection type. For more information, see the Greenplum® PXF documentation and examples for creating external tables.
The WRITABLE
option enables you to write data to an external object. To be able to read data from an external object, create a table with the READABLE
option.
Examples for creating external tables
-
Create a Yandex Managed Service for ClickHouse® cluster with the
chuser
username. -
In the cluster subnet, set up a NAT gateway and create a security group allowing all incoming and outgoing traffic from all addresses.
-
(Optional) Create an external JDBC data source with the following parameters:
- Name:
chserver
- Driver:
com.clickhouse.jdbc.ClickHouseDriver
- Url:
jdbc:clickhouse:http://c-<cluster_ID>.rw.mdb.yandexcloud.net:8123/db1
- User:
chuser
You can get the cluster ID with a list of clusters in the folder.
With no data source created, you need to provide the source connection parameters in the SQL query for creating an external table.
- Name:
-
Connect to the ClickHouse® database using
clickhouse-client
. -
Create a test table and populate it with data:
CREATE TABLE test (id int) ENGINE = Memory;
INSERT INTO test VALUES (1);
-
Create an external table named
pxf_ch
to reference a table namedtest
in the ClickHouse® cluster. The SQL query depends on whether you previously created an external data source or not:-
With a data source created:
CREATE READABLE EXTERNAL TABLE pxf_ch(id int) LOCATION ('pxf://test?PROFILE=JDBC&SERVER=chserver') FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
-
With no data source created:
CREATE READABLE EXTERNAL TABLE pxf_ch(id int) LOCATION ('pxf://test?PROFILE=JDBC&JDBC_DRIVER=com.clickhouse.jdbc.ClickHouseDriver&DB_URL=jdbc:clickhouse:http://c-<cluster_ID>.rw.mdb.yandexcloud.net:8123/db1&USER=chuser') FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
If public access is enabled for ClickHouse® hosts, use an encrypted connection when creating an external table. To do this, specify SSL parameters in your query:
-
With a data source created:
CREATE READABLE EXTERNAL TABLE pxf_ch(id int) LOCATION ('pxf://test?PROFILE=JDBC&SERVER=chserver&ssl=true&sslmode=strict&sslrootcert=/etc/greenplum/ssl/allCAs.pem') FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
-
With no data source created:
CREATE READABLE EXTERNAL TABLE pxf_ch(id int) LOCATION ('pxf://test?PROFILE=JDBC&JDBC_DRIVER=com.clickhouse.jdbc.ClickHouseDriver&DB_URL=jdbc:clickhouse:https://c-<cluster_ID>.rw.mdb.yandexcloud.net:8443/db1&USER=chuser&ssl=true&sslmode=strict&sslrootcert=/etc/greenplum/ssl/allCAs.pem') FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
-
-
Query the external table for data:
SELECT * FROM pxf_ch;
Result:
test_pxf=> SELECT * FROM pxf_ch; id ---- 1 (1 row)
-
Create a Yandex Managed Service for MySQL® cluster with the following settings:
- User name:
mysqluser
. - In the host settings, select the Public access option.
- User name:
-
In the cluster subnet, set up a NAT gateway and create a security group allowing all incoming and outgoing traffic from all addresses.
-
(Optional) Create an external JDBC data source with the following parameters:
- Name:
mysqlserver
- Driver:
com.mysql.jdbc.Driver
- Url:
jdbc:mysql://c-<cluster_ID>.rw.mdb.yandexcloud.net:3306/db1
- User:
mysqluser
You can get the cluster ID with a list of clusters in the folder.
With no data source created, you need to provide the source connection parameters in the SQL query for creating an external table.
- Name:
-
Connect to the MySQL® database using
mysql
. -
Create a test table and populate it with data:
CREATE TABLE test (a INT, b INT);
INSERT INTO test VALUES (1, '11'), (2, '22');
-
Create an external table named
pxf_mysql
to reference thetest
table in the MySQL® cluster. The SQL query depends on whether you previously created an external data source or not:-
With a data source created:
CREATE READABLE EXTERNAL TABLE pxf_mysql(a int, b int) LOCATION ('pxf://test?PROFILE=JDBC&SERVER=mysqlserver') FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
-
With no data source created:
CREATE READABLE EXTERNAL TABLE pxf_mysql(a int, b int) LOCATION ('pxf://test?PROFILE=JDBC&JDBC_DRIVER=com.mysql.jdbc.Driver&DB_URL=jdbc:mysql://c-<cluster_ID>.rw.mdb.yandexcloud.net:3306/db1&USER=mysqluser') FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
-
-
Query the external table for data:
SELECT * FROM pxf_mysql;
Result:
test_pxf=> SELECT * FROM pxf_mysql; a | b --+---- 1 | 11 2 | 22 (2 rows)
-
Create a Yandex Managed Service for PostgreSQL cluster with the following settings:
- User name:
pguser
. - In the host settings, select the Public access option.
- User name:
-
In the cluster subnet, set up a NAT gateway and create a security group allowing all incoming and outgoing traffic from all addresses.
-
(Optional) Create an external JDBC data source with the following parameters:
- Name:
pgserver
- Driver:
org.postgresql.Driver
- Url:
jdbc:postgresql://c-<cluster_ID>.rw.mdb.yandexcloud.net:6432/db1
- User:
pguser
You can get the cluster ID with a list of clusters in the folder.
With no data source created, you need to provide the source connection parameters in the SQL query for creating an external table.
- Name:
-
Connect to the PostgreSQL database using
psql
. -
Create a test table and populate it with data:
CREATE TABLE public.test ("a" INT,"b" INT);
INSERT INTO public.test VALUES (1, '11'), (2, '22');
-
Create an external table named
pxf_pg
to reference thepublic.test
table in the PostgreSQL cluster. The SQL query depends on whether you previously created an external data source or not:-
With a data source created:
CREATE READABLE EXTERNAL TABLE pxf_pg(a int, b int) LOCATION ('pxf://public.test?PROFILE=JDBC&SERVER=pgserver') FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
-
With no data source created:
CREATE READABLE EXTERNAL TABLE pxf_pg(a int, b int) LOCATION ('pxf://public.test?PROFILE=JDBC&JDBC_DRIVER=org.postgresql.Driver&DB_URL=jdbc:postgresql://c-<cluster_ID>.rw.mdb.yandexcloud.net:6432/db1&USER=pguser') FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
-
-
Query the external table for data:
SELECT * FROM pxf_pg;
Result:
test_pxf=> SELECT * FROM pxf_pg; a | b --+---- 1 | 11 2 | 22 (2 rows)
-
In the cluster subnet, set up a NAT gateway and create a security group allowing all incoming and outgoing traffic from all addresses.
-
Create an Object Storage bucket with restricted access.
-
(Optional) Create an external S3 data source with the following parameters:
- Name:
objserver
. - Access Key: ID of the static access key you previously created.
- Secret Key: Secret key you previously created along with the static access key.
- Endpoint:
storage.yandexcloud.net
.
With no data source created, you need to provide the source connection parameters in the SQL query for creating an external table.
- Name:
-
Create a test file named
test.csv
on the local machine:1,111 2,222
-
Upload your test file to the bucket.
Files you are uploading to the bucket may not start with
.
and_
. Such files are considered hidden, and PXF does not read data from them. -
To read data from the Object Storage bucket:
-
Create an external table named
pxf_s3_read
to reference the bucket. The SQL query depends on whether you previously created an external data source or not:-
With a data source created:
CREATE READABLE EXTERNAL TABLE pxf_s3_read(a int, b int) LOCATION ('pxf://<bucket_name>/test.csv?PROFILE=s3:text&SERVER=objserver') FORMAT 'CSV';
-
With no data source created:
CREATE READABLE EXTERNAL TABLE pxf_s3_read(a int, b int) LOCATION ('pxf://<bucket_name>/test.csv?PROFILE=s3:text&accesskey=<key_ID>&secretkey=<secret_key>&endpoint=storage.yandexcloud.net') FORMAT 'CSV';
-
-
Query the external table for data:
SELECT * FROM pxf_s3_read;
Result:
test_pxf=> SELECT * FROM pxf_s3_read; a | b ---+---- 1 | 111 2 | 222 (2 rows)
-
-
To write data to a Object Storage bucket:
-
Create an external table named
pxf_s3_write
with theWRITABLE
option. You do not need to specify a file name when creating the table:-
With a data source created:
CREATE WRITABLE EXTERNAL TABLE pxf_s3_write(a int, b int) LOCATION ('pxf://<bucket_name>/?PROFILE=s3:text&SERVER=objserver') FORMAT 'CSV';
-
With no data source created:
CREATE WRITABLE EXTERNAL TABLE pxf_s3_write(a int, b int) LOCATION ('pxf://<bucket_name>/?PROFILE=s3:text&accesskey=<key_ID>&secretkey=<secret_key>&endpoint=storage.yandexcloud.net') FORMAT 'CSV';
-
-
Add data to the table:
INSERT INTO pxf_s3_write VALUES (3,333);
INSERT 0 1
-
Make sure that a new object has been created in the bucket.
-
Note
To create an external table from Object Storage, you can use the S3 protocol and transmit your static key parameters using a file located on a web server. For more information, see this tutorial.
Greenplum® and Greenplum Database® are registered trademarks or trademarks of VMware, Inc. in the United States and/or other countries.
ClickHouse® is a registered trademark of ClickHouse, Inc