Creating an external S3 data source
In Yandex MPP Analytics for PostgreSQL, you can use Yandex Object Storage or a third-party S3 service as an external data source with the S3 connection type.
To get started, create a static access key. You will need to specify its ID and secret key in the data source settings.
Create an external data source
To create an external S3 data source:
- Open the folder dashboard.
- Navigate to Yandex MPP Analytics for PostgreSQL.
- Open the page of the Greenplum® cluster in question.
- In the left-hand panel, select PXF.
- Click Create data source.
- Select the S3 connection type.
- Enter a source name.
- Configure at least one optional setting:
  - Specify the static access key ID in the Access Key field and its secret key in the Secret Key field.
  - Select Fast Upload to enable fast upload of large files to S3 storage. This option is enabled by default. With fast upload enabled, PXF generates files in RAM and writes them to disk only if it runs out of RAM. With fast upload disabled, PXF generates files on disk.
  - In the Endpoint field, enter the S3 storage address. The default value is storage.yandexcloud.net, the Object Storage address.
- Click Create.
If you do not have the Yandex Cloud CLI yet, install and initialize it.
The folder used by default is the one specified when creating the CLI profile. To change the default folder, use the yc config set folder-id <folder_ID> command. You can also specify a different folder for any command using --folder-name or --folder-id. If you access a resource by its name, the search will be limited to the default folder. If you access a resource by its ID, the search will be global, i.e., through all folders based on access permissions.
To create an external S3 data source:
- View the description of the CLI command for creating a data source:

    yc managed-greenplum pxf-datasource create s3 --help

- Configure the data source:

    yc managed-greenplum pxf-datasource create s3 <external_data_source_name> \
      --cluster-id=<cluster_ID> \
      --access-key=<static_key_ID> \
      --secret-key=<secret_part_of_static_key> \
      --endpoint=<S3_storage_address> \
      --fast-upload=<fast_upload>

  Where:
  - cluster-id: Cluster ID. You can get it with the list of clusters in the folder.
  - access-key, secret-key: Static access key ID and secret key.
  - endpoint: S3 storage address. The value for Object Storage is storage.yandexcloud.net. This is the default value.
  - fast-upload: Fast upload of large files to S3 storage. The possible values are:
    - true: Default value. PXF generates files in RAM (if out of RAM, it writes them to disk).
    - false: PXF generates files on disk.
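The CLI call above can be wrapped in a small shell function that validates the fast-upload value before anything is sent. This is only a sketch under stated assumptions: the function name `pxf_s3_create_cmd`, its argument order, and the `ACCESS_KEY` and `SECRET_KEY` environment variables are hypothetical and not part of the yc CLI; only the flags come from the command above. The function prints the command instead of running it, so it can be reviewed first, and it leaves the key values as variable references so the secret does not appear in the output.

```shell
#!/bin/sh
# Hypothetical helper (not part of the yc CLI): prints the command from the
# step above with placeholders filled in, after validating fast-upload.
# Args: <source_name> <cluster_id> <endpoint> <fast_upload>
# Values are inserted verbatim, so they must not contain spaces or quotes.
pxf_s3_create_cmd() {
  if [ "$#" -ne 4 ]; then
    echo "usage: pxf_s3_create_cmd NAME CLUSTER_ID ENDPOINT FAST_UPLOAD" >&2
    return 2
  fi
  case "$4" in
    true|false) ;;
    *) echo "fast-upload must be true or false, got: $4" >&2; return 1 ;;
  esac
  printf 'yc managed-greenplum pxf-datasource create s3 %s' "$1"
  printf ' --cluster-id=%s' "$2"
  # Key values stay as literal variable references (assumed to be exported
  # in the shell that later runs the printed command).
  printf ' --access-key="$ACCESS_KEY" --secret-key="$SECRET_KEY"'
  printf ' --endpoint=%s --fast-upload=%s\n' "$3" "$4"
}

# Dry run with made-up values; review the printed command before running it.
pxf_s3_create_cmd my-s3-source my-cluster-id storage.yandexcloud.net true
```

Printing rather than executing keeps the sketch safe to try without access to a cluster. To run the result, export the two key variables and paste the printed line into the same shell.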
To create an external S3 data source using the REST API:
- Get an IAM token for API authentication and put it into an environment variable:

    export IAM_TOKEN="<IAM_token>"

- Call the PXFDatasource.Create method, e.g., via the following cURL request:

    curl \
      --request POST \
      --header "Authorization: Bearer $IAM_TOKEN" \
      --header "Content-Type: application/json" \
      --url 'https://mdb.api.cloud.yandex.net/managed-greenplum/v1/clusters/<cluster_ID>/pxf_datasources' \
      --data '{
        "datasource": {
          "name": "<external_data_source_name>",
          "s3": {
            "accessKey": "<static_key_ID>",
            "secretKey": "<secret_part_of_static_key>",
            "fastUpload": "<fast_upload>",
            "endpoint": "<S3_storage_address>"
          }
        }
      }'

  Where:
  - name: External data source name.
  - s3: External data source settings:
    - accessKey, secretKey: Static access key ID and secret key.
    - fastUpload: Fast upload of large files to S3 storage. The possible values are:
      - true: Default value. PXF generates files in RAM (if out of RAM, it writes them to disk).
      - false: PXF generates files on disk.
    - endpoint: S3 storage address. The value for Object Storage is storage.yandexcloud.net. This is the default value.

  You can get the cluster ID with the list of clusters in the folder.

- View the server response to make sure your request was successful.
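One way to check the server response from a script is to capture the HTTP status code alongside the response body. The sketch below assumes that a 2xx status means the request was accepted. The helper names `is_success_status` and `create_pxf_s3_source`, the `body.json` and `response.json` file names, and the idea of keeping the JSON body in a file are all hypothetical; the URL and headers are the ones from the cURL request above, and `--write-out '%{http_code}'` is a standard curl option.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only for three-digit 2xx HTTP status codes.
is_success_status() {
  case "$1" in
    2[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper around the cURL request above.
# Args: <cluster_id> <file_with_json_body>
# Saves the response body to response.json and inspects the status code.
create_pxf_s3_source() {
  cluster_id=$1
  body_file=$2
  status=$(curl --silent --request POST \
    --header "Authorization: Bearer $IAM_TOKEN" \
    --header "Content-Type: application/json" \
    --url "https://mdb.api.cloud.yandex.net/managed-greenplum/v1/clusters/${cluster_id}/pxf_datasources" \
    --data "@${body_file}" \
    --output response.json \
    --write-out '%{http_code}')
  if is_success_status "$status"; then
    echo "Request accepted (HTTP $status); see response.json"
  else
    echo "Request failed (HTTP $status); see response.json" >&2
    return 1
  fi
}

# Usage (not run here, requires a valid IAM_TOKEN and cluster):
#   create_pxf_s3_source "<cluster_ID>" body.json
```

If curl cannot reach the server at all, it reports the status as 000, which the helper treats as a failure.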
To create an external S3 data source using the gRPC API:
- Get an IAM token for API authentication and put it into an environment variable:

    export IAM_TOKEN="<IAM_token>"

- Clone the cloudapi repository:

    cd ~/ && git clone --depth=1 https://github.com/yandex-cloud/cloudapi

  Below, we assume that the repository contents reside in the ~/cloudapi/ directory.

- Call the PXFDatasourceService.Create method, e.g., via the following gRPCurl request:

    grpcurl \
      -format json \
      -import-path ~/cloudapi/ \
      -import-path ~/cloudapi/third_party/googleapis/ \
      -proto ~/cloudapi/yandex/cloud/mdb/greenplum/v1/pxf_service.proto \
      -rpc-header "Authorization: Bearer $IAM_TOKEN" \
      -d '{
        "cluster_id": "<cluster_ID>",
        "datasource": {
          "name": "<external_data_source_name>",
          "s3": {
            "access_key": "<static_key_ID>",
            "secret_key": "<secret_part_of_static_key>",
            "fast_upload": <fast_upload>,
            "endpoint": "<S3_storage_address>"
          }
        }
      }' \
      mdb.api.cloud.yandex.net:443 \
      yandex.cloud.mdb.greenplum.v1.PXFDatasourceService.Create

  Where:
  - name: External data source name.
  - s3: External data source settings:
    - access_key, secret_key: Static access key ID and secret key.
    - fast_upload: Fast upload of large files to S3 storage. The possible values are:
      - true: Default value. PXF generates files in RAM (if out of RAM, it writes them to disk).
      - false: PXF generates files on disk.
    - endpoint: S3 storage address. The value for Object Storage is storage.yandexcloud.net. This is the default value.

  You can get the cluster ID with the list of clusters in the folder.

- Check the server response to make sure your request was successful.
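Hand-editing the JSON body is an easy place to lose a comma or to quote the boolean by accident, so it may help to generate the body instead. The following is a minimal sketch: the function name `pxf_s3_grpc_body` and its argument order are hypothetical, and the field names are the ones from the gRPCurl request above. It rejects any fast_upload value other than true or false and writes that value unquoted.

```shell
#!/bin/sh
# Hypothetical helper: prints the request body for PXFDatasourceService.Create.
# Args: <cluster_id> <source_name> <key_id> <secret> <endpoint> <fast_upload>
# Values are inserted verbatim, so they must not contain quotes or backslashes.
pxf_s3_grpc_body() {
  case "$6" in
    true|false) ;;
    *) echo "fast_upload must be true or false, got: $6" >&2; return 1 ;;
  esac
  printf '{\n'
  printf '  "cluster_id": "%s",\n' "$1"
  printf '  "datasource": {\n'
  printf '    "name": "%s",\n' "$2"
  printf '    "s3": {\n'
  printf '      "access_key": "%s",\n' "$3"
  printf '      "secret_key": "%s",\n' "$4"
  printf '      "fast_upload": %s,\n' "$6"
  printf '      "endpoint": "%s"\n' "$5"
  printf '    }\n  }\n}\n'
}

# Example with made-up values; prints the body to stdout.
pxf_s3_grpc_body my-cluster-id my-s3-source my-key-id my-secret \
  storage.yandexcloud.net true
```

Because grpcurl reads the request from stdin when called with `-d @`, the output of this function can be piped into the grpcurl command shown above in place of the inline body.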
After you create an external data source, create an external table.
Greenplum® and Greenplum Database® are registered trademarks or trademarks of Broadcom Inc. in the United States and/or other countries.