Creating an external S3 data source
In Managed Service for Greenplum®, you can use Yandex Object Storage or a third-party S3 service as an external data source with the S3 connection type.
To get started, create a static access key. You will need to specify its ID and secret key in the data source parameters.
Create an external data source
To create an external S3 data source in the management console:
- Go to the folder page and select Managed Service for Greenplum.
- Open the page of the Managed Service for Greenplum® cluster you need.
- In the left-hand panel, select PXF.
- Click Create data source.
- Select the S3 connection type.
- Enter a source name.
- Configure at least one optional setting.
- Click Create.
If you do not have the Yandex Cloud CLI yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder through the --folder-name or --folder-id parameter.
To create an external S3 data source using the CLI:
- View the description of the CLI command to create a data source:
  yc managed-greenplum pxf-datasource create s3 --help
- Configure the data source:
  yc managed-greenplum pxf-datasource create s3 <external_data_source_name> \
    --cluster-id=<cluster_ID> \
    --access-key=<static_key_ID> \
    --secret-key=<secret_part_of_static_key> \
    --endpoint=<S3_storage_address> \
    --fast-upload=<fast_upload>
Where:
- cluster-id: Cluster ID. You can get it with the list of clusters in the folder.
- access-key, secret-key: ID and contents of the static access key.
- endpoint: S3 storage address. For Object Storage, it is storage.yandexcloud.net. This is the default value.
- fast-upload: Fast upload of large files to S3 storage. The possible values are:
  - true (default): PXF generates files on the disk before sending them to S3 storage.
  - false: PXF generates files in RAM (if RAM capacity is reached, it writes them to disk).
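The CLI step above can be sketched as a small shell helper that assembles the command and checks the fast-upload value before anything is sent. This is a minimal sketch, not part of the yc CLI: the function name build_s3_datasource_cmd and the S3_ACCESS_KEY and S3_SECRET_KEY variable names are assumptions made for illustration, and only the flags come from the command above. The helper prints the command instead of running it and leaves the key variables unexpanded, so the secret never appears in the output.

```shell
# Hypothetical helper (not part of the yc CLI): prints the create command
# for review instead of running it.
# Arguments: <source_name> <cluster_ID> [endpoint] [fast_upload]
build_s3_datasource_cmd() {
  local name="$1"
  local cluster_id="$2"
  local endpoint="${3:-storage.yandexcloud.net}"  # default endpoint named in the text
  local fast_upload="${4:-true}"                  # default value named in the text

  # The text lists only true and false as possible fast-upload values.
  case "$fast_upload" in
    true|false) ;;
    *) echo "fast-upload must be true or false" >&2; return 1 ;;
  esac

  printf 'yc managed-greenplum pxf-datasource create s3 %s' "$name"
  printf ' --cluster-id=%s' "$cluster_id"
  # Assumed variable names; printed literally so that the shell expands
  # them only when the command is actually run.
  printf '%s' ' --access-key="$S3_ACCESS_KEY" --secret-key="$S3_SECRET_KEY"'
  printf ' --endpoint=%s' "$endpoint"
  printf ' --fast-upload=%s\n' "$fast_upload"
}

build_s3_datasource_cmd my-s3-source my-cluster-id
```

After reviewing the printed command, export both key variables in the same shell session and paste the command into it.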
- Get an IAM token for API authentication and put it into the environment variable:
  export IAM_TOKEN="<IAM_token>"
- Use the PXFDatasource.Create method and make a request, e.g., via cURL:
  curl \
    --request POST \
    --header "Authorization: Bearer $IAM_TOKEN" \
    --header "Content-Type: application/json" \
    --url 'https://mdb.api.cloud.yandex.net/managed-greenplum/v1/clusters/<cluster_ID>/pxf_datasources' \
    --data '{
              "datasource": {
                "name": "<external_data_source_name>",
                "s3": {
                  "accessKey": "<static_key_ID>",
                  "secretKey": "<secret_part_of_static_key>",
                  "fastUpload": "<fast_upload>",
                  "endpoint": "<S3_storage_address>"
                }
              }
            }'
Where:
- name: External data source name.
- s3: External data source settings:
  - accessKey, secretKey: ID and contents of the static access key.
  - fastUpload: Fast upload of large files to S3 storage. The possible values are:
    - true (default): PXF generates files on the disk before sending them to S3 storage.
    - false: PXF generates files in RAM (if RAM capacity is reached, it writes them to disk).
  - endpoint: S3 storage address. For Object Storage, it is storage.yandexcloud.net. This is the default value.
You can get the cluster ID with the list of clusters in the folder.
- View the server response to make sure the request was successful.
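One way to check the response, sketched here in shell, is to capture the HTTP status code alongside the body. The helper name is_http_success, the response.json file name, and the CLUSTER_ID and REQUEST_BODY variables are hypothetical; --silent, --output, and --write-out are standard cURL options. The sketch assumes that a 2xx status means the request was accepted. It does not interpret the response body, which you should still read for details.

```shell
# Hypothetical helper: succeeds only for three-digit 2xx HTTP status codes.
is_http_success() {
  case "$1" in
    2[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage sketch (needs network access, a valid IAM token, and the request
# body from the cURL example above stored in REQUEST_BODY):
#
#   status="$(curl --silent --output response.json --write-out '%{http_code}' \
#     --request POST \
#     --header "Authorization: Bearer $IAM_TOKEN" \
#     --header "Content-Type: application/json" \
#     --url "https://mdb.api.cloud.yandex.net/managed-greenplum/v1/clusters/$CLUSTER_ID/pxf_datasources" \
#     --data "$REQUEST_BODY")"
#   if is_http_success "$status"; then
#     echo "Request accepted (HTTP $status)"
#   else
#     echo "Request failed (HTTP $status)" >&2
#     cat response.json >&2
#   fi
```

Keeping the body in a file and the status in a variable means a failed request still leaves the server's error message available for inspection.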
- Get an IAM token for API authentication and put it into the environment variable:
  export IAM_TOKEN="<IAM_token>"
- Clone the cloudapi repository:
  cd ~/ && git clone --depth=1 https://github.com/yandex-cloud/cloudapi
  Below, we assume the repository contents are stored in the ~/cloudapi/ directory.
- Use the PXFDatasourceService.Create call and make a request, e.g., via gRPCurl:
  grpcurl \
    -format json \
    -import-path ~/cloudapi/ \
    -import-path ~/cloudapi/third_party/googleapis/ \
    -proto ~/cloudapi/yandex/cloud/mdb/greenplum/v1/pxf_service.proto \
    -rpc-header "Authorization: Bearer $IAM_TOKEN" \
    -d '{
          "cluster_id": "<cluster_ID>",
          "datasource": {
            "name": "<external_data_source_name>",
            "s3": {
              "access_key": "<static_key_ID>",
              "secret_key": "<secret_part_of_static_key>",
              "fast_upload": <fast_upload>,
              "endpoint": "<S3_storage_address>"
            }
          }
        }' \
    mdb.api.cloud.yandex.net:443 \
    yandex.cloud.mdb.greenplum.v1.PXFDatasourceService.Create
Where:
- name: External data source name.
- s3: External data source settings:
  - access_key, secret_key: ID and contents of the static access key.
  - fast_upload: Fast upload of large files to S3 storage. The possible values are:
    - true (default): PXF generates files on the disk before sending them to S3 storage.
    - false: PXF generates files in RAM (if RAM capacity is reached, it writes them to disk).
  - endpoint: S3 storage address. For Object Storage, it is storage.yandexcloud.net. This is the default value.
You can get the cluster ID with the list of clusters in the folder.
- View the server response to make sure the request was successful.
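The request body for the gRPC call is easy to get wrong when edited by hand, for example by dropping a comma between fields. As a minimal sketch, a shell function can generate it from arguments. The function name build_grpc_payload and the S3_ACCESS_KEY and S3_SECRET_KEY variable names are assumptions made for illustration; the field names are those used in the request above. The sketch does no JSON escaping, so it assumes the values contain no double quotes or backslashes.

```shell
# Hypothetical helper: prints the JSON body for PXFDatasourceService.Create.
# Arguments: <cluster_ID> <source_name> [fast_upload] [endpoint]
# Reads the static key from S3_ACCESS_KEY and S3_SECRET_KEY (assumed names).
build_grpc_payload() {
  local cluster_id="$1"
  local name="$2"
  local fast_upload="${3:-true}"
  local endpoint="${4:-storage.yandexcloud.net}"

  # fast_upload is written without quotes, so only true and false are allowed.
  case "$fast_upload" in
    true|false) ;;
    *) echo "fast_upload must be true or false" >&2; return 1 ;;
  esac

  # No JSON escaping is done: values must not contain quotes or backslashes.
  printf '{"cluster_id":"%s","datasource":{"name":"%s","s3":{"access_key":"%s","secret_key":"%s","fast_upload":%s,"endpoint":"%s"}}}\n' \
    "$cluster_id" "$name" \
    "${S3_ACCESS_KEY:?set S3_ACCESS_KEY}" "${S3_SECRET_KEY:?set S3_SECRET_KEY}" \
    "$fast_upload" "$endpoint"
}
```

The output can be passed to gRPCurl in place of the inline body, for example with -d "$(build_grpc_payload <cluster_ID> <external_data_source_name>)".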
Greenplum® and Greenplum Database® are registered trademarks or trademarks of VMware, Inc. in the United States and/or other countries.