Creating an external S3 data source
In Managed Service for Greenplum®, you can use Yandex Object Storage or a third-party S3 service as an external data source with the S3 connection type.
To get started, create a static access key. You will need to specify its ID and secret key in the source parameters.
Create an external data source
To create an external S3 data source in the management console:
- Go to the folder page and select Managed Service for Greenplum.
- Open the page of the Managed Service for Greenplum® cluster you need.
- In the left-hand panel, select PXF.
- Click Create data source.
- Select the S3 connection type.
- Enter a source name.
- Configure at least one optional setting.
- Click Create.
If you do not have the Yandex Cloud command line interface yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name or --folder-id parameter.
To create an external S3 data source using the CLI:
- View the description of the CLI command to create a data source:
  yc managed-greenplum pxf-datasource create s3 --help
- Configure the data source:
  yc managed-greenplum pxf-datasource create s3 <external_data_source_name> \
    --cluster-id=<cluster_ID> \
    --access-key=<static_key_ID> \
    --secret-key=<secret_part_of_static_key> \
    --endpoint=<S3_storage_address> \
    --fast-upload=<fast_upload>
Where:
- cluster-id: Cluster ID. You can get it from the list of clusters in the folder.
- access-key, secret-key: ID and secret part of the static access key.
- endpoint: S3 storage address. For Object Storage, this is storage.yandexcloud.net, which is the default value.
- fast-upload: Fast upload of large files to S3 storage. The possible values are:
  - true (default): PXF generates files on the disk before sending them to S3 storage.
  - false: PXF generates files in RAM (if RAM capacity is reached, it writes them to disk).
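For repeated use, the command above can be wrapped in a small shell function. This is a minimal sketch, not part of the official tooling: the function name create_s3_datasource, the environment variable names (CLUSTER_ID, S3_ACCESS_KEY, S3_SECRET_KEY, S3_ENDPOINT, S3_FAST_UPLOAD), and the YC_BIN override are assumptions introduced for illustration. Only the yc subcommand and its flags come from the description above.

```shell
# Hypothetical wrapper around the data source creation command.
# Required variables: CLUSTER_ID, S3_ACCESS_KEY, S3_SECRET_KEY.
# Optional variables: S3_ENDPOINT, S3_FAST_UPLOAD (fall back to the documented defaults).
# YC_BIN is an illustrative override: set YC_BIN=echo to print the arguments instead of running yc.
create_s3_datasource() {
  ds_name="${1:-}"

  # Stop early if the source name or any required value is missing.
  if [ -z "$ds_name" ] || [ -z "${CLUSTER_ID:-}" ] ||
     [ -z "${S3_ACCESS_KEY:-}" ] || [ -z "${S3_SECRET_KEY:-}" ]; then
    echo "error: source name, CLUSTER_ID, S3_ACCESS_KEY and S3_SECRET_KEY are required" >&2
    return 1
  fi

  "${YC_BIN:-yc}" managed-greenplum pxf-datasource create s3 "$ds_name" \
    --cluster-id="$CLUSTER_ID" \
    --access-key="$S3_ACCESS_KEY" \
    --secret-key="$S3_SECRET_KEY" \
    --endpoint="${S3_ENDPOINT:-storage.yandexcloud.net}" \
    --fast-upload="${S3_FAST_UPLOAD:-true}"
}

# Example call (commented out so that sourcing this file has no side effects):
#   CLUSTER_ID=<cluster_ID> S3_ACCESS_KEY=<static_key_ID> S3_SECRET_KEY=<secret_part_of_static_key> \
#     create_s3_datasource <external_data_source_name>
```

Reading the values from variables lets the same function serve several clusters, and the YC_BIN=echo dry run shows the exact arguments before anything is created.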
To add an S3 data source to a Managed Service for Greenplum® cluster, use the create REST API method for the PXFDatasource resource or the PXFDatasourceService/Create gRPC API call and provide the following in the request:
- Cluster ID in the clusterId parameter. To find out the cluster ID, get a list of clusters in the folder.
- Source name in the name parameter.
- External source settings in the s3 parameter.
Sample REST API request
The example below shows how to create an external data source for an Object Storage bucket using the Managed Service for Greenplum® REST API. To create a source:
- Get an IAM token. It is used for authentication in the API.
- Add the IAM token to the following environment variable:
  export IAM_TOKEN=<token>
- Create a static access key.
- Send a request using cURL:
  curl --location "https://mdb.api.cloud.yandex.net/managed-greenplum/v1/clusters/<cluster_ID>/pxf_datasources" \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer ${IAM_TOKEN}" \
    --data "{
      \"datasource\": {
        \"name\": \"s3:csv\",
        \"s3\": {
          \"accessKey\": \"<key_ID>\",
          \"secretKey\": \"<secret_key>\",
          \"endpoint\": \"storage.yandexcloud.net\"
        }
      }
    }"
In the request body, specify the following parameters:
- name: Source name, e.g., s3:csv.
- accessKey: Static access key ID.
- secretKey: Secret part of the static access key.
- endpoint: Object Storage address.
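Escaping every quote inside --data is easy to get wrong. One possible alternative, sketched below, builds the same request body with a here-document and passes it to cURL on standard input. The function names pxf_s3_request_body and create_pxf_s3_source and the CURL_BIN override are hypothetical, and the sketch assumes the values contain no double quotes or backslashes, since it performs no JSON escaping. It also sets Content-Type to application/json on the assumption that this is appropriate for a JSON body.

```shell
# Print the JSON request body for an S3 data source.
# Arguments: source name, static key ID, secret key, optional endpoint.
# Assumption: no argument contains double quotes or backslashes (no JSON escaping is done).
pxf_s3_request_body() {
  cat <<EOF
{
  "datasource": {
    "name": "$1",
    "s3": {
      "accessKey": "$2",
      "secretKey": "$3",
      "endpoint": "${4:-storage.yandexcloud.net}"
    }
  }
}
EOF
}

# Hypothetical helper: send the body to the endpoint used in the sample request.
# Arguments: cluster ID, then the same arguments as pxf_s3_request_body.
# Expects IAM_TOKEN to be exported as described in the steps above.
create_pxf_s3_source() {
  cluster_id="$1"
  shift
  pxf_s3_request_body "$@" | "${CURL_BIN:-curl}" --fail --silent --show-error \
    --location "https://mdb.api.cloud.yandex.net/managed-greenplum/v1/clusters/${cluster_id}/pxf_datasources" \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer ${IAM_TOKEN}" \
    --data-binary @-
}

# Example call (commented out so that sourcing this file sends nothing):
#   create_pxf_s3_source <cluster_ID> "s3:csv" <key_ID> <secret_key>
```

Keeping the body in a separate function makes it possible to print and inspect the JSON before sending it.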
Greenplum® and Greenplum Database® are registered trademarks or trademarks of VMware, Inc. in the United States and/or other countries.