Creating an external HDFS data source
In Managed Service for Greenplum®, as an external data source with the HDFS connection type, you can use HDFS as part of Yandex Data Processing or other third-party HDFS services.
Create an external data source
- Go to the folder page
and select Managed Service for Greenplum. - Open the page of the Managed Service for Greenplum® cluster you need.
- In the left-hand panel, select
PXF. - Click Create data source.
- Select the
HDFS
connection type. - Enter a source name.
- Configure at least one optional setting.
- Click Create.
-
Get an IAM token for API authentication and put it into the environment variable:
export IAM_TOKEN="<IAM_token>"
-
Use the PXFDatasource.Create method and make a request, e.g., via cURL
:curl \ --request POST \ --header "Authorization: Bearer $IAM_TOKEN" \ --header "Content-Type: application/json" \ --url 'https://mdb.api.cloud.yandex.net/managed-greenplum/v1/clusters/<cluster_ID>/pxf_datasources' \ --data '{ "datasource": { "name": "<external_data_source_name>", "hdfs": { "core": { "defaultFs": "<storage_type>" }, ... } } }'
Where:
name
: External data source name.hdfs
: External data source settings. Configure at least one optional setting.
You can get the cluster ID with a list of clusters in the folder.
-
View the server response to make sure the request was successful.
-
Get an IAM token for API authentication and put it into the environment variable:
export IAM_TOKEN="<IAM_token>"
-
Clone the cloudapi
repository:cd ~/ && git clone --depth=1 https://github.com/yandex-cloud/cloudapi
Below, we assume the repository contents are stored in the
~/cloudapi/
directory. -
Use the PXFDatasourceService.Create call and make a request, e.g., via gRPCurl
:grpcurl \ -format json \ -import-path ~/cloudapi/ \ -import-path ~/cloudapi/third_party/googleapis/ \ -proto ~/cloudapi/yandex/cloud/mdb/greenplum/v1/pxf_service.proto \ -rpc-header "Authorization: Bearer $IAM_TOKEN" \ -d '{ "cluster_id": "<cluster_ID>" "datasource": { "name": "<external_data_source_name>", "hdfs": { "core": { "default_fs": "<storage_type>" }, ... } } }' \ mdb.api.cloud.yandex.net:443 \ yandex.cloud.mdb.greenplum.v1.PXFDatasourceService.Create
Where:
name
: External data source name.hdfs
: External data source settings. Configure at least one optional setting.
You can get the cluster ID with a list of clusters in the folder.
-
View the server response to make sure the request was successful.
Greenplum® and Greenplum Database® are registered trademarks or trademarks of VMware, Inc. in the United States and/or other countries.