Creating an external HDFS data source
In Managed Service for Greenplum®, you can connect to HDFS as an external data source of the HDFS connection type. The HDFS can run in Yandex Data Processing or in any other third-party HDFS service.
Create an external data source
In the management console:
- Go to the folder page and select Managed Service for Greenplum.
- Open the page of the Managed Service for Greenplum® cluster you need.
- In the left-hand panel, select PXF.
- Click Create data source.
- Select the HDFS connection type.
- Enter a source name.
- Configure at least one optional setting.
- Click Create.
To add an HDFS data source to a Managed Service for Greenplum® cluster via the API, use the create REST API method for the PXFDatasource resource or the PXFDatasourceService/Create gRPC API call and provide the following in the request:
- Cluster ID in the clusterId parameter. To find out the cluster ID, get a list of clusters in the folder.
- Source name in the name parameter.
- External source settings in the hdfs parameter.
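The cluster ID comes from the list of clusters in the folder. The shell sketch below requests that list. It assumes that the REST list method is a GET on https://mdb.api.cloud.yandex.net/managed-greenplum/v1/clusters with a folderId query parameter, which follows the endpoint used in the sample request. Check this against the API reference. The function name list_greenplum_clusters is hypothetical, and IAM_TOKEN is expected to hold an IAM token.

```shell
#!/usr/bin/env bash
# Sketch: list Managed Service for Greenplum clusters in a folder to find
# the cluster ID. Assumption: the list method is a GET on the clusters
# collection with a folderId query parameter; verify in the API reference.
# list_greenplum_clusters is a hypothetical helper name.

list_greenplum_clusters() {
  local folder_id="${1:-}"   # folder ID, passed as the first argument

  if [ -z "$folder_id" ]; then
    echo "usage: list_greenplum_clusters <folder_ID>" >&2
    return 1
  fi
  # Fail early instead of sending an unauthenticated request.
  if [ -z "${IAM_TOKEN:-}" ]; then
    echo "error: IAM_TOKEN is not set" >&2
    return 1
  fi

  # The JSON response is expected to list the clusters with their IDs;
  # the ID of the cluster you need goes into the clusterId parameter.
  curl --fail --silent --show-error \
    --header "Authorization: Bearer ${IAM_TOKEN}" \
    "https://mdb.api.cloud.yandex.net/managed-greenplum/v1/clusters?folderId=${folder_id}"
}
```

Both checks run before curl, so a missing folder ID or token never produces a network call.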
Sample REST API request
The example below shows how to create an external HDFS data source using the Managed Service for Greenplum® REST API. To create a source:
- Get an IAM token. It is used for authentication in the API.
- Save the IAM token to an environment variable:

    export IAM_TOKEN=<token>

- Send a request using cURL:

    curl --location "https://mdb.api.cloud.yandex.net/managed-greenplum/v1/clusters/<cluster_ID>/pxf_datasources" \
        --header "Content-Type: text/plain" \
        --header "Authorization: Bearer ${IAM_TOKEN}" \
        --data "{ \"datasource\": { \"name\": \"hdfs:csv\", \"hdfs\": { \"core\": { \"defaultFs\": \"<storage_type:_DISK_or_ARCHIVE>\" } } } }"
In the request body, specify the following parameters:
- name: Source name, e.g., hdfs:csv.
- defaultFs: Default data storage type (optional). The possible values are:
  - DISK: Data storage on a physical disk.
  - ARCHIVE: Archival data storage. It lets you store more data in HDFS, but the data is processed more slowly.
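The sample request can also be wrapped in shell functions, so that the storage type is checked against DISK and ARCHIVE before anything is sent. This is a sketch and not part of the Managed Service for Greenplum® API. The names build_hdfs_body, create_hdfs_datasource, and CLUSTER_ID are hypothetical. The endpoint, headers, and JSON fields are taken from the sample request. To keep the example short, source names containing quotes or backslashes are rejected rather than JSON-escaped.

```shell
#!/usr/bin/env bash
# Sketch of the sample request as reusable functions.
# Hypothetical names: build_hdfs_body, create_hdfs_datasource, CLUSTER_ID.
# Endpoint, headers, and JSON fields are copied from the sample request.

# Print the request body for a source name and a default storage type.
build_hdfs_body() {
  local name="${1:-}"      # source name, e.g. hdfs:csv
  local storage="${2:-}"   # default storage type: DISK or ARCHIVE

  if [ -z "$name" ]; then
    echo "error: source name is required" >&2
    return 1
  fi
  # Reject characters that would need JSON escaping instead of escaping them.
  case "$name" in
    *'"'* | *'\'*)
      echo "error: name must not contain quotes or backslashes" >&2
      return 1
      ;;
  esac
  case "$storage" in
    DISK | ARCHIVE) ;;
    *)
      echo "error: storage type must be DISK or ARCHIVE, got '$storage'" >&2
      return 1
      ;;
  esac

  printf '{"datasource":{"name":"%s","hdfs":{"core":{"defaultFs":"%s"}}}}\n' \
    "$name" "$storage"
}

# Send the Create request; fails early if the token, cluster ID, or body is invalid.
create_hdfs_datasource() {
  local name="${1:-}" storage="${2:-}" body

  if [ -z "${IAM_TOKEN:-}" ] || [ -z "${CLUSTER_ID:-}" ]; then
    echo "error: set IAM_TOKEN and CLUSTER_ID first" >&2
    return 1
  fi
  body=$(build_hdfs_body "$name" "$storage") || return 1

  curl --fail --silent --show-error --location \
    "https://mdb.api.cloud.yandex.net/managed-greenplum/v1/clusters/${CLUSTER_ID}/pxf_datasources" \
    --header "Content-Type: text/plain" \
    --header "Authorization: Bearer ${IAM_TOKEN}" \
    --data "$body"
}
```

For example, after you export IAM_TOKEN and set CLUSTER_ID to your cluster ID, running create_hdfs_datasource "hdfs:csv" ARCHIVE sends the sample request with ARCHIVE as the storage type. If the yc CLI is installed and initialized, export IAM_TOKEN=$(yc iam create-token) is one way to obtain the token.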
Greenplum® and Greenplum Database® are registered trademarks or trademarks of VMware, Inc. in the United States and/or other countries.