© 2026 Direct Cursus Technology L.L.C.

Creating an external HDFS data source

Written by
Yandex Cloud
Updated at January 27, 2026

In Yandex MPP Analytics for PostgreSQL, you can use HDFS as an external data source with the HDFS connection type. The HDFS can belong to a Yandex Data Processing cluster or to a third-party HDFS service.

Create an external data source

You can create the data source in the management console, via the REST API, or via the gRPC API.

Management console
  1. Open the folder dashboard.
  2. Navigate to the Yandex MPP Analytics for PostgreSQL service.
  3. Open the page of the Greenplum® cluster in question.
  4. In the left-hand panel, select PXF.
  5. Click Create data source.
  6. Select the HDFS connection type.
  7. Enter a source name.
  8. Configure at least one optional setting.
  9. Click Create.

REST API

  1. Get an IAM token for API authentication and put it in an environment variable:

    export IAM_TOKEN="<IAM_token>"
    
  2. Use the PXFDatasource.Create method and send the following request, e.g., via cURL:

    curl \
        --request POST \
        --header "Authorization: Bearer $IAM_TOKEN" \
        --header "Content-Type: application/json" \
        --url 'https://mdb.api.cloud.yandex.net/managed-greenplum/v1/clusters/<cluster_ID>/pxf_datasources' \
        --data '{
                  "datasource": {
                    "name": "<external_data_source_name>",
                    "hdfs": {
                      "core": {
                        "defaultFs": "<storage_type>"
                      },
                      ...
                    }
                  }
                }'
    

    Where:

    • name: External data source name.
    • hdfs: External data source settings. Configure at least one optional setting.

    You can get the cluster ID from the list of clusters in the folder.

  3. View the server response to make sure your request was successful.
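The REST steps above can be wrapped in a few reusable shell functions. This is a minimal sketch, not an official tool, and it rests on several assumptions: the function names `build_hdfs_datasource_body`, `create_hdfs_datasource`, and `get_operation` are hypothetical; the body sets only `core.defaultFs` to satisfy the "at least one optional setting" requirement; argument values are assumed to need no JSON escaping; and the response is assumed to be a standard Yandex Cloud operation object that can be polled through the Operation service endpoint.

```shell
#!/usr/bin/env bash
# Minimal sketch: wrap the PXFDatasource.Create REST call in functions.
# All function names are hypothetical; arguments are assumed to contain no
# characters that need JSON escaping (quotes, backslashes, control chars).

# Print a request body that sets a single optional setting, core.defaultFs.
build_hdfs_datasource_body() {
  local name="$1" default_fs="$2"
  printf '{"datasource":{"name":"%s","hdfs":{"core":{"defaultFs":"%s"}}}}\n' \
    "$name" "$default_fs"
}

# Send the request and print the server response (an operation object).
# --fail-with-body (curl 7.76.0 or later) returns a non-zero exit code on
# HTTP errors while still printing the error body.
create_hdfs_datasource() {
  local cluster_id="$1" name="$2" default_fs="$3"
  curl \
    --silent --show-error --fail-with-body \
    --request POST \
    --header "Authorization: Bearer ${IAM_TOKEN:?IAM_TOKEN is not set}" \
    --header "Content-Type: application/json" \
    --url "https://mdb.api.cloud.yandex.net/managed-greenplum/v1/clusters/${cluster_id}/pxf_datasources" \
    --data "$(build_hdfs_datasource_body "$name" "$default_fs")"
}

# Fetch the state of an operation by its ID (the "id" field of the response).
# Assumes the standard Yandex Cloud Operation service endpoint.
get_operation() {
  curl \
    --silent --show-error --fail-with-body \
    --header "Authorization: Bearer ${IAM_TOKEN:?IAM_TOKEN is not set}" \
    --url "https://operation.api.cloud.yandex.net/operations/$1"
}
```

For example, after exporting `IAM_TOKEN`, run `create_hdfs_datasource "<cluster_ID>" "<external_data_source_name>" "<storage_type>"`, then pass the `id` from the response to `get_operation` and repeat until the operation reports `"done": true`.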

gRPC API

  1. Get an IAM token for API authentication and put it in an environment variable:

    export IAM_TOKEN="<IAM_token>"
    
  2. Clone the cloudapi repository:

    cd ~/ && git clone --depth=1 https://github.com/yandex-cloud/cloudapi
    

    Below, we assume that the repository contents reside in the ~/cloudapi/ directory.

  3. Use the PXFDatasourceService.Create call and send the following request, e.g., via gRPCurl:

    grpcurl \
        -format json \
        -import-path ~/cloudapi/ \
        -import-path ~/cloudapi/third_party/googleapis/ \
        -proto ~/cloudapi/yandex/cloud/mdb/greenplum/v1/pxf_service.proto \
        -rpc-header "Authorization: Bearer $IAM_TOKEN" \
        -d '{
              "cluster_id": "<cluster_ID>",
              "datasource": {
                "name": "<external_data_source_name>",
                "hdfs": {
                  "core": {
                    "default_fs": "<storage_type>"
                  },
                  ...
                }
              }
            }' \
        mdb.api.cloud.yandex.net:443 \
        yandex.cloud.mdb.greenplum.v1.PXFDatasourceService.Create
    

    Where:

    • name: External data source name.
    • hdfs: External data source settings. Configure at least one optional setting.

    You can get the cluster ID from the list of clusters in the folder.

  4. View the server response to make sure your request was successful.
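The gRPCurl call can likewise be wrapped in a shell function. This is a minimal sketch under stated assumptions: the function names are hypothetical, the body sets only `core.default_fs`, argument values are assumed to need no JSON escaping, and the cloudapi repository is assumed to be cloned to `~/cloudapi/`. Note the two differences from the REST body: field names are in snake_case, and the cluster ID is passed inside the message rather than in a URL.

```shell
#!/usr/bin/env bash
# Minimal sketch of the PXFDatasourceService.Create gRPCurl call.
# Function names are hypothetical; values are assumed to need no JSON escaping.

# Print the gRPC request body: snake_case field names, cluster_id in the body,
# and a single optional setting, core.default_fs.
build_hdfs_datasource_grpc_body() {
  local cluster_id="$1" name="$2" default_fs="$3"
  printf '{"cluster_id":"%s","datasource":{"name":"%s","hdfs":{"core":{"default_fs":"%s"}}}}\n' \
    "$cluster_id" "$name" "$default_fs"
}

# Send the request; assumes ~/cloudapi/ exists and IAM_TOKEN is exported.
# Arguments: <cluster_ID> <external_data_source_name> <storage_type>
create_hdfs_datasource_grpc() {
  grpcurl \
    -format json \
    -import-path ~/cloudapi/ \
    -import-path ~/cloudapi/third_party/googleapis/ \
    -proto ~/cloudapi/yandex/cloud/mdb/greenplum/v1/pxf_service.proto \
    -rpc-header "Authorization: Bearer ${IAM_TOKEN:?IAM_TOKEN is not set}" \
    -d "$(build_hdfs_datasource_grpc_body "$1" "$2" "$3")" \
    mdb.api.cloud.yandex.net:443 \
    yandex.cloud.mdb.greenplum.v1.PXFDatasourceService.Create
}
```

Generating the body with `printf` keeps the commas and quoting correct, which is easy to get wrong when editing the JSON by hand.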

Greenplum® and Greenplum Database® are registered trademarks or trademarks of Broadcom Inc. in the United States and/or other countries.
