© 2025 Direct Cursus Technology L.L.C.

Creating an external S3 data source

Written by
Yandex Cloud
Updated at November 1, 2025

In Yandex MPP Analytics for PostgreSQL, you can use Yandex Object Storage or a third-party S3-compatible service as an external data source with the S3 connection type.

Before you start, create a static access key. You will need to specify its ID and secret part in the source parameters.

Create an external data source

You can create the data source using the management console, the CLI, the REST API, or the gRPC API.

Management console

To create an external S3 data source:

  1. Navigate to the folder dashboard and select Yandex MPP Analytics for PostgreSQL.
  2. Open the page of the Yandex MPP Analytics for PostgreSQL cluster you need.
  3. In the left-hand panel, select PXF.
  4. Click Create data source.
  5. Select the S3 connection type.
  6. Enter a source name.
  7. Configure at least one optional setting.
  8. Click Create.

CLI

If you do not have the Yandex Cloud CLI installed yet, install and initialize it.

By default, the CLI uses the folder specified when creating the profile. To change the default folder, use the yc config set folder-id <folder_ID> command. You can also set a different folder for any specific command using the --folder-name or --folder-id parameter.

To create an external S3 data source:

  1. View the description of the CLI command to create a data source:

    yc managed-greenplum pxf-datasource create s3 --help
    
  2. Configure the data source:

    yc managed-greenplum pxf-datasource create s3 <external_data_source_name> \
       --cluster-id=<cluster_ID> \
       --access-key=<static_key_ID> \
       --secret-key=<secret_part_of_static_key> \
       --endpoint=<S3_storage_address> \
       --fast-upload=<fast_upload>
    

    Where:

    • cluster-id: Cluster ID. You can get it from the list of clusters in the folder.
    • access-key, secret-key: ID and secret part of the static access key.
    • endpoint: S3 storage address. For Object Storage, use storage.yandexcloud.net, which is the default value.
    • fast-upload: Fast upload of large files to S3 storage. The possible values are:
      • true (default): PXF generates files on the disk before sending them to S3 storage.
      • false: PXF generates files in RAM (if RAM capacity is reached, it writes them to disk).
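
The parameter rules above can be checked before anything is sent to the cluster. The following is a minimal sketch, not part of the Yandex Cloud CLI: the function name `pxf_s3_create_cmd` and the dry-run approach (printing the command for review instead of running it) are assumptions made for illustration. It uses only the flags shown above and applies the documented defaults for `fast-upload` and `endpoint`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the yc CLI): validates the arguments and
# prints the command for review instead of executing it.
# Usage: pxf_s3_create_cmd <source_name> <cluster_ID> [fast_upload] [endpoint]
pxf_s3_create_cmd() {
  local name="$1"
  local cluster_id="$2"
  local fast_upload="${3:-true}"                   # documented default
  local endpoint="${4:-storage.yandexcloud.net}"   # documented default

  if [ -z "$name" ] || [ -z "$cluster_id" ]; then
    echo "error: source name and cluster ID are required" >&2
    return 1
  fi

  case "$fast_upload" in
    true|false) ;;
    *) echo "error: fast-upload must be true or false" >&2; return 1 ;;
  esac

  printf 'yc managed-greenplum pxf-datasource create s3 %s' "$name"
  printf ' --cluster-id=%s' "$cluster_id"
  # Single quotes keep the variable references literal, so the secret part
  # of the key is never echoed to the terminal.
  printf ' --access-key="$ACCESS_KEY" --secret-key="$SECRET_KEY"'
  printf ' --endpoint=%s --fast-upload=%s\n' "$endpoint" "$fast_upload"
}

# Example: review the command for a source named my_s3 (hypothetical values).
pxf_s3_create_cmd my_s3 "<cluster_ID>" false
```

Printing the command first makes it easy to spot a wrong cluster ID or an invalid `fast-upload` value before the data source is created. The `ACCESS_KEY` and `SECRET_KEY` variable names are likewise assumptions; export them with the static key values before running the printed command.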

REST API

  1. Get an IAM token for API authentication and put it into the environment variable:

    export IAM_TOKEN="<IAM_token>"
    
  2. Use the PXFDatasource.Create method and send the following request, e.g., via cURL:

    curl \
        --request POST \
        --header "Authorization: Bearer $IAM_TOKEN" \
        --header "Content-Type: application/json" \
        --url 'https://mdb.api.cloud.yandex.net/managed-greenplum/v1/clusters/<cluster_ID>/pxf_datasources' \
        --data '{
                  "datasource": {
                    "name": "<external_data_source_name>",
                    "s3": {
                      "accessKey": "<static_key_ID>",
                      "secretKey": "<secret_part_of_static_key>",
                      "fastUpload": <fast_upload>,
                      "endpoint": "<S3_storage_address>"
                    }
                  }
                }'
    

    Where:

    • name: External data source name.

    • s3: External data source settings:

      • accessKey, secretKey: ID and secret part of the static access key.

      • fastUpload: Fast upload of large files to S3 storage. The possible values are:

        • true (default): PXF generates files on the disk before sending them to S3 storage.
        • false: PXF generates files in RAM (if RAM capacity is reached, it writes them to disk).
      • endpoint: S3 storage address. For Object Storage, use storage.yandexcloud.net, which is the default value.

    You can get the cluster ID from the list of clusters in the folder.

  3. View the server response to make sure the request was successful.
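
The request body and the response check from the steps above can be scripted. The sketch below is illustrative only: the helper names `s3_datasource_payload` and `is_http_success` are hypothetical, and it assumes that `fastUpload` is sent as a JSON boolean and that any 2xx HTTP status means the request was accepted. Values are inserted as-is, without JSON escaping, so it suits simple names and keys only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds the request body from the fields described above.
# $1 name, $2 static key ID, $3 secret part, $4 fastUpload (true|false), $5 endpoint
# Note: arguments are not JSON-escaped.
s3_datasource_payload() {
  case "$4" in
    true|false) ;;
    *) echo "error: fastUpload must be true or false" >&2; return 1 ;;
  esac
  printf '{"datasource":{"name":"%s","s3":{"accessKey":"%s","secretKey":"%s","fastUpload":%s,"endpoint":"%s"}}}\n' \
    "$1" "$2" "$3" "$4" "$5"
}

# Hypothetical helper: treats any 2xx HTTP status code as success (assumption).
is_http_success() {
  case "$1" in
    2[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Example call (commented out; it needs IAM_TOKEN, the static key, and a real cluster ID):
# status=$(curl --silent --output response.json --write-out '%{http_code}' \
#     --request POST \
#     --header "Authorization: Bearer $IAM_TOKEN" \
#     --header "Content-Type: application/json" \
#     --url 'https://mdb.api.cloud.yandex.net/managed-greenplum/v1/clusters/<cluster_ID>/pxf_datasources' \
#     --data "$(s3_datasource_payload my_s3 "$ACCESS_KEY" "$SECRET_KEY" true storage.yandexcloud.net)")
# is_http_success "$status" || cat response.json
```

Capturing the status code separately from the response body lets a script stop early on an error and still show the server's message.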

gRPC API

  1. Get an IAM token for API authentication and put it into the environment variable:

    export IAM_TOKEN="<IAM_token>"
    
  2. Clone the cloudapi repository:

    cd ~/ && git clone --depth=1 https://github.com/yandex-cloud/cloudapi
    

    Below, we assume the repository contents are stored in the ~/cloudapi/ directory.

  3. Use the PXFDatasourceService.Create call and send the following request, e.g., via gRPCurl:

    grpcurl \
        -format json \
        -import-path ~/cloudapi/ \
        -import-path ~/cloudapi/third_party/googleapis/ \
        -proto ~/cloudapi/yandex/cloud/mdb/greenplum/v1/pxf_service.proto \
        -rpc-header "Authorization: Bearer $IAM_TOKEN" \
        -d '{
              "cluster_id": "<cluster_ID>",
              "datasource": {
                "name": "<external_data_source_name>",
                "s3": {
                  "access_key": "<static_key_ID>",
                  "secret_key": "<secret_part_of_static_key>",
                  "fast_upload": <fast_upload>,
                  "endpoint": "<S3_storage_address>"
                }
              }
            }' \
        mdb.api.cloud.yandex.net:443 \
        yandex.cloud.mdb.greenplum.v1.PXFDatasourceService.Create
    

    Where:

    • name: External data source name.

    • s3: External data source settings:

      • access_key, secret_key: ID and secret part of the static access key.

      • fast_upload: Fast upload of large files to S3 storage. The possible values are:

        • true (default): PXF generates files on the disk before sending them to S3 storage.
        • false: PXF generates files in RAM (if RAM capacity is reached, it writes them to disk).
      • endpoint: S3 storage address. For Object Storage, use storage.yandexcloud.net, which is the default value.

    You can get the cluster ID from the list of clusters in the folder.

  4. View the server response to make sure the request was successful.
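
The gRPC call depends on two prerequisites from the steps above: the `IAM_TOKEN` environment variable and the cloned cloudapi repository. A minimal pre-flight check might look like the sketch below. The function name `grpc_preflight` and its exit codes are assumptions for illustration; the proto path is the one used in the gRPCurl command above.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the gRPC steps above.
# $1: path to the cloned cloudapi repository (defaults to ~/cloudapi)
# Returns 1 if IAM_TOKEN is missing, 2 if the proto file is not found.
grpc_preflight() {
  local repo="${1:-$HOME/cloudapi}"
  local proto="$repo/yandex/cloud/mdb/greenplum/v1/pxf_service.proto"

  if [ -z "${IAM_TOKEN:-}" ]; then
    echo "error: IAM_TOKEN is not set" >&2
    return 1
  fi
  if [ ! -f "$proto" ]; then
    echo "error: $proto not found; clone the cloudapi repository first" >&2
    return 2
  fi
  echo "ok: token is set and proto file found at $proto"
}

# Example: run the check before calling grpcurl.
# grpc_preflight ~/cloudapi && echo "ready to send the request"
```

Running the check first gives a clearer error than a failed gRPCurl call when the token is missing or the repository was cloned to a different directory.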

Greenplum® and Greenplum Database® are registered trademarks or trademarks of Broadcom Inc. in the United States and/or other countries.
