Yandex Cloud
Search
Contact UsGet started
  • Pricing
  • Customer Stories
  • Documentation
  • Blog
  • All Services
  • System Status
    • Featured
    • Infrastructure & Network
    • Data Platform
    • Containers
    • Developer tools
    • Serverless
    • Security
    • Monitoring & Resources
    • AI for business
    • Business tools
  • All Solutions
    • By industry
    • By use case
    • Economics and Pricing
    • Security
    • Technical Support
    • Start testing with double trial credits
    • Cloud credits to scale your IT product
    • Gateway to Russia
    • Cloud for Startups
    • Center for Technologies and Society
    • Yandex Cloud Partner program
  • Pricing
  • Customer Stories
  • Documentation
  • Blog
© 2025 Direct Cursus Technology L.L.C.
Yandex MPP Analytics for PostgreSQL
  • Getting started
    • All tutorials
    • Migrating a database to Yandex MPP Analytics for PostgreSQL
    • Creating an external table from an Object Storage bucket table using a configuration file
    • Getting data from external sources using named queries
    • Migrating a database from Greenplum® to PostgreSQL
    • Migrating a database from Greenplum® to ClickHouse®
    • Migrating a database from MySQL® to Greenplum®
    • Exporting Greenplum® data to a cold storage in Object Storage
    • Loading data from Object Storage to Yandex MPP Analytics for PostgreSQL using Data Transfer
    • Copying data from Managed Service for OpenSearch to Yandex MPP Analytics for PostgreSQL using Yandex Data Transfer
  • Access management
  • Pricing policy
  • Terraform reference
  • Monitoring metrics
  • Audit Trails events
  • Public materials
  • Release notes

In this article:

  • Required paid resources
  • Getting started
  • Prepare the test data
  • Create an external data source
  • Create a named query
  • Create an external table and get data from it
  • Delete the resources you created
  1. Tutorials
  2. Getting data from external sources using named queries

Getting data from external sources using named queries

Written by
Yandex Cloud
Updated at October 30, 2025
  • Required paid resources
  • Getting started
  • Prepare the test data
  • Create an external data source
  • Create a named query
  • Create an external table and get data from it
  • Delete the resources you created

You can use named queries to retrieve data from external systems through PXF in Yandex MPP Analytics for PostgreSQL.

A named query is a prebuilt SQL query stored in the mdb_toolkit.pxf_named_queries table of the Yandex MPP Analytics for PostgreSQL cluster's system database. You specify the query name reference when creating an external table.

Named queries allow you to join tables and aggregate data in an external source when creating a view is not an option. With all computations performed externally, the cluster operates more efficiently.

Named queries can be used with data sources connected to a Yandex MPP Analytics for PostgreSQL cluster through a JDBC connector.

To get data from an external source using a named query:

  1. Prepare the test data.
  2. Create an external data source.
  3. Create a named query.
  4. Create an external table and get data from it.

If you no longer need the resources you created, delete them.

Required paid resourcesRequired paid resources

The support cost for this solution includes:

  • Fee for a Yandex MPP Analytics for PostgreSQL cluster, specifically computing resources allocated to cluster hosts, as well as the size of storage and backups (see Yandex MPP Analytics for PostgreSQL pricing).
  • Fee for a Yandex Managed Service for PostgreSQL cluster, specifically computing resources allocated to cluster hosts, as well as the size of storage and backups (see Managed Service for PostgreSQL pricing).
  • Fee for hourly usage of NAT gateways and outgoing traffic they handle (see Yandex Virtual Private Cloud pricing).
  • Fee for using public IP addresses (see Virtual Private Cloud pricing).

Getting startedGetting started

Set up the infrastructure:

Manually
Terraform
  1. Create a Yandex MPP Analytics for PostgreSQL cluster with any suitable configuration.

  2. In the Yandex MPP Analytics for PostgreSQL cluster subnet, set up a NAT gateway and create a security group allowing all incoming and outgoing traffic from all addresses.

  3. Create a Managed Service for PostgreSQL cluster with publicly available hosts.

  4. In the Managed Service for PostgreSQL cluster subnet, set up a NAT gateway and create a security group allowing all incoming and outgoing traffic from all addresses.

  1. If you do not have Terraform yet, install it.

  2. Get the authentication credentials. You can add them to environment variables or specify them later in the provider configuration file.

  3. Configure and initialize a provider. There is no need to create a provider configuration file manually, you can download it.

  4. Place the configuration file in a separate working directory and specify the parameter values. If you did not add the authentication credentials to environment variables, specify them in the configuration file.

  5. Download the pxf-named-queries-infrastructure.tf file to the working directory. This file describes:

    • Networks.
    • Subnets.
    • NAT gateways.
    • Security groups.
    • Yandex MPP Analytics for PostgreSQL cluster.
    • Managed Service for PostgreSQL cluster.
  6. Specify the following in the file:

    • mgp_password: Greenplum® user password.
    • mgp_version: Greenplum® version.
    • mpg_password: PostgreSQL database user password.
    • mpg_version: PostgreSQL version.
  7. To verify that the config files are correct, run the command below:

    terraform validate
    

    If there are any errors in the configuration files, Terraform will point them out.

  8. Create an infrastructure:

    1. Run this command to view the planned changes:

      terraform plan
      

      If you described the configuration correctly, the terminal will display a list of the resources to update and their parameters. This is a verification step that does not apply changes to your resources.

    2. If everything looks correct, apply the changes:

      1. Run this command:

        terraform apply
        
      2. Confirm updating the resources.

      3. Wait for the operation to complete.

    All the required resources will be created in the specified folder. You can check resource availability and their settings in the management console.

Prepare the test dataPrepare the test data

  1. Connect to the PostgreSQL database.

  2. Create a table named customers and populate it with test data:

    CREATE TABLE customers(id int, name text, city text);
    INSERT INTO customers VALUES (111, 'Benjamin', 'Chicago');
    INSERT INTO customers VALUES (222, 'Mary', 'New York');
    INSERT INTO customers VALUES (333, 'Paul', 'Boston');
    INSERT INTO customers VALUES (444, 'Peter', 'Denver');
    INSERT INTO customers VALUES (555, 'Natalie', 'Austin');
    
  3. Check the result:

    SELECT * FROM customers;
    
    id  |  name   |    city
    ----+---------+-------------
    111 | Benjamin   | Chicago
    222 | Mary   | New York
    333 | Paul   | Boston
    444 | Peter    | Denver
    555 | Natalie | Austin
    
  4. Create a table named orders and populate it with test data:

    CREATE TABLE orders(customer_id int, amount int, year int);
    INSERT INTO orders VALUES (111, 12, 2018);
    INSERT INTO orders VALUES (222, 234, 2019);
    INSERT INTO orders VALUES (333, 34, 2018);
    INSERT INTO orders VALUES (444, 456, 2019);
    INSERT INTO orders VALUES (555, 56, 2021);
    
  5. Check the result:

    SELECT * FROM orders;
    
    customer_id | amount | year
    ------------+--------+------
            111 |     12 | 2018
            222 |    234 | 2019
            333 |     34 | 2018
            444 |    456 | 2019
            555 |     56 | 2021
    

Create an external data sourceCreate an external data source

In the Yandex MPP Analytics for PostgreSQL cluster, create an external data source with the following settings:

  • Name: pgserver.
  • Driver: org.postgresql.Driver.
  • Url: jdbc:postgresql://c-<PostgreSQL_cluster_ID>.rw.mdb.yandexcloud.net:6432/<DB_name_in_PostgreSQL_cluster>.
  • User: <PostgreSQL_username>.

You can get the cluster ID with the list of clusters in the folder.

Create a named queryCreate a named query

To create a named query, add it to the mdb_toolkit.pxf_named_queries table:

  1. Connect to Greenplum®.

  2. Run this query:

    INSERT INTO mdb_toolkit.pxf_named_queries (pxf_profile, name, query) VALUES (
       'pgserver',
       'my_query',
       'SELECT c.name, o.year
       FROM customers c
       JOIN orders o ON c.id = o.customer_id;'
    );
    

    Where:

    • pgserver: Data source name.
    • my_query: Named query name.

    As an example of a named query, we will use one that joins the customers and orders tables by the id = customer_id field and returns the customer name (c.name) along with the year of their orders (o.year).

  3. Check the result:

    SELECT * FROM mdb_toolkit.pxf_named_queries;
    
    pxf_profile |    name     |              query
    ------------+-------------+--------------------------------------------
    pgserver    | my_query    | SELECT c.name, o.year                       
                |             |       FROM customers c                      
                |             |       JOIN orders o ON c.id = o.customer_id;
    

    The query column should contain the text of the named query.

Create an external table and get data from itCreate an external table and get data from it

  1. Connect to Greenplum®.

  2. Create an external table named pxf_named_query. It will reference the data the named query retrieves from the customers and orders tables in the PostgreSQL DB:

    CREATE READABLE EXTERNAL TABLE pxf_named_query(name text, year int)
    LOCATION ('pxf://query:my_query?PROFILE=JDBC&SERVER=pgserver')
    FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
    

    Learn more about the SQL syntax for creating external tables here.

  3. Get the data:

    SELECT * FROM pxf_named_query;
    

    Result:

      name  | year
    --------+------
    Benjamin   | 2018
    Mary   | 2019
    Paul   | 2018
    Peter    | 2019
    Natalie | 2021
    

    Tip

    If the query fails and returns an error, wait a few minutes and try again. The changes may not have been applied yet.

Delete the resources you createdDelete the resources you created

Some resources are not free of charge. Delete the resources you no longer need to avoid paying for them:

Manually
Terraform
  1. Delete the Yandex MPP Analytics for PostgreSQL cluster.
  2. Delete the Managed Service for PostgreSQL cluster.
  3. Delete the NAT gateways.
  1. In the terminal window, go to the directory containing the infrastructure plan.

    Warning

    Make sure the directory has no Terraform manifests with the resources you want to keep. Terraform deletes all resources that were created using the manifests in the current directory.

  2. Delete resources:

    1. Run this command:

      terraform destroy
      
    2. Confirm deleting the resources and wait for the operation to complete.

    All the resources described in the Terraform manifests will be deleted.

Greenplum® and Greenplum Database® are registered trademarks or trademarks of Broadcom Inc. in the United States and/or other countries.

Was the article helpful?

Previous
Creating an external table from an Object Storage bucket table using a configuration file
Next
Migrating a database from Greenplum® to PostgreSQL
© 2025 Direct Cursus Technology L.L.C.