Yandex Cloud
Search
Contact UsGet started
  • Blog
  • Pricing
  • Documentation
  • All Services
  • System Status
    • Featured
    • Infrastructure & Network
    • Data Platform
    • Containers
    • Developer tools
    • Serverless
    • Security
    • Monitoring & Resources
    • ML & AI
    • Business tools
  • All Solutions
    • By industry
    • By use case
    • Economics and Pricing
    • Security
    • Technical Support
    • Customer Stories
    • Cloud credits to scale your IT product
    • Gateway to Russia
    • Cloud for Startups
    • Education and Science
    • Yandex Cloud Partner program
  • Blog
  • Pricing
  • Documentation
© 2025 Direct Cursus Technology L.L.C.
Tutorials
    • All tutorials
      • Analyzing data with Query
      • Working with data in Object Storage
      • Working with data in Managed Service for ClickHouse®
      • Working with data in Managed Service for PostgreSQL
      • Federated data queries

In this article:

  • Getting started
  • Required paid resources
  • Set up your infrastructure
  • Create a folder
  • Create a service account for the DataSphere project
  • Add the service account to a project
  • Create an authorized key for a service account
  • Create a secret
  • Create a notebook
  • Get started in Query
  • Connect to the Object Storage data
  • Create a data bucket
  • Create a connection to Object Storage
  • Create a data binding
  • Connect to the Managed Service for ClickHouse® data
  • Create a Managed Service for ClickHouse® cluster
  • Create a table
  • Create a connection to Managed Service for ClickHouse®
  • Connect to the Managed Service for PostgreSQL data
  • Create a Managed Service for PostgreSQL cluster
  • Create a table
  • Create a connection to Managed Service for PostgreSQL
  • Run a federated query
  • How to delete the resources you created
  1. Machine learning and artificial intelligence
  2. Analytics with DataSphere
  3. Federated data queries

Federated data queries

Written by
Yandex Cloud
Updated at May 7, 2025
  • Getting started
    • Required paid resources
  • Set up your infrastructure
    • Create a folder
    • Create a service account for the DataSphere project
    • Add the service account to a project
    • Create an authorized key for a service account
    • Create a secret
    • Create a notebook
  • Get started in Query
  • Connect to the Object Storage data
    • Create a data bucket
    • Create a connection to Object Storage
    • Create a data binding
  • Connect to the Managed Service for ClickHouse® data
    • Create a Managed Service for ClickHouse® cluster
    • Create a table
    • Create a connection to Managed Service for ClickHouse®
  • Connect to the Managed Service for PostgreSQL data
    • Create a Managed Service for PostgreSQL cluster
    • Create a table
    • Create a connection to Managed Service for PostgreSQL
  • Run a federated query
  • How to delete the resources you created

Yandex Query is an interactive service for serverless data analysis. You can use it to process information from various storages without having to to create a dedicated cluster. The service supports Yandex Object Storage, Yandex Managed Service for PostgreSQL, and Yandex Managed Service for ClickHouse® data storages.

You can process data from these systems either individually or as part of one common query, referred to as a federated query.

In this tutorial, you will create three dedicated data storages: customers, purchased items, and purchase dates. Using a federated query from a notebook cell, you can get data from all storages at the same time.

  1. Set up your infrastructure.
  2. Get started in Query.
  3. Connect to the Object Storage data.
  4. Connect to the Managed Service for ClickHouse® data.
  5. Connect to the Managed Service for PostgreSQL data.
  6. Run a federated query.

If you no longer need the resources you created, delete them.

Getting startedGetting started

Before getting started, register in Yandex Cloud, set up a community, and link your billing account to it.

  1. On the DataSphere home page, click Try for free and select an account to log in with: Yandex ID or your working account with the identity federation (SSO).
  2. Select the Yandex Cloud Organization organization you are going to use in Yandex Cloud.
  3. Create a community.
  4. Link your billing account to the DataSphere community you are going to work in. Make sure you have a linked billing account and its status is ACTIVE or TRIAL_ACTIVE. If you do not have a billing account yet, create one in the DataSphere interface.

Required paid resourcesRequired paid resources

The cost of supporting the infrastructure for running federated queries in this tutorial includes:

  • Fee for using DataSphere computing resources.
  • Fee for storing data in a bucket.
  • Fee for a running Managed Service for ClickHouse® cluster.
  • Fee for a running Managed Service for PostgreSQL cluster.
  • Fee for the amount of read data when executing Query queries.

Set up your infrastructureSet up your infrastructure

Log in to the Yandex Cloud management console and select the organization you use to access DataSphere. On the Yandex Cloud Billing page, make sure you have a billing account linked.

If you have an active billing account, you can create or select a folder to deploy your infrastructure in, on the cloud page.

Note

If you use an identity federation to access Yandex Cloud, billing details might be unavailable to you. In this case, contact your Yandex Cloud organization administrator.

Create a folderCreate a folder

Management console
  1. In the management console, select a cloud and click Create folder.
  2. Name your folder, e.g., data-folder.
  3. Click Create.

Create a service account for the DataSphere projectCreate a service account for the DataSphere project

Management console
  1. In the management console, navigate to data-folder.
  2. From the list of services, select Identity and Access Management.
  3. Click Create service account.
  4. Enter a name for the service account, e.g., yq-sa.
  5. Click Add role and assign the following roles to the service account:
    • yq.editor: To run queries using Query.
    • storage.viewer: To view the contents of the Object Storage bucket and objects.
    • managed-clickhouse.viewer: To view the contents of the Managed Service for ClickHouse® cluster.
    • managed-postgresql.viewer: To view the contents of the Managed Service for PostgreSQL cluster.
  6. Click Create.

Add the service account to a projectAdd the service account to a project

To enable the service account to run a DataSphere project, add it to the list of project members.

  1. Select the relevant project in your community or on the DataSphere homepage in the Recent projects tab.

  2. In the Members tab, click Add member.
  3. Select the yq-sa account and click Add.
  4. Switch your service account role to Editor.

Create an authorized key for a service accountCreate an authorized key for a service account

To allow the service account to send queries using Query, create an authorized key.

Note

Authorized keys do not expire, but you can always get new authorized keys and get authenticated again if something goes wrong.

Management console
  1. In the management console, navigate to data-folder.
  2. From the list of services, select Identity and Access Management.
  3. In the left-hand panel, select Service accounts.
  4. From the list that opens, select the yq-sa service account.
  5. Click Create new key in the top panel and select Create authorized key.
  6. Select the encryption algorithm and click Create.
  7. Click Download file with keys.

Create a secretCreate a secret

To get an authorized key from the notebook, create a secret with the contents of the authorized key file.

  1. Select the relevant project in your community or on the DataSphere homepage in the Recent projects tab.

  2. Under Project resources, click Secret.
  3. Click Create.
  4. In the Name field, enter a name for the secret: yq_access_key.
  5. In the Value field, paste the full contents of the authorized key file you downloaded.
  6. Click Create.

Create a notebookCreate a notebook

Queries to the Managed Service for PostgreSQL database through Query will be sent from the notebook.

  1. Select the relevant project in your community or on the DataSphere homepage in the Recent projects tab.

  2. Click Open project in JupyterLab and wait until loading is complete.
  3. In the top panel, click File and select New ⟶ Notebook.
  4. Select a kernel and click Select.

Get started in QueryGet started in Query

The yandex_query_magic package provides magic commands for working in Jupyter. Install it to send queries to Query. Paste the code into the yq-storage.ipynb notebook cells.

  1. Open the DataSphere project:

    1. Select the relevant project in your community or on the DataSphere homepage in the Recent projects tab.

    2. Click Open project in JupyterLab and wait for the loading to complete.
    3. Open the notebook tab.
  2. Install the yandex_query_magic package:

    %pip install yandex_query_magic
    
  3. Once the installation is complete, from the top panel, select Kernel ⟶ Restart kernel....

  4. Upload the extension:

    %load_ext yandex_query_magic
    
  5. Configure the connection by specifying the data-folder ID and the name of the authorized key secret:

    %yq_settings --folder-id <folder_ID> --env-auth yq_access_key
    
  6. Run a test query to Query:

    %yq select "Hello, world!"
    

Connect to the Object Storage dataConnect to the Object Storage data

For your operations with the Object Storage data, you will need a bucket with a table, a Query connection, and a data binding.

Create a data bucketCreate a data bucket

The Object Storage bucket will contain the purchase dates.

Management console
  1. In the management console, navigate to data-folder.

  2. From the list of services, select Object Storage.

  3. At the top right, click Create bucket.

  4. In the ** Name** field, enter a name for the bucket.

  5. In the Object read access, Object listing access, and Read access to settings fields, select Restricted.

  6. Click Create bucket.

  7. Create a file named visits.csv and add a table with purchase dates to it:

    date|persons_id|item_id
    2024-05-14 12:12:00|1|2
    2024-05-15 13:13:00|2|1
    
  8. Navigate to the bucket you created and click Upload.

  9. In the window that opens, select visits.csv and click Open.

  10. Click Upload.

Create a connection to Object StorageCreate a connection to Object Storage

Management console
  1. In the management console, navigate to data-folder.

  2. From the list of services, select Yandex Query.

  3. In the left-hand panel, select Connections.

  4. Click Create new.

  5. Enter a name for the connection, e.g., storage-connection.

  6. Select the Object Storage connection type and specify the Connection type parameters.

  7. In the Bucket auth field, select Private and set these properties:

    • Cloud and Folder: data-folder.
    • Bucket: Select the bucket you created.
    • Service account: yq-sa.
  8. Click Create.

Create a data bindingCreate a data binding

A data binding contains information about file formats and locations in the bucket, as well as a list of data fields and their types.

To create a data binding:

Management console
  1. In the management console, select data-folder.

  2. From the list of services, select Yandex Query.

  3. In the left-hand panel, select Bindings.

  4. Click Create.

  5. Under Connection parameters:

    • Type: Select Object Storage.
    • Connection: Select storage-connection.
  6. Under Binding parameters:

    1. Enter a name for the binding, e.g., visits.
    2. In the Path field, specify the path to the table in the bucket: /visits.csv.
  7. Under Format settings:

    • Format: Select csv_with_names.
    • Delimiter {%secondary%}(optional){%secondary%}: |.
  8. Under Columns, add three columns with the following names and data types:

    • date: DATETIME
    • person_id: INT32
    • items_id: INT32
  9. To verify the data you specified is correct, click Preview. Your table should appear below.

  10. Click Create.

To check the connection, get the table data from the notebook cell:

%yq SELECT * FROM visits LIMIT 100;

Connect to the Managed Service for ClickHouse® dataConnect to the Managed Service for ClickHouse® data

For your operations with the Managed Service for ClickHouse® data, you will need a cluster with a table and a Query connection.

Create a Managed Service for ClickHouse® clusterCreate a Managed Service for ClickHouse® cluster

Any running Managed Service for ClickHouse® cluster with the Yandex Query access option enabled is suitable for sending queries.

Management console
  1. In the management console, select data-folder.

  2. Select Managed Service for ClickHouse.

  3. Click Create cluster.

  4. In the Cluster name field, enter the cluster name, e.g., clickhouse.

  5. Under DBMS settings:

    • In the User management via SQL field, select Enabled from the drop-down list.
    • Specify Username and Password.
  6. Under Service settings:

    • Select the yq-sa service account.
    • Enable the Yandex Query access and Access from the management console options.
  7. You can leave the other settings at their defaults.

  8. Click Create cluster.

Create a tableCreate a table

The Managed Service for ClickHouse® table will contain the item names.

Management console
  1. In the management console, open the clickhouse cluster page and navigate to the SQL tab.

  2. Enter Username and Password you specified when creating the cluster.

  3. In the input window on the right, run the following SQL queries one by one:

    CREATE TABLE items(id INT, description VARCHAR) ENGINE=MergeTree ORDER BY id;
    
    INSERT INTO items(id, description) VALUES(1, 'Refrigerator')
    
    INSERT INTO items(id, description) VALUES(2, 'TV')
    
  4. Click Execute.

Create a connection to Managed Service for ClickHouse®Create a connection to Managed Service for ClickHouse®

Management console
  1. In the management console, select data-folder.

  2. From the list of services, select Yandex Query.

  3. In the left-hand panel, select Connections.

  4. Click Create new.

  5. Enter a name for the connection, e.g., clickhouse.

  6. Select the Managed Service for ClickHouse connection type.

  7. Under Connection type parameters:

    • Cluster: Select the clickhouse cluster you created earlier.
    • Service account: Select the yq-sa service account.
    • Enter Login and Password you specified when creating the cluster.
  8. Click Create.

To check the connection, run a query in the notebook cell. Here is an example:

%yq SELECT * FROM clickhouse.items

Connect to the Managed Service for PostgreSQL dataConnect to the Managed Service for PostgreSQL data

For your operations with the Managed Service for PostgreSQL data, you will need a cluster with a table and a Query connection.

Create a Managed Service for PostgreSQL clusterCreate a Managed Service for PostgreSQL cluster

Any running Managed Service for PostgreSQL cluster with the Yandex Query access option enabled is suitable for sending queries.

Management console
  1. In the management console, select data-folder.
  2. Select Managed Service for PostgreSQL.
  3. Click Create cluster.
  4. In the Cluster name field, enter the cluster name, e.g., postgresql.
  5. Under Database:
    • Specify DB name, e.g., db1.
    • Specify Username and Password.
  6. Under Service settings, enable Yandex Query access and Access from the management console.
  7. You can leave the other settings at their defaults.
  8. Click Create cluster.

Create a tableCreate a table

The Managed Service for PostgreSQL table will contain the names of the customers.

Management console
  1. In the management console, open the postgresql cluster page and navigate to the SQL tab.

  2. Enter Username and Password you specified when creating the cluster.

  3. In the input window on the right, run the following SQL queries one by one:

    CREATE TABLE persons(person_id INT, name VARCHAR);
    
    INSERT INTO persons(person_id, name) values(1, 'Ivan')
    
    INSERT INTO persons(person_id, name) values(2, 'Peter')
    

Create a connection to Managed Service for PostgreSQLCreate a connection to Managed Service for PostgreSQL

Management console
  1. In the management console, select data-folder.

  2. From the list of services, select Yandex Query.

  3. In the left-hand panel, select Connections.

  4. Click Create new.

  5. Enter a name for the connection, e.g., postgresql.

  6. Select the Managed Service for PostgreSQL connection type.

  7. Under Connection type parameters:

    • Cluster: Select the postgresql cluster you created earlier.
    • Service account: yq-sa.
    • Database: db1.
    • Enter Login and Password you specified when creating the cluster.
  8. Click Create.

To check the connection, run a query in the notebook cell. Here is an example:

%yq SELECT * FROM postgresql.persons

Run a federated queryRun a federated query

Operations with federated data sources (cross-service analytics) are no different than those with regular data sources. You can simultaneously access external data sources, such as tables in the DB and data in Object Storage, from a query and perform any allowed YQL operations on them.

To combine and get data from all the three tables, run a federated query in a notebook cell:

%%yq SELECT
    v.person_id,
    date,
    p.name,
    c.description
FROM visits AS v
    INNER JOIN postgresql.persons AS p ON
        v.person_id = p.person_id
    INNER JOIN clickhouse.items AS c ON
        v.item_id = c.id

How to delete the resources you createdHow to delete the resources you created

To stop paying for the resources you created:

  • Delete the bucket.
  • Delete the Managed Service for ClickHouse® cluster.
  • Delete the Managed Service for PostgreSQL cluster.
  • Delete the project.

ClickHouse® is a registered trademark of ClickHouse, Inc.

Was the article helpful?

Previous
Working with data in Managed Service for PostgreSQL
Next
Recognizing text in image archives in Vision OCR
© 2025 Direct Cursus Technology L.L.C.