Yandex Cloud
Search
Contact UsGet started
  • Blog
  • Pricing
  • Documentation
  • All Services
  • System Status
    • Featured
    • Infrastructure & Network
    • Data Platform
    • Containers
    • Developer tools
    • Serverless
    • Security
    • Monitoring & Resources
    • ML & AI
    • Business tools
  • All Solutions
    • By industry
    • By use case
    • Economics and Pricing
    • Security
    • Technical Support
    • Customer Stories
    • Cloud credits to scale your IT product
    • Gateway to Russia
    • Cloud for Startups
    • Education and Science
    • Yandex Cloud Partner program
  • Blog
  • Pricing
  • Documentation
© 2025 Direct Cursus Technology L.L.C.
Yandex Object Storage
    • All tutorials
    • Getting statistics on object queries with S3 Select
    • Getting website traffic statistics with S3 Select
    • Getting statistics on object queries using Yandex Query
    • Generating a resource-by-resource cost breakdown report using S3 Select
    • Server-side encryption
    • Integrating an L7 load balancer with CDN and Object Storage
    • Blue-green and canary deployment of service versions
    • Analyzing logs in DataLens
    • Mounting buckets to the file system of Yandex Data Processing hosts
    • Using Object Storage in Yandex Data Processing
    • Importing data from Object Storage, processing and exporting to Managed Service for ClickHouse®
    • Mounting a bucket as a disk in Windows
    • Migrating data from Yandex Data Streams using Yandex Data Transfer
    • Using hybrid storage in Yandex Managed Service for ClickHouse®
    • Loading data from Yandex Managed Service for OpenSearch to Yandex Object Storage using Yandex Data Transfer
    • Automatically copying objects from one bucket to another
    • Recognizing audio files in a bucket on a regular basis
    • Training a model in Yandex DataSphere on data from Object Storage
    • Connecting to Object Storage from VPC
    • Migrating data to Yandex Managed Service for PostgreSQL using Yandex Data Transfer
    • Uploading data to Yandex Managed Service for Greenplum® using Yandex Data Transfer
    • Uploading data to Yandex Managed Service for ClickHouse® using Yandex Data Transfer
    • Uploading data to Yandex Managed Service for YDB using Yandex Data Transfer
    • Exchanging data between Yandex Managed Service for ClickHouse® and Yandex Data Processing
    • Uploading data from Yandex Managed Service for YDB using Yandex Data Transfer
    • Hosting a static Gatsby website in Object Storage
    • Migrating a database from Managed Service for PostgreSQL to Object Storage
    • Exchanging data between Yandex Managed Service for ClickHouse® and Yandex Data Processing
    • Importing data from Yandex Managed Service for PostgreSQL to Yandex Data Processing using Sqoop
    • Importing data from Yandex Managed Service for MySQL® to Yandex Data Processing using Sqoop
    • Migrating data from Yandex Object Storage to Yandex Managed Service for MySQL® using Yandex Data Transfer
    • Migrating a database from Yandex Managed Service for MySQL® to Yandex Object Storage
    • Exporting Greenplum® data to a cold storage in Yandex Object Storage
    • Loading data from Yandex Direct to a Yandex Managed Service for ClickHouse® data mart using Yandex Cloud Functions, Yandex Object Storage, and Yandex Data Transfer
    • Migrating data from Elasticsearch to Yandex Managed Service for OpenSearch
    • Uploading Terraform states to Object Storage
    • Locking Terraform states using Managed Service for YDB
    • Visualizing Yandex Query data
    • Publishing game updates
    • VM backups using Hystax Acura
    • Backing up to Object Storage with CloudBerry Desktop Backup
    • Backing up to Object Storage with Duplicati
    • Backing up to Object Storage with Bacula
    • Backing up to Yandex Object Storage with Veeam Backup
    • Backing up to Object Storage with Veritas Backup Exec
    • Managed Service for Kubernetes cluster backups in Object Storage
    • Developing a custom integration in API Gateway
    • URL shortener
    • Storing application runtime logs
    • Developing a skill for Alice and a website with authorization
    • Creating an interactive serverless application using WebSocket
    • Deploying a web application using the Java Servlet API
    • Developing a Telegram bot
    • Replicating logs to Object Storage using Fluent Bit
    • Replicating logs to Object Storage using Data Streams
    • Uploading audit logs to ArcSight SIEM
    • Exporting audit logs to SIEM Splunk systems
    • Creating an MLFlow server for logging experiments and artifacts
    • Operations with data using Yandex Query
    • Federated data queries using Query
    • Recognizing text in image archives using Vision OCR
    • Converting a video to a GIF in Python
    • Automating tasks using Managed Service for Apache Airflow™
    • Processing files with usage details in Yandex Cloud Billing
    • Deploying a web app with JWT authorization in API Gateway and authentication in Firebase
    • Searching for Yandex Cloud events in Yandex Query
    • Searching for Yandex Cloud events in Object Storage
    • Creating an external table from a bucket table using a configuration file
    • Migrating databases from Google BigQuery to Managed Service for ClickHouse®
  • Terraform reference
  • Monitoring metrics
  • Audit Trails events
  • Bucket logs
  • Release notes
  • FAQ

In this article:

  • Getting started
  • Required paid resources
  • Set up your infrastructure
  • Create a folder
  • Create a service account for the DataSphere project
  • Add the service account to a project
  • Create an authorized key for a service account
  • Create a secret
  • Create a notebook
  • Get started in Query
  • Connect to the Object Storage data
  • Create a data bucket
  • Create a connection to Object Storage
  • Create a data binding
  • Connect to the Managed Service for ClickHouse® data
  • Create a Managed Service for ClickHouse® cluster
  • Create a table
  • Create a connection to Managed Service for ClickHouse®
  • Connect to the Managed Service for PostgreSQL data
  • Create a Managed Service for PostgreSQL cluster
  • Create a table
  • Create a connection to Managed Service for PostgreSQL
  • Run a federated query
  • How to delete the resources you created
  1. Tutorials
  2. Federated data queries using Query

Federated data queries using Yandex Query

Written by
Yandex Cloud
Updated at May 7, 2025
  • Getting started
    • Required paid resources
  • Set up your infrastructure
    • Create a folder
    • Create a service account for the DataSphere project
    • Add the service account to a project
    • Create an authorized key for a service account
    • Create a secret
    • Create a notebook
  • Get started in Query
  • Connect to the Object Storage data
    • Create a data bucket
    • Create a connection to Object Storage
    • Create a data binding
  • Connect to the Managed Service for ClickHouse® data
    • Create a Managed Service for ClickHouse® cluster
    • Create a table
    • Create a connection to Managed Service for ClickHouse®
  • Connect to the Managed Service for PostgreSQL data
    • Create a Managed Service for PostgreSQL cluster
    • Create a table
    • Create a connection to Managed Service for PostgreSQL
  • Run a federated query
  • How to delete the resources you created

Yandex Query is an interactive service for serverless data analysis. You can use it to process information from various storages without having to to create a dedicated cluster. The service supports Yandex Object Storage, Yandex Managed Service for PostgreSQL, and Yandex Managed Service for ClickHouse® data storages.

You can process data from these systems either individually or as part of one common query, referred to as a federated query.

In this tutorial, you will create three dedicated data storages: customers, purchased items, and purchase dates. Using a federated query from a notebook cell, you can get data from all storages at the same time.

  1. Set up your infrastructure.
  2. Get started in Query.
  3. Connect to the Object Storage data.
  4. Connect to the Managed Service for ClickHouse® data.
  5. Connect to the Managed Service for PostgreSQL data.
  6. Run a federated query.

If you no longer need the resources you created, delete them.

Getting startedGetting started

Before getting started, register in Yandex Cloud, set up a community, and link your billing account to it.

  1. On the DataSphere home page, click Try for free and select an account to log in with: Yandex ID or your working account with the identity federation (SSO).
  2. Select the Yandex Cloud Organization organization you are going to use in Yandex Cloud.
  3. Create a community.
  4. Link your billing account to the DataSphere community you are going to work in. Make sure you have a linked billing account and its status is ACTIVE or TRIAL_ACTIVE. If you do not have a billing account yet, create one in the DataSphere interface.

Required paid resourcesRequired paid resources

The cost of supporting the infrastructure for running federated queries in this tutorial includes:

  • Fee for using DataSphere computing resources.
  • Fee for storing data in a bucket.
  • Fee for a running Managed Service for ClickHouse® cluster.
  • Fee for a running Managed Service for PostgreSQL cluster.
  • Fee for the amount of read data when executing Query queries.

Set up your infrastructureSet up your infrastructure

Log in to the Yandex Cloud management console and select the organization you use to access DataSphere. On the Yandex Cloud Billing page, make sure you have a billing account linked.

If you have an active billing account, you can create or select a folder to deploy your infrastructure in, on the cloud page.

Note

If you use an identity federation to access Yandex Cloud, billing details might be unavailable to you. In this case, contact your Yandex Cloud organization administrator.

Create a folderCreate a folder

Management console
  1. In the management console, select a cloud and click Create folder.
  2. Name your folder, e.g., data-folder.
  3. Click Create.

Create a service account for the DataSphere projectCreate a service account for the DataSphere project

Management console
  1. In the management console, navigate to data-folder.
  2. From the list of services, select Identity and Access Management.
  3. Click Create service account.
  4. Enter a name for the service account, e.g., yq-sa.
  5. Click Add role and assign the following roles to the service account:
    • yq.editor: To run queries using Query.
    • storage.viewer: To view the contents of the Object Storage bucket and objects.
    • managed-clickhouse.viewer: To view the contents of the Managed Service for ClickHouse® cluster.
    • managed-postgresql.viewer: To view the contents of the Managed Service for PostgreSQL cluster.
  6. Click Create.

Add the service account to a projectAdd the service account to a project

To enable the service account to run a DataSphere project, add it to the list of project members.

  1. Select the relevant project in your community or on the DataSphere homepage in the Recent projects tab.

  2. In the Members tab, click Add member.
  3. Select the yq-sa account and click Add.
  4. Switch your service account role to Editor.

Create an authorized key for a service accountCreate an authorized key for a service account

To allow the service account to send queries using Query, create an authorized key.

Note

Authorized keys do not expire, but you can always get new authorized keys and get authenticated again if something goes wrong.

Management console
  1. In the management console, navigate to data-folder.
  2. From the list of services, select Identity and Access Management.
  3. In the left-hand panel, select Service accounts.
  4. From the list that opens, select the yq-sa service account.
  5. Click Create new key in the top panel and select Create authorized key.
  6. Select the encryption algorithm and click Create.
  7. Click Download file with keys.

Create a secretCreate a secret

To get an authorized key from the notebook, create a secret with the contents of the authorized key file.

  1. Select the relevant project in your community or on the DataSphere homepage in the Recent projects tab.

  2. Under Project resources, click Secret.
  3. Click Create.
  4. In the Name field, enter a name for the secret: yq_access_key.
  5. In the Value field, paste the full contents of the authorized key file you downloaded.
  6. Click Create.

Create a notebookCreate a notebook

Queries to the Managed Service for PostgreSQL database through Query will be sent from the notebook.

  1. Select the relevant project in your community or on the DataSphere homepage in the Recent projects tab.

  2. Click Open project in JupyterLab and wait until loading is complete.
  3. In the top panel, click File and select New ⟶ Notebook.
  4. Select a kernel and click Select.

Get started in QueryGet started in Query

The yandex_query_magic package provides magic commands for working in Jupyter. Install it to send queries to Query. Paste the code into the yq-storage.ipynb notebook cells.

  1. Open the DataSphere project:

    1. Select the relevant project in your community or on the DataSphere homepage in the Recent projects tab.

    2. Click Open project in JupyterLab and wait for the loading to complete.
    3. Open the notebook tab.
  2. Install the yandex_query_magic package:

    %pip install yandex_query_magic
    
  3. Once the installation is complete, from the top panel, select Kernel ⟶ Restart kernel....

  4. Upload the extension:

    %load_ext yandex_query_magic
    
  5. Configure the connection by specifying the data-folder ID and the name of the authorized key secret:

    %yq_settings --folder-id <folder_ID> --env-auth yq_access_key
    
  6. Run a test query to Query:

    %yq select "Hello, world!"
    

Connect to the Object Storage dataConnect to the Object Storage data

For your operations with the Object Storage data, you will need a bucket with a table, a Query connection, and a data binding.

Create a data bucketCreate a data bucket

The Object Storage bucket will contain the purchase dates.

Management console
  1. In the management console, navigate to data-folder.

  2. From the list of services, select Object Storage.

  3. At the top right, click Create bucket.

  4. In the ** Name** field, enter a name for the bucket.

  5. In the Object read access, Object listing access, and Read access to settings fields, select Restricted.

  6. Click Create bucket.

  7. Create a file named visits.csv and add a table with purchase dates to it:

    date|persons_id|item_id
    2024-05-14 12:12:00|1|2
    2024-05-15 13:13:00|2|1
    
  8. Navigate to the bucket you created and click Upload.

  9. In the window that opens, select visits.csv and click Open.

  10. Click Upload.

Create a connection to Object StorageCreate a connection to Object Storage

Management console
  1. In the management console, navigate to data-folder.

  2. From the list of services, select Yandex Query.

  3. In the left-hand panel, select Connections.

  4. Click Create new.

  5. Enter a name for the connection, e.g., storage-connection.

  6. Select the Object Storage connection type and specify the Connection type parameters.

  7. In the Bucket auth field, select Private and set these properties:

    • Cloud and Folder: data-folder.
    • Bucket: Select the bucket you created.
    • Service account: yq-sa.
  8. Click Create.

Create a data bindingCreate a data binding

A data binding contains information about file formats and locations in the bucket, as well as a list of data fields and their types.

To create a data binding:

Management console
  1. In the management console, select data-folder.

  2. From the list of services, select Yandex Query.

  3. In the left-hand panel, select Bindings.

  4. Click Create.

  5. Under Connection parameters:

    • Type: Select Object Storage.
    • Connection: Select storage-connection.
  6. Under Binding parameters:

    1. Enter a name for the binding, e.g., visits.
    2. In the Path field, specify the path to the table in the bucket: /visits.csv.
  7. Under Format settings:

    • Format: Select csv_with_names.
    • Delimiter {%secondary%}(optional){%secondary%}: |.
  8. Under Columns, add three columns with the following names and data types:

    • date: DATETIME
    • person_id: INT32
    • items_id: INT32
  9. To verify the data you specified is correct, click Preview. Your table should appear below.

  10. Click Create.

To check the connection, get the table data from the notebook cell:

%yq SELECT * FROM visits LIMIT 100;

Connect to the Managed Service for ClickHouse® dataConnect to the Managed Service for ClickHouse® data

For your operations with the Managed Service for ClickHouse® data, you will need a cluster with a table and a Query connection.

Create a Managed Service for ClickHouse® clusterCreate a Managed Service for ClickHouse® cluster

Any running Managed Service for ClickHouse® cluster with the Yandex Query access option enabled is suitable for sending queries.

Management console
  1. In the management console, select data-folder.

  2. Select Managed Service for ClickHouse.

  3. Click Create cluster.

  4. In the Cluster name field, enter the cluster name, e.g., clickhouse.

  5. Under DBMS settings:

    • In the User management via SQL field, select Enabled from the drop-down list.
    • Specify Username and Password.
  6. Under Service settings:

    • Select the yq-sa service account.
    • Enable the Yandex Query access and Access from the management console options.
  7. You can leave the other settings at their defaults.

  8. Click Create cluster.

Create a tableCreate a table

The Managed Service for ClickHouse® table will contain the item names.

Management console
  1. In the management console, open the clickhouse cluster page and navigate to the SQL tab.

  2. Enter Username and Password you specified when creating the cluster.

  3. In the input window on the right, run the following SQL queries one by one:

    CREATE TABLE items(id INT, description VARCHAR) ENGINE=MergeTree ORDER BY id;
    
    INSERT INTO items(id, description) VALUES(1, 'Refrigerator')
    
    INSERT INTO items(id, description) VALUES(2, 'TV')
    
  4. Click Execute.

Create a connection to Managed Service for ClickHouse®Create a connection to Managed Service for ClickHouse®

Management console
  1. In the management console, select data-folder.

  2. From the list of services, select Yandex Query.

  3. In the left-hand panel, select Connections.

  4. Click Create new.

  5. Enter a name for the connection, e.g., clickhouse.

  6. Select the Managed Service for ClickHouse connection type.

  7. Under Connection type parameters:

    • Cluster: Select the clickhouse cluster you created earlier.
    • Service account: Select the yq-sa service account.
    • Enter Login and Password you specified when creating the cluster.
  8. Click Create.

To check the connection, run a query in the notebook cell. Here is an example:

%yq SELECT * FROM clickhouse.items

Connect to the Managed Service for PostgreSQL dataConnect to the Managed Service for PostgreSQL data

For your operations with the Managed Service for PostgreSQL data, you will need a cluster with a table and a Query connection.

Create a Managed Service for PostgreSQL clusterCreate a Managed Service for PostgreSQL cluster

Any running Managed Service for PostgreSQL cluster with the Yandex Query access option enabled is suitable for sending queries.

Management console
  1. In the management console, select data-folder.
  2. Select Managed Service for PostgreSQL.
  3. Click Create cluster.
  4. In the Cluster name field, enter the cluster name, e.g., postgresql.
  5. Under Database:
    • Specify DB name, e.g., db1.
    • Specify Username and Password.
  6. Under Service settings, enable Yandex Query access and Access from the management console.
  7. You can leave the other settings at their defaults.
  8. Click Create cluster.

Create a tableCreate a table

The Managed Service for PostgreSQL table will contain the names of the customers.

Management console
  1. In the management console, open the postgresql cluster page and navigate to the SQL tab.

  2. Enter Username and Password you specified when creating the cluster.

  3. In the input window on the right, run the following SQL queries one by one:

    CREATE TABLE persons(person_id INT, name VARCHAR);
    
    INSERT INTO persons(person_id, name) values(1, 'Ivan')
    
    INSERT INTO persons(person_id, name) values(2, 'Peter')
    

Create a connection to Managed Service for PostgreSQLCreate a connection to Managed Service for PostgreSQL

Management console
  1. In the management console, select data-folder.

  2. From the list of services, select Yandex Query.

  3. In the left-hand panel, select Connections.

  4. Click Create new.

  5. Enter a name for the connection, e.g., postgresql.

  6. Select the Managed Service for PostgreSQL connection type.

  7. Under Connection type parameters:

    • Cluster: Select the postgresql cluster you created earlier.
    • Service account: yq-sa.
    • Database: db1.
    • Enter Login and Password you specified when creating the cluster.
  8. Click Create.

To check the connection, run a query in the notebook cell. Here is an example:

%yq SELECT * FROM postgresql.persons

Run a federated queryRun a federated query

Operations with federated data sources (cross-service analytics) are no different than those with regular data sources. You can simultaneously access external data sources, such as tables in the DB and data in Object Storage, from a query and perform any allowed YQL operations on them.

To combine and get data from all the three tables, run a federated query in a notebook cell:

%%yq SELECT
    v.person_id,
    date,
    p.name,
    c.description
FROM visits AS v
    INNER JOIN postgresql.persons AS p ON
        v.person_id = p.person_id
    INNER JOIN clickhouse.items AS c ON
        v.item_id = c.id

How to delete the resources you createdHow to delete the resources you created

To stop paying for the resources you created:

  • Delete the bucket.
  • Delete the Managed Service for ClickHouse® cluster.
  • Delete the Managed Service for PostgreSQL cluster.
  • Delete the project.

ClickHouse® is a registered trademark of ClickHouse, Inc.

Was the article helpful?

Previous
Operations with data using Yandex Query
Next
Recognizing text in image archives using Vision OCR
© 2025 Direct Cursus Technology L.L.C.