© 2025 Direct Cursus Technology L.L.C.

Operations with data using Yandex Query

Written by
Yandex Cloud
Updated at May 7, 2025
  • Getting started
    • Required paid resources
  • Set up your infrastructure
    • Create a folder
    • Create a service account for the DataSphere project
    • Add the service account to a project
    • Create an authorized key for the service account
    • Create a secret
    • Create a notebook
  • Get started in Query
  • Connect to Object Storage data
    • Create a data binding
  • Configure partitioning in Object Storage
    • Hive partitioning
    • Partition projection
  • How to delete the resources you created

Yandex Query is an interactive service for serverless data analysis. You can use it to process information from various storages without having to create a dedicated cluster. The service supports Yandex Object Storage, Yandex Managed Service for PostgreSQL, and Yandex Managed Service for ClickHouse® data storages.

In this tutorial, you will connect to an Object Storage data source and run queries against it from the JupyterLab notebook using Query.

  1. Set up your infrastructure.
  2. Get started in Query.
  3. Connect to Object Storage data.
  4. Configure data partitioning in Object Storage.

If you no longer need the resources you created, delete them.

Getting started

Before getting started, register in Yandex Cloud, set up a community, and link your billing account to it.

  1. On the DataSphere home page, click Try for free and select an account to log in with: Yandex ID or your working account with the identity federation (SSO).
  2. Select the organization in Yandex Cloud Organization that you are going to use in Yandex Cloud.
  3. Create a community.
  4. Link your billing account to the DataSphere community you are going to work in. Make sure you have a linked billing account and its status is ACTIVE or TRIAL_ACTIVE. If you do not have a billing account yet, create one in the DataSphere interface.

Required paid resources

The cost of infrastructure support for working with Object Storage data includes:

  • Fee for DataSphere computing resource usage.
  • Fee for storing data in a bucket.
  • Fee for the amount of read data when executing queries using Query.

Set up your infrastructure

Log in to the Yandex Cloud management console and select the organization you use to access DataSphere. On the Yandex Cloud Billing page, make sure you have a billing account linked.

If you have an active billing account, you can create or select a folder to deploy your infrastructure in, on the cloud page.

Note

If you use an identity federation to access Yandex Cloud, billing details might be unavailable to you. In this case, contact your Yandex Cloud organization administrator.

Create a folder

Management console
  1. In the management console, select a cloud and click Create folder.
  2. Give your folder a name, e.g., data-folder.
  3. Click Create.

Create a service account for the DataSphere project

Management console
  1. Navigate to data-folder.
  2. From the list of services, select Identity and Access Management.
  3. Click Create service account.
  4. Enter a name for the service account, e.g., yq-sa.
  5. Click Add role and assign the following roles to the service account:
    • yq.editor: To run queries using Query.
    • storage.viewer: To view the contents of the Object Storage bucket and objects.
  6. Click Create.

Add the service account to a project

To enable the service account to run a DataSphere project, add it to the list of project members.

  1. Select the relevant project in your community or on the DataSphere homepage in the Recent projects tab.

  2. In the Members tab, click Add member.
  3. Select the yq-sa account and click Add.
  4. Change your service account role to Editor.

Create an authorized key for the service account

To allow the service account to send Query queries, create an authorized key.

Note

Authorized keys do not expire, but if something goes wrong you can always create a new authorized key and authenticate again.

Management console
  1. In the management console, navigate to data-folder.
  2. From the list of services, select Identity and Access Management.
  3. In the left-hand panel, select Service accounts.
  4. In the list that opens, select the yq-sa service account.
  5. Click Create new key in the top panel and select Create authorized key.
  6. Select the encryption algorithm and click Create.
  7. Click Download file with keys.

Create a secret

To get an authorized key from the notebook, create a secret with the contents of the authorized key file.

  1. Select the relevant project in your community or on the DataSphere homepage in the Recent projects tab.

  2. Under Project resources, click Secret.
  3. Click Create.
  4. In the Name field, enter the name for the secret: yq_access_key.
  5. In the Value field, paste the full contents of the authorized key file you downloaded.
  6. Click Create.
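
Before pasting the key file contents into the secret, it may help to confirm that the file is intact. The following is a minimal sketch, not part of the original procedure. It assumes the downloaded authorized key file is a JSON object with id, service_account_id, and private_key fields; these field names and the missing_key_fields helper are assumptions made for illustration.

```python
import json

# Assumption: these field names reflect the usual layout of an authorized key
# file; adjust the tuple if your downloaded file differs.
EXPECTED_FIELDS = ("id", "service_account_id", "private_key")


def missing_key_fields(raw: str) -> list[str]:
    """Return the expected fields that are absent or empty in the key JSON."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as err:
        raise ValueError(f"key file is not valid JSON: {err}") from err
    if not isinstance(data, dict):
        raise ValueError("key file must contain a JSON object")
    return [name for name in EXPECTED_FIELDS if not data.get(name)]


# Placeholder content for illustration only; this is not a real key.
sample = json.dumps(
    {"id": "key-id", "service_account_id": "sa-id", "private_key": "PEM"}
)
print(missing_key_fields(sample))  # prints []
```

To check a real file, read its text and pass it to the function; an empty list means all the assumed fields are present.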

Create a notebook

  1. Select the relevant project in your community or on the DataSphere homepage in the Recent projects tab.

  2. Click Open project in JupyterLab and wait for the loading to complete.
  3. In the top panel, click File and select New ⟶ Notebook.
  4. Select a kernel and click Select.

Get started in Query

The yandex_query_magic package provides magic commands for working in Jupyter. Install it to send queries to Query. Paste the code into the yq-storage.ipynb notebook cells.

  1. Open the DataSphere project:

    1. Select the relevant project in your community or on the DataSphere homepage in the Recent projects tab.

    2. Click Open project in JupyterLab and wait for the loading to complete.
    3. Open the notebook tab.
  2. Install the yandex_query_magic package:

    %pip install yandex_query_magic
    
  3. Once the installation is complete, from the top panel, select Kernel ⟶ Restart kernel....

  4. Load the extension:

    %load_ext yandex_query_magic
    
  5. Configure the connection by specifying the data-folder ID and the name of the authorized key secret:

    %yq_settings --folder-id <folder_ID> --env-auth yq_access_key
    
  6. Run a test query to Query:

    %yq select "Hello, world!"
    

Connect to Object Storage data

As an example, let's use data on New York Yellow Taxi rides. The data was pre-uploaded to a public Object Storage bucket named yq-sample-data, in the nyc_taxi_csv folder.

Note

Yandex Cloud provides the New York City taxi trips dataset as is. Yandex Cloud makes no representations, express or implied, warranties, or conditions pertaining to your use of the specified dataset. To the extent allowed by your local laws, Yandex Cloud shall not be liable for any loss or damage, including direct, consequential, special, indirect, incidental, or exemplary, resulting from your use of the dataset.

NYC Taxi and Limousine Commission (TLC):

The data was collected and provided to the NYC Taxi and Limousine Commission (TLC) by technology providers authorized under the Taxicab & Livery Passenger Enhancement Programs (TPEP/LPEP). The taxi trip data is not generated by the TLC, and the TLC makes no representations whatsoever about the accuracy of this data.

Take a look at the Dataset source and its Use Policy.

Create a Query connection:

Management console
  1. In the management console, select data-folder.

  2. From the list of services, select Yandex Query.

  3. In the left-hand panel, select Connections.

  4. Click Create new.

  5. Enter a name for the connection, e.g., storage-connection.

  6. Select the Object Storage connection type and specify the Connection type parameters:

    • In the Bucket auth field, select Public.
    • Specify the bucket name: yq-sample-data.
  7. Click Create.

Create a data binding

A data binding contains information about file formats and locations in the bucket, as well as the list of data fields and their types. Query can process data in CSV, TSV, JSON, and Parquet formats. For data in CSV, TSV, and JSON formats, you can additionally use an external archiver, such as gzip or zstd. For data in Parquet format, you can use Snappy, LZ4, ZSTD, and other compression types.
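
To make the combination of format and compression concrete, here is a minimal local sketch in Python of what a gzip-compressed CSV file with a header row looks like to a reader. It assumes that the csv_with_names format used later in this tutorial means the first row holds the column names. The sample values and the read_named_csv helper are made up for illustration and do not touch Object Storage or Query.

```python
import csv
import gzip
import io

# Build a tiny gzip-compressed CSV in memory; the values are made up.
raw_csv = (
    "tpep_pickup_datetime,trip_distance\n"
    "2021-01-01 00:15:00,2.5\n"
    "2021-01-01 00:40:00,7.1\n"
)
compressed = gzip.compress(raw_csv.encode("utf-8"))


def read_named_csv(blob: bytes) -> list[dict[str, str]]:
    """Decompress a gzip blob and parse it as CSV whose first row holds column names."""
    text = gzip.decompress(blob).decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


rows = read_named_csv(compressed)
total = sum(float(row["trip_distance"]) for row in rows)
print(len(rows), round(total, 1))  # prints: 2 9.6
```

The column names in the header row are what the binding's column list has to match, which is why the binding declares tpep_pickup_datetime and trip_distance explicitly.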

To create a data binding:

Management console
  1. In the management console, select data-folder.

  2. From the list of services, select Yandex Query.

  3. In the left-hand panel, select Bindings.

  4. Click Create.

  5. Under Connection parameters:

    • Type: Select Object Storage.
    • Connection: Select storage-connection.
  6. Under Binding parameters:

    1. Enter a name for the binding, e.g., yq_binding.
    2. In the Path field, specify the path to the data: nyc_taxi_sample/data/.
    3. Select the compression type: gzip.
  7. Under Format settings, select csv_with_names in the Format field.

  8. Under Columns, add two columns with these names and data types:

    • tpep_pickup_datetime: DATETIME
    • trip_distance: DOUBLE
  9. To verify the data you specified is correct, click Preview. A table should appear below.

  10. Click Create.

To check the connection, get the table data from the notebook cell:

%yq SELECT * FROM yq_binding LIMIT 100;

Configure partitioning in Object Storage

In Query, you can significantly reduce query time by setting up rules for partitioning (placing) data in Object Storage to only read the required data. For example, if files in a bucket are sorted in folders by year, Query will only look for data in the specified folders when queried.

Query supports two options for specifying data placement rules: Hive partitioning and Partition projection.

Hive partitioning

In Apache Hive™, data is placed in a folder structure in this format: key=value/key2=value2/.... This format is used in systems originally built around the Apache Hadoop™ ecosystem, such as Apache Spark™.

For example, data from the Yellow Taxi dataset is placed in folders according to the Hive partitioning rules:

  • year=2021/month=1/...
  • year=2021/month=2/...
  • ...
  • year=2022/month=1/...
  • year=2022/month=2/...
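
The folder layout above can be read mechanically, which is what makes skipping irrelevant folders possible. The following Python sketch illustrates the idea only; it is not how Query is implemented. The object keys and the parse_hive_path and prune helpers are hypothetical.

```python
def parse_hive_path(key: str) -> dict[str, str]:
    """Extract key=value folder components from an object key."""
    parts = {}
    for segment in key.split("/"):
        name, sep, value = segment.partition("=")
        if sep and name:
            parts[name] = value
    return parts


def prune(keys: list[str], **wanted: str) -> list[str]:
    """Keep only the keys whose partition values match every requested filter."""
    return [
        key for key in keys
        if all(parse_hive_path(key).get(col) == val for col, val in wanted.items())
    ]


# Hypothetical object keys that follow the layout above.
keys = [
    "year=2021/month=1/part-0.csv.gz",
    "year=2021/month=2/part-0.csv.gz",
    "year=2022/month=1/part-0.csv.gz",
]
print(prune(keys, year="2021"))
# prints ['year=2021/month=1/part-0.csv.gz', 'year=2021/month=2/part-0.csv.gz']
```

A filter such as WHERE year=2021 plays the role of the wanted arguments here: only objects under matching folders need to be read.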

Configure Hive partitioning:

  1. Clone the yq_binding data binding:

    Management console
    1. In the management console, select data-folder.

    2. From the list of services, select Yandex Query.

    3. In the left-hand panel, click Bindings.

    4. In the row with yq_binding, open the row's action menu and select Clone.

    5. Enter a name for the binding: yq_tutorial_hive_partitioned.

    6. Under Partitioned columns:

      1. Select Basic.
      2. Add a column named year with the UINT32 data type.
      3. Add a column named month with the UINT32 data type.
    7. Click Modify.

  2. To check the result, run this query:

    %yq SELECT * FROM yq_tutorial_hive_partitioned WHERE year=2021
    

Partition projection

Sometimes data in Object Storage is placed according to unique rules. In such cases, you can use partition projection to specify your own data path template.

When using partition projection, you describe each component of the path (here, year and month) separately, specifying the range of values it can take. You then specify the path template that shows where these components appear in the Object Storage bucket that stores the data.
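
As a rough illustration of the idea, the Python sketch below expands a path template over the value ranges of each column. It is a simplified model rather than Query's actual behavior. The ${year}/${month}/ template string, its placeholder syntax, the zero-padding interpretation of the fixed width setting, and the expand helper are all assumptions.

```python
from itertools import product
from string import Template


def expand(template: str, columns: dict[str, tuple[int, int, int, int]]) -> list[str]:
    """Expand a path template over every combination of column values.

    Each column maps to (min, max, step, fixed_width); values are zero-padded
    to fixed_width digits before substitution.
    """
    names = list(columns)
    ranges = [
        [str(v).zfill(width) for v in range(lo, hi + 1, step)]
        for lo, hi, step, width in columns.values()
    ]
    return [
        Template(template).substitute(dict(zip(names, combo)))
        for combo in product(*ranges)
    ]


# The template string and its ${...} placeholder syntax are assumptions.
paths = expand(
    "${year}/${month}/",
    {"year": (2019, 2024, 1, 2), "month": (1, 12, 1, 2)},
)
print(len(paths), paths[0], paths[-1])  # prints: 72 2019/01/ 2024/12/
```

Because every candidate path can be computed from the ranges, a filter on year narrows the set of folders to read without listing the bucket first.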

Configure partition projection:

  1. Clone the yq_binding data binding:

    Management console
    1. In the management console, select data-folder.

    2. From the list of services, select Yandex Query.

    3. In the left-hand panel, click Bindings.

    4. In the row with yq_binding, open the row's action menu and select Clone.

    5. Enter a name for the binding: yq_tutorial_partition_projection.

    6. Under Partitioned columns:

      1. Select Extended.
      2. Add a column named year with the UINT32 data type:
        • Format type: integer
        • Interval:
          • Min: 2019
          • Max: 2024
          • Step: 1
        • Fixed width: 2
      3. Add a column named month with the UINT32 data type:
        • Format type: integer
        • Interval:
          • Min: 1
          • Max: 12
          • Step: 1
        • Fixed width: 2
    7. Click Modify.

  2. To check the result, run this query:

    %yq SELECT * FROM yq_tutorial_partition_projection WHERE year=2021
    

How to delete the resources you created

To stop paying for the resources you created:

  • Delete the bucket.
  • Delete the project.
