Yandex Cloud
Search
Contact UsGet started
  • Blog
  • Pricing
  • Documentation
  • All Services
  • System Status
    • Featured
    • Infrastructure & Network
    • Data Platform
    • Containers
    • Developer tools
    • Serverless
    • Security
    • Monitoring & Resources
    • ML & AI
    • Business tools
  • All Solutions
    • By industry
    • By use case
    • Economics and Pricing
    • Security
    • Technical Support
    • Customer Stories
    • Start testing with double trial credits
    • Cloud credits to scale your IT product
    • Gateway to Russia
    • Cloud for Startups
    • Education and Science
    • Yandex Cloud Partner program
  • Blog
  • Pricing
  • Documentation
© 2025 Direct Cursus Technology L.L.C.
Yandex Query
    • All tutorials
    • Processing Cloud Logging logs
    • Processing CDC Debezium streams
    • Visualizing Object Storage data in DataLens
    • Processing Audit Trails events
    • Processing files with usage details in Yandex Cloud Billing
    • Analyzing data with Jupyter
    • Automating tasks using Managed Service for Apache Airflow™
    • Analyzing data with Query
    • Working with data in Object Storage
    • Working with data in Managed Service for ClickHouse®
    • Working with data in Managed Service for PostgreSQL
    • Federated data queries
    • Searching for Yandex Cloud events in Query
  • Access management
  • Pricing policy
  • Integration
  • Audit Trails events
  • FAQ

In this article:

  • Getting started
  • Required paid resources
  • Install and configure the yandex_query_magic package
  • Test the package
  • Creating query templates using the mustache syntax
  • Jinja2
  • Built-in mustache templates
  • Jinja templates
  • Capture command results
  1. Tutorials
  2. Analyzing data with Query

Analyzing data with Query

Written by
Yandex Cloud
Updated at March 4, 2025
  • Getting started
    • Required paid resources
  • Install and configure the yandex_query_magic package
    • Test the package
  • Creating query templates using the mustache syntax
    • Jinja2
    • Built-in mustache templates
    • Jinja templates
  • Capture command results

Yandex Query is an interactive service for serverless data analysis. It allows you to process data from various storages using SQL queries without creating a dedicated data processing cluster. Yandex Query works with Yandex Object Storage, Managed Service for PostgreSQL, and Managed Service for ClickHouse®.

To analyze DataSphere data using Query:

  1. Install and configure the yandex_query_magic package.
  2. Create a query template.
  3. Process the execution results.

Getting started

Before getting started, register in Yandex Cloud, set up a community, and link your billing account to it.

  1. On the DataSphere home page, click Try for free and select an account to log in with: Yandex ID or your working account with the identity federation (SSO).
  2. Select the Yandex Cloud Organization organization you are going to use in Yandex Cloud.
  3. Create a community.
  4. Link your billing account to the DataSphere community you are going to work in. Make sure you have a linked billing account and its status is ACTIVE or TRIAL_ACTIVE. If you do not have a billing account yet, create one in the DataSphere interface.

Configure the infrastructure to work with Yandex Query:

  1. Go to the management console and link the billing account to the cloud.
  2. Create a folder where Yandex Query will run.

Required paid resources

The cost of analyzing data using Yandex Query includes:

  • Fee for DataSphere computing resource usage.
  • Fee for data read by Yandex Query when executing queries.

Install and configure the yandex_query_magic package

  1. Open the DataSphere project:

    1. Select the relevant project in your community or on the DataSphere homepage in the Recent projects tab.

    2. Click Open project in JupyterLab and wait for the loading to complete.
    3. Open the notebook tab.
  2. Install the yandex_query_magic package by running this command in the notebook cell:

%pip install yandex_query_magic --upgrade
  1. Configure the yandex_query_magic package. Do it by specifying the parameters using the yq_settings line command:
%yq_settings --folder-id <folder_ID> ...

Available parameters:

  • --folder-id <folder_id>: ID of the folder used to run Query queries.
  • --env-auth <environment_variable>: Sets authentication with the authorized key kept in the Yandex DataSphere secret. Create a DataSphere secret and specify its name in the --env-auth parameter.

Test the package

You can use the %yq line magic command with a single-line SQL query. In this case, the %yq keyword is used to execute the query.

Run the following commands in the notebook:

%load_ext yandex_query_magic
%yq_settings --env-auth <Yandex DataSphere_secret_name> --folder-id <folder_ID>
%yq SELECT "Hello, world!"

Where:

  • %yq: "Magic" command name.
  • SELECT "Hello, world!": Text of query to Query.

To send multi-line SQL queries, you need to use %%yq cell magic. The query text must begin with the %%yq keyword:

%%yq --folder-id <folder_ID> --name "My query"  --raw-results

SELECT
    col1,
    COUNT(*)
FROM table
GROUP BY col1

Where:

  • --folder-id: ID of the folder used to run Query queries. The default folder is the one specified earlier through %yq_settings. If not specified, it defaults to the folder in which the VM is running.
  • --name: (Optional) Query name.
  • --description: (Optional) Query description.
  • --raw-results: (Optional) Returns raw results of Query query processing. For the format specification, refer to Correspondence of YQL and JSON types.

Creating query templates using the mustache syntax

You can use the templates of the computations exchanged between Jupyter and Query to work with queries or perform standard operations without writing code. For this purpose, Query has built-in support for the mustache syntax for queries, where all keywords and template directives are placed inside the {{}} key symbols. You can use the mustache syntax with Jinja2 or in a built-in mustache interpreter.

The Query built-in mustache templates allow you to insert variables from the Jupyter runtime environment directly into SQL queries. Such variables will also be automatically converted into the required Query data structures. For example:

myQuery = "select * from Departments"
%yq {{myQuery}}

The {{myQuery}} mustache string will be interpreted as the name of the source variable for the text, and select * from Departments will be sent to Query for execution.

Using mustache templates streamlines the integration between Jupyter and Query. Let's assume you have the lst=["Academy", "Physics"] Python list containing the names of departments whose data you want to process. Without the mustache syntax support in Query, first, you would need to convert the Python list into a string and then input it into the SQL query. Query example:

var lstStr = ",".join(lst)
sqlQuery = f'select "Academy" in ListCreate({lstStr});
%yq {{sqlQuery}}

I.e., working with complex data types requires a detailed knowledge of the Query SQL syntax. With the mustache syntax, you can write a much simpler query:

%yq select "Academy" in {{lst}}

Here, lst will be identified as a Python list and automatically get the correct SQL structure for list processing. This is the final query that will be sent to Query:

%yq select "Academy" in ListCreate("Academy", "Physics") as lst

Jinja2

To perform standard tasks in Jupyter and Query, we recommend using the built-in mustache syntax. For more advanced templating, you can use Jinja2.

To install Jinja2, run this command:

%pip install Jinja2

Example of using a Jinja template with the for cycle:

{% for user in users %}
    command = "select * from users where name='{{ user }}'"
{% endfor %}

You can also use Jinja templates to perform various data processing operations. This example illustrates operations performed on department names based on whether the student belongs to the department:

{% if student.department == "Academy" %}
    {{ student.department|upper }}
{% elif upper(student.department) != "MATHS DEPARTMENT" %}
    {{ student.department|capitalize }}
{% endif %}

To make sure Jinja conversions comply with the Query rules, use the special to_yq filter. Here is what the lst=["Academy", "Physics"] Python list from the above example looks like in a Jinja template:

%%yq --jinja2
select "Academy" in {{lst|to_yq}}

If you need to disable templating, use the --no-var-expansion argument:

%%yq --no-var-expansion
...

Built-in mustache templates

In Yandex Query, the built-in mustache templates are enabled by default and help to streamline basic operations with Jupyter variables:

lst=["Academy", "Physics"]
%yq select "Academy" in {{lst}}

Using Pandas DataFrame variables

Here is an example of using yandex_query_magic and the mustache syntax with Pandas DataFrame:

  1. Declare the variable in Jupyter:

    df = pandas.DataFrame({'_float': [1.0],
                        '_int': [1],
                        '_datetime': [pd.Timestamp('20180310')],
                        '_string': ['foo']})
    

You can use df as a variable in queries to Yandex Query. During query execution, the df value is used to create a temporary table also named df. The table can be used within the Yandex Query query that is currently running.

  1. Get the data:

    %%yq
    SELECT
        *
    FROM mytable
    INNER JOIN {{df}}
        ON mytable.id=df._int
    

Table of Pandas types mapping to Query types:

Pandas type YQL type Note
int64 Int64 Exceeding the int64 limit will result in a query execution error.
float64 Double
datetime64[ns] Timestamp Precision to the microsecond. Specifying nanoseconds (in the nanosecond field) will return an exception.
str String

Using Python dict variables

Here is an example of using yandex_query_magic and the mustache syntax with a Python dict:

  1. Declare the variable in Jupyter:

    dct = {"a": "1", "b": "2", "c": "test", "d": "4"}
    

    Now you can use the dct variable directly in Query queries. When you execute a query, dct will be converted into the relevant YQL Dict object:

    Key Value
    a "1"
    b "2"
    c "test"
    d "4"
  2. Get the data:

    %%yq
    SELECT "a" in {{dct}}
    

Table of Python dict types mapping to Query types:

Python type YQL type Note
int Int64 Exceeding the int64 limit will result in a query execution error.
float Double
datetime Timestamp
str String

You can also convert a dictionary into a Pandas DataFrame table using a constructor:

df = pandas.DataFrame(dct)

Using Python list variables

Here is an example of using yandex_query_magic and the mustache syntax with a Python list:

  1. Declare the variable in Jupyter:

    lst = [1,2,3]
    

    Now you can use the lst variable directly in Query queries. When you execute a query, lst will be converted into the relevant YQL List object:

  2. Get the data:

    %%yq
    SELECT 1 IN {{lst}}
    

Table of Python list types mapping to Query types:

Python type YQL type Note
int Int64 Exceeding the int64 limit will result in a query execution error.
float Double
datetime Timestamp
str String

You can also convert a list into a Pandas DataFrame table using a constructor:

df = pandas.DataFrame(lst,
                      columns =['column1', 'column2', 'column3'])

Jinja templates

Jinja templates provide a convenient way to generate SQL queries. They allow you to automatically insert various data, e.g, search conditions, so that you do not need to write each query manually. This streamlines work, prevents errors, and results in more readable code.

With Jinja templates, you can also automate generating queries that contain repeated elements. For example, you can use loops in your template to add a list of values to check in a query. This provides you with additional flexibility and enables you to write complex queries faster when you need to handle large amounts of data.

The steps below explain how to filter data in Yandex Query using a Python variable.

  1. Declare the variable in Jupyter:

    name = "John"
    
  2. When running the following code in the Jupyter cell, make sure to specify the jinja2 flag before executing an SQL query for it to be interpreted as a Jinja2 template:

    %%yq <other_parameters> --jinja2
    
    SELECT "{{name}}"
    

    Parameters:

    • --jinja2: Enables query text rendering with Jinja templates. To use this parameter, you need to install the Jinja2 package (%pip install Jinja2).

to_yq filter

Jinja2 is a general-purpose templating engine. When processing variable values, it uses a standard string representation of data types.

For example, you have a Python list specified as lst=["Academy", "Physics"]. This is how you can use it in a Jinja template:

%%yq --jinja2
select "Academy" in {{lst}}

This will get you the Unexpected token '[' error. Jinja converts the lst variable to an ["Academy", "Physics"] string according to Python rules but disregards the Yandex Query-specific features of SQL queries, which causes the error.

To specify that Jinja conversions must comply with the Yandex Query rules, use the to_yq filter. Then, the same query in the Jinja syntax will look like this:

%%yq --jinja2
select "Academy" in {{lst|to_yq}}

Theto_yq Jinja filter converts data to the Yandex Query syntax in exactly the same manner as built-in mustache templates.

Capture command results

To capture the result of a line magic command, you can use the assignment command:

varname = %yq <query>

To capture the result of a cell magic command, you can specify the variable name and the << operator at the beginning of the query text:

%%yq
varname << <query>

Then, you can use the result as a standard Jupyter variable.

For example, this is how you capture a command result to output variables using cell magic:

output = %yq SELECT 1 as column1

And this is how you can capture a command result to output2 using line magic:

%%yq
output2 << SELECT 'Two' as column2, 3 as column3

Then, you can use these variables as standard IPython variables. For example, you can print them:

output

By default, the%yq and %%yq commands output a Pandas DataFrame object with columns matching the column names from the SQL query and rows containing query results. To disable Pandas DataFrame conversion, you can use the --raw-results argument.

The output variable in the above example will have the following structure:

column1
0 1

The output2 variable will look like this:

column2 column3
0 Two 3

If a query does not imply a result, e.g., insert into table select * from another_table, the None value will be returned. If a query returns multiple results, they will be displayed as a list of individual results.

When executing a query, yandex_query_magic outputs additional data, e.g., query ID, start time, and execution duration:

jupyter_query_info

To hide the execution progress information for a cell, you can use the additional %%capture command.

%%capture
%%yq
<query>

In this case, the execution progress will not be output to the console.

Was the article helpful?

Previous
Automating tasks using Managed Service for Apache Airflow™
Next
Working with data in Object Storage
© 2025 Direct Cursus Technology L.L.C.