Yandex Cloud
Search
Contact UsGet started
  • Blog
  • Pricing
  • Documentation
  • All Services
  • System Status
    • Featured
    • Infrastructure & Network
    • Data Platform
    • Containers
    • Developer tools
    • Serverless
    • Security
    • Monitoring & Resources
    • ML & AI
    • Business tools
  • All Solutions
    • By industry
    • By use case
    • Economics and Pricing
    • Security
    • Technical Support
    • Customer Stories
    • Start testing with double trial credits
    • Cloud credits to scale your IT product
    • Gateway to Russia
    • Cloud for Startups
    • Education and Science
    • Yandex Cloud Partner program
  • Blog
  • Pricing
  • Documentation
© 2025 Direct Cursus Technology L.L.C.
Yandex Query
    • Overview
    • Terms and definitions
    • Quotas and limits
    • Query processing
      • Description
      • Federated queries
      • Data partitioning
      • Partition projection
    • Unified analysis of streaming and analytical data
    • Backups
  • Access management
  • Pricing policy
  • Integration
  • Audit Trails events
  • FAQ

In this article:

  • Syntax
  • Supported data types
  • Supported storage path formats
  • Use cases
  1. Concepts
  2. Batch processing
  3. Data partitioning

Data partitioning

Written by
Yandex Cloud
Updated at March 7, 2025
  • Syntax
  • Supported data types
  • Supported storage path formats
  • Use cases

Yandex Object Storage allows you to store very large amounts of data. Queries may only affect some of this data. If you describe the markup rules for the structure of your data storage in Yandex Query, then you do not even need to read data that is not required for a query from Yandex Object Storage. This makes query execution much faster without affecting the results.

For example, data is stored in the following structure of folders:

year=2021
    month=01
    month=02
    month=03
year=2022
    month=01

In the query below, only the data for February 2021 should be processed, the rest of the data not being required.

SELECT 
    * 
FROM 
    objectstorage.'/' 
    WITH
    (
        Schema =
        (
            data String, 
            year Int, 
            month Int
        )
    ) 
WHERE 
    year=2021 
    AND month=02

If no data partitioning scheme is specified, all stored data is read from Yandex Object Storage. However, as a result of processing, data for any other period is discarded.

If you explicitly specify the storage structure indicating that data in Yandex Object Storage is stored in folders by year and month

SELECT 
    * 
FROM 
    objectstorage.'/' 
    WITH
    (
        Schema =
        (
            data String, 
            year Int, 
            month Int
        ),
        partitioned_by = 
        (
            year,
            month
        )
    ) 
WHERE 
    year=2021 
    AND month=02

when executing a query from Yandex Object Storage, instead of reading all the data, only data for February 2021 is read. This will greatly reduce the amount of data to handle and speed up data processing while the results of both queries will be identical.

Note

In the example above, operations with data are performed at the level of connections. This example is given for illustrative purposes only. We strongly recommend that you work with data using bindings and do not work with connections directly.

SyntaxSyntax

When working at the connection level, partitioning is set using the partitioned_by parameter.

SELECT 
    * 
FROM 
    <connection>.<path> 
WITH 
(
    schema=(<field_1>, <field_2>, <field_3>), 
    partitioned_by=(<field_2>, <field_3>)
)

The partitioned_by parameter lists data schema columns used to partition data stored in Yandex Object Storage. The order of specifying fields in the partitioned_by parameter determines the nesting of Yandex Object Storage folders.

For example, partitioned_by=(year, month) defines the folder structure

year=2021
    month=01
    month=02
    month=03
year=2022
    month=01

And partitioned_by=(month, year) defines a different folder structure

month=01
    year=2021
    year=2022
month=02
    year=2021
month=03
    year=2021

Supported data typesSupported data types

Partitioning is only possible by the following set of YQL data types:

  • Uint16, Uint32, Uint64
  • Int16, Int32, Int64
  • String
  • Bool

If other data types are used for partitioning, an error is returned.

Supported storage path formatsSupported storage path formats

The storage path format where each folder's name explicitly specifies a column name is called the Hive-Metastore format or simply the Hive format.

This format looks as follows:

month=01
    year=2021
    year=2022
month=02
    year=2021
month=03
    year=2021

Warning

The basic partitioning mode in Yandex Query only supports the Hive format.

To specify arbitrary storage paths, use Partition projection.

Use casesUse cases

  • Working with data in Yandex Object Storage

Was the article helpful?

Previous
Federated queries
Next
Partition projection
© 2025 Direct Cursus Technology L.L.C.