Yandex Cloud
Yandex project
© 2025 Yandex.Cloud LLC
Yandex Query

In this article:

  • Batch processing module
  • Streaming processing module
  • Use cases

Query processing

Written by
Yandex Cloud
Updated at March 7, 2025

Yandex Query is a massively parallel system that consists of two modules: batch processing and streaming processing. Both modules store data in a single query metabase. A query can be analytical or streaming, and the runtime module is selected based on the query type.

The runtime module divides a query into stages, each performing its own function. The more complicated a query is, the more stages its execution involves.

Batch processing module

Once an analytical query is received, it is split into independent stages that are distributed across a large number of servers for execution. Yandex Query selects the number of stages automatically after analyzing the data volume. All computation is performed in memory, and no data is saved to disk.
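The idea of independent stages can be illustrated with a minimal Python sketch. It is not Yandex Query's implementation: the function names, the `service` field, and the use of a thread pool in place of separate servers are all assumptions made for illustration. The first stage aggregates each partition in memory, and the second stage merges the partial results.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partial_count(partition):
    """Stage 1: each worker aggregates its own partition in memory."""
    return Counter(row["service"] for row in partition)


def merge_counts(partials):
    """Stage 2: combine the partial results into the final answer."""
    total = Counter()
    for part in partials:
        total.update(part)
    return dict(total)


def run_staged_group_by(partitions, workers=4):
    # Threads stand in for servers here; a real system would ship
    # each stage to a different machine.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_count, partitions))
    return merge_counts(partials)


# Hypothetical input: three partitions of rows with a "service" column.
partitions = [
    [{"service": "storage"}, {"service": "compute"}],
    [{"service": "storage"}],
    [{"service": "query"}, {"service": "storage"}],
]
result = run_staged_group_by(partitions)
```

Because every stage keeps its intermediate state in memory, the size of the partial results bounds how much data a grouped query can handle in this model.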

Currently, the batch processing module can get data from:

  • Yandex Object Storage.

Because queries are executed in memory, there are limits on the maximum volume of data in queries that aggregate (GROUP BY) or join (JOIN) data.

Analytical queries run on cluster-wide capacity. This capacity is occupied while a query is being processed and released afterwards. If multiple analytical queries that process large amounts of data run concurrently, new queries may fail with errors due to insufficient resources. This happens rarely, and rerunning the query is usually enough.
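Rerunning a query after a resource error can be automated on the client side. The following is a minimal sketch in Python, assuming a hypothetical `run_query` callable and a hypothetical `InsufficientResourcesError` exception. Neither is part of a Yandex Query API; they stand in for whatever client and error type you actually use.

```python
import time


class InsufficientResourcesError(Exception):
    """Hypothetical error standing in for a resource-shortage failure."""


def run_with_retry(run_query, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call run_query, retrying with exponential backoff on resource errors."""
    for attempt in range(attempts):
        try:
            return run_query()
        except InsufficientResourcesError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)
```

The increasing delay gives concurrently running queries time to finish and release capacity before the next attempt.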

Streaming processing module

For streaming processing, data is read from a data stream bus and, as in the batch processing module, the query is split into independent stages distributed across servers. The number of stages is selected based on an analysis of the data stream capacity. All computation is performed in memory, and no data is saved to disk.

Data streams often carry only a set of changes from the source system. This set of changes may be insufficient for processing a query and making a decision, so references are used to extend the semantics of the processed data. A reference is a static set of information that lets you enrich streaming data.
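Enrichment with a reference can be pictured as a lookup that adds static attributes to each incoming event. The sketch below is plain Python with hypothetical field names (`device_id`, `location`, `temperature`) and made-up sample data. It illustrates the concept only and does not show how Yandex Query performs the join.

```python
# Static reference: device_id -> descriptive attributes (hypothetical data).
REFERENCE = {
    "d1": {"location": "warehouse-a"},
    "d2": {"location": "warehouse-b"},
}


def enrich(events, reference):
    """Yield each event merged with its reference row, if one exists."""
    for event in events:
        extra = reference.get(event["device_id"], {})
        yield {**event, **extra}


# Hypothetical stream of change events that lack the location attribute.
stream = [
    {"device_id": "d1", "temperature": 21.5},
    {"device_id": "d3", "temperature": 19.0},
]
enriched = list(enrich(stream, REFERENCE))
```

Events with no matching reference row pass through unchanged in this sketch. Whether to drop or keep such events is a design choice that depends on the query.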

Currently, the streaming processing module can get data from:

  • Yandex Data Streams.

References can be stored in:

  • Yandex Object Storage.

To protect the system from overload, computations are scaled automatically and out-of-memory conditions are handled. To protect against failures, the current state of computations is regularly saved to an external storage system.
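Periodically saving computation state is a general checkpointing technique. The sketch below shows it in plain Python under loud assumptions: a local JSON file stands in for the external storage system, the state is a running total with an offset, and `fail_at` simulates a crash. None of this reflects Yandex Query internals.

```python
import json
import os
import tempfile


def load_state(path):
    """Restore the last checkpoint, or start from scratch."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"offset": 0, "total": 0}


def save_state(path, state):
    # Write to a temp file and rename so a crash never leaves a partial checkpoint.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def process(events, path, every=2, fail_at=None):
    """Sum events, checkpointing every `every` events; resume from the last checkpoint."""
    state = load_state(path)
    for i in range(state["offset"], len(events)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated failure")
        state["total"] += events[i]
        state["offset"] = i + 1
        if state["offset"] % every == 0:
            save_state(path, state)
    save_state(path, state)
    return state["total"]


events = [1, 2, 3, 4, 5]
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "checkpoint.json")
    try:
        process(events, path, fail_at=3)  # crashes before the fourth event
    except RuntimeError:
        pass
    total = process(events, path)  # resumes from the saved offset
```

After the simulated failure, the second run starts from the last saved offset instead of the beginning, so only the events after the checkpoint are reprocessed.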

Use cases

  • Processing Yandex Audit Trails events
  • Processing Yandex Cloud Logging logs
  • Processing CDC Debezium streams
  • Processing files with usage details in Yandex Cloud Billing
