Yandex Query overview
Yandex Query is a data service that runs federated queries against Yandex Object Storage, managed databases (Managed Service for ClickHouse®, Managed Service for Greenplum®, Managed Service for MySQL®, Managed Service for PostgreSQL, and Managed Service for YDB), and Yandex Data Streams real-time streams. Yandex Query uses YQL, a unified SQL dialect, to aggregate query results across these systems.
Yandex Query is a fully managed cloud service: you do not need to run servers or deploy software. The resources your queries need are allocated the moment you run them and released as soon as the queries complete. The queries themselves start running instantly.
Yandex Query allows you to:
- Use the same query text both to analyze data stored in Yandex Object Storage and to analyze data in real time.
- Aggregate query execution results across different systems.
- Save on development by using a common query language, YQL, and a common approach.
Yandex Query combines data virtualization features and a real-time streaming data analysis system. This architecture is called Unified Lambda.
The Unified Lambda model uses a unified SQL query text for processing streaming data and data stored in storage systems of different classes.
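The idea of one query text serving both modes can be illustrated with a minimal Python sketch. This is not YQL and not a Yandex Query API: the function `count_errors_by_host`, the record fields, and the sample data are hypothetical and exist only to show the same logic, written once, applied to data at rest and to data arriving as a stream.

```python
from collections import Counter
from typing import Iterable, Iterator


def count_errors_by_host(records: Iterable[dict]) -> Counter:
    """One piece of logic, written once, that accepts any iterable of records."""
    counts = Counter()
    for record in records:
        if record["level"] == "error":
            counts[record["host"]] += 1
    return counts


# Hypothetical sample data standing in for records already at rest in storage.
stored_records = [
    {"host": "a", "level": "error"},
    {"host": "b", "level": "info"},
    {"host": "a", "level": "error"},
]


def live_records() -> Iterator[dict]:
    """Hypothetical stand-in for a stream: records are yielded one at a time."""
    yield {"host": "a", "level": "error"}
    yield {"host": "b", "level": "info"}
    yield {"host": "a", "level": "error"}


# The same function handles both inputs without any changes.
batch_result = count_errors_by_host(stored_records)
stream_result = count_errors_by_host(live_records())
```

Because the function depends only on the shape of a record and not on where the records come from, the batch and streaming results are computed by identical code.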
Support for raw data storage
Companies prefer to store large volumes of rarely accessed data in object storage such as Yandex Object Storage, because long-term storage of rarely processed data is most cost-efficient in systems of this class. Data is stored in Yandex Object Storage in unstructured form, and analysts need a simple and convenient way to process it.
Streaming data processing
Streaming processing is based on windowed grouping functions: they receive data streams, group the data by source and time window, run computations, and send the results to external systems. A distinctive feature of Yandex Query is that the same SQL query text is used for both streaming and batch processing.
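Grouping by source and time window can be sketched in a few lines of Python. This is a conceptual illustration of a fixed-size (tumbling) window, not the way Yandex Query implements it: the 60-second window length, the `(source, timestamp, value)` event layout, and all function names are assumptions made for the example.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # assumed window length, for illustration only


def window_start(timestamp: int, size: int = WINDOW_SECONDS) -> int:
    """Return the start of the fixed-size window that contains the timestamp."""
    return timestamp - timestamp % size


def aggregate_by_window(events):
    """Sum event values per (source, window start) pair."""
    totals = defaultdict(int)
    for source, timestamp, value in events:
        totals[(source, window_start(timestamp))] += value
    return dict(totals)


# Hypothetical events: (source, timestamp in seconds, value).
events = [
    ("sensor-1", 5, 10),
    ("sensor-1", 59, 5),
    ("sensor-1", 60, 7),
    ("sensor-2", 61, 1),
]

window_totals = aggregate_by_window(events)
```

The sketch processes a finite list for simplicity. A streaming engine would instead emit each window's result once that window closes and forward it to an external system.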
Integration with external systems
Streaming processing
Streaming queries can get data from the following sources:
- Yandex Data Streams. Application logs, Debezium database CDC streams, or any other information can be used as input data.
Streaming processing results are exported to:
- Monitoring, as metrics for creating charts and dashboards or for alerting.
- Yandex Data Streams. Using Yandex Data Transfer, data from Yandex Data Streams can be sent to different systems, including various DBMS.
Batch processing
Analytical queries in Yandex Query can get data from Yandex Object Storage in JSON, CSV/TSV, and Parquet formats compressed using different algorithms. You can also run analytical queries against Managed Service for ClickHouse®, Managed Service for Greenplum®, Managed Service for MySQL®, Managed Service for PostgreSQL, and Managed Service for YDB managed databases.
You can use Yandex Query for cross-service data analytics, accessing all supported data sources in a single query.
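Conceptually, a cross-service query combines rows that live in different systems. The Python sketch below emulates this by hand and is only an analogy: the CSV text stands in for a file in object storage, an in-memory SQLite database stands in for a managed database, and all table names, column names, and values are hypothetical. It does not use any Yandex Query API.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content standing in for a file in object storage.
ORDERS_CSV = "customer_id,amount\n1,100\n2,50\n1,25\n"

# In-memory SQLite database standing in for a managed database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])

# Source 1: total order amount per customer, read from the CSV data.
totals = {}
for row in csv.DictReader(io.StringIO(ORDERS_CSV)):
    customer_id = int(row["customer_id"])
    totals[customer_id] = totals.get(customer_id, 0) + int(row["amount"])

# Source 2: customer names from the database, joined on the customer id.
report = {
    name: totals.get(customer_id, 0)
    for customer_id, name in db.execute("SELECT id, name FROM customers")
}
db.close()
```

Here the join is written manually in application code. The point of a federated query service is that the same combination is expressed declaratively in one query, without moving the data between systems first.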
You can download the query execution results from the Yandex Query user interface. If required, you can also save them to Yandex Object Storage.
Yandex DataLens
With Yandex Query as a data source, you can visualize data stored in Yandex Object Storage in Yandex DataLens.
ClickHouse® is a registered trademark of ClickHouse, Inc.