Terms and definitions in Query
Connection
A connection is a set of parameters that Yandex Query needs to connect to a data source or target. For example, if a file in Yandex Object Storage is used as a data source, the connection contains the bucket name and the parameters for authorizing access to it.
Yandex Query supports the following connection types:
- Object Storage, which is a connection to a Yandex Object Storage bucket; it can be both a data source and a target.
- Managed Service for PostgreSQL, which is a connection to Managed Service for PostgreSQL; it can only be a data source.
- Managed Service for ClickHouse®, which is a connection to Managed Service for ClickHouse®; it can only be a data source.
- Data Streams, which is a connection to a Yandex Managed Service for YDB database where a Yandex Data Streams stream is located; it can be both a data source and a target.
- Monitoring, which is a connection to Yandex Monitoring; it can only be a data target.
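The source and target roles listed above can be captured in a small lookup table. The Python sketch below is illustrative only and is not a Yandex Query API: the `Role` and `ConnectionType` enums and the `can_be` helper are hypothetical names.

```python
from enum import Enum, Flag, auto


class Role(Flag):
    """Role a connection can play in a query (hypothetical model)."""
    SOURCE = auto()
    TARGET = auto()


class ConnectionType(Enum):
    OBJECT_STORAGE = "Object Storage"
    POSTGRESQL = "Managed Service for PostgreSQL"
    CLICKHOUSE = "Managed Service for ClickHouse"
    DATA_STREAMS = "Data Streams"
    MONITORING = "Monitoring"


# Roles each connection type supports, as listed above.
ALLOWED_ROLES = {
    ConnectionType.OBJECT_STORAGE: Role.SOURCE | Role.TARGET,
    ConnectionType.POSTGRESQL: Role.SOURCE,
    ConnectionType.CLICKHOUSE: Role.SOURCE,
    ConnectionType.DATA_STREAMS: Role.SOURCE | Role.TARGET,
    ConnectionType.MONITORING: Role.TARGET,
}


def can_be(conn_type: ConnectionType, role: Role) -> bool:
    """Return True if a connection of this type can be used in the given role."""
    return role in ALLOWED_ROLES[conn_type]


print(can_be(ConnectionType.MONITORING, Role.SOURCE))  # False
```

Using a `Flag` lets a connection type that supports both roles be written as `Role.SOURCE | Role.TARGET`, and a membership check answers whether a given role is allowed.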
Binding
The same YQL query can be run on data available through different types of connections, such as data in a bucket or in a stream. In this case, it may be convenient to create a binding for each connection. A binding is a resource that contains information about a connection, the data format, and the data schema.
You can only create data bindings for file-based data sources, i.e., Object Storage.
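A binding can be pictured as a record that bundles a connection with a data format and a schema. The Python sketch below is a hypothetical model, not the Yandex Query API: the `Connection` and `Binding` classes, the `kind` labels such as `"object_storage"`, and the format name are illustrative assumptions. It rejects bindings for anything other than Object Storage, mirroring the restriction above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    name: str
    kind: str  # hypothetical label, e.g. "object_storage" or "data_streams"


@dataclass(frozen=True)
class Binding:
    name: str
    connection: Connection
    data_format: str                     # hypothetical format name, e.g. "csv_with_names"
    schema: tuple[tuple[str, str], ...]  # (field name, YQL type) pairs

    def __post_init__(self) -> None:
        # Bindings are only supported for file-based sources, i.e. Object Storage.
        if self.connection.kind != "object_storage":
            raise ValueError(
                f"bindings are not supported for {self.connection.kind!r} connections"
            )


logs = Binding(
    name="access_logs",
    connection=Connection(name="my-bucket", kind="object_storage"),
    data_format="csv_with_names",
    schema=(("ts", "Timestamp"), ("status", "Int32"), ("path", "String")),
)
```

Because the format and schema live in the binding, a query that reads `logs` does not need to repeat them.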
Query
A query is an expression written in YQL.
Yandex Query queries can process data in both batch and streaming mode.
Information about query runs
You can run the same query multiple times. The following information is saved for each query run:
- Query execution status.
- Query execution start date and time.
- Query execution duration.
- Name of the user who ran the query.
- Query execution metrics.
The results of the last query run are saved and stored for 24 hours.
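The per-run information and the 24-hour retention of the last run's results can be modeled as follows. This is a hypothetical Python sketch: the class names and status strings are illustrative, and the assumption that the 24 hours are counted from the end of the run is not stated in the text above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

# Results of the last run are kept for 24 hours (assumed: counted from the end of the run).
RESULT_TTL = timedelta(hours=24)


@dataclass
class QueryRun:
    status: str            # hypothetical values, e.g. "COMPLETED" or "FAILED"
    started_at: datetime
    duration: timedelta
    user: str
    metrics: dict[str, float] = field(default_factory=dict)

    @property
    def finished_at(self) -> datetime:
        return self.started_at + self.duration


@dataclass
class QueryHistory:
    runs: list[QueryRun] = field(default_factory=list)

    def record(self, run: QueryRun) -> None:
        # Run information is kept for every run of the query.
        self.runs.append(run)

    def latest_results_available(self, now: datetime) -> bool:
        # Only the results of the most recent run are stored, and only for RESULT_TTL.
        return bool(self.runs) and now - self.runs[-1].finished_at < RESULT_TTL


start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
history = QueryHistory()
history.record(QueryRun("COMPLETED", start, timedelta(minutes=5), "alice", {"rows_read": 1000.0}))
print(history.latest_results_available(start + timedelta(hours=23)))  # True
print(history.latest_results_available(start + timedelta(hours=25)))  # False
```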
Data schema
A data schema is a list of fields and their types for source data that has no explicit schema, such as data in Object Storage buckets or Data Streams streams. The schema must describe every field used in the query. If a query accesses data through a connection, specify the schema in the query text. If you use a binding, specify the data schema in the binding's properties.
When working with Managed Service for ClickHouse® or Managed Service for PostgreSQL sources, you cannot specify the schema explicitly, because it is extracted from the DBMS automatically.
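The rules for specifying a schema can be summarized as a validation check. In this hypothetical Python sketch, a schema is a mapping from field names to YQL type names; the `check_schema` helper and the source-kind labels are assumptions made for illustration and are not part of Yandex Query.

```python
from __future__ import annotations

# Hypothetical source-kind labels used only in this sketch.
DBMS_SOURCES = {"clickhouse", "postgresql"}              # schema is taken from the DBMS
SCHEMALESS_SOURCES = {"object_storage", "data_streams"}  # schema must be supplied


def check_schema(source_kind: str, schema: dict[str, str] | None, used_fields: set[str]) -> None:
    """Raise ValueError if the schema setup does not fit the source kind."""
    if source_kind in DBMS_SOURCES:
        if schema is not None:
            raise ValueError(f"{source_kind}: the schema is extracted from the DBMS automatically")
        return
    if source_kind not in SCHEMALESS_SOURCES:
        raise ValueError(f"unknown source kind: {source_kind!r}")
    if schema is None:
        raise ValueError(f"{source_kind}: an explicit schema is required")
    # The schema must describe every field the query uses.
    missing = used_fields - set(schema)
    if missing:
        raise ValueError(f"fields missing from the schema: {sorted(missing)}")
```

For example, `check_schema("object_storage", {"ts": "Timestamp"}, {"ts", "msg"})` raises because `msg` is used but not described, while `check_schema("postgresql", None, {"id"})` passes because the DBMS supplies the schema.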
Checkpoint
Streaming analysis systems process infinite data streams that have no beginning or end. To avoid reprocessing a stream from its beginning every time a query is rerun, Yandex Query saves the offsets of the data it has already processed. If processing is paused and then restarted, Yandex Query rewinds the data stream to the saved offset and resumes processing from that point.
Checkpoints contain information about a streaming query, including offsets in data streams.
If you add statements that access new streaming data sources to the query text, the checkpoints will not contain offsets for those new streams. As a result, data from the existing streams will be read starting from the last checkpoint, while data from the new streams will only be read as new messages arrive in them.
Note
Query execution settings (whether to process data starting from a checkpoint or to process it anew) are specified when you run a query.
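The checkpoint behavior described above can be simulated in a few lines. The Python sketch below is a hypothetical model, not how Yandex Query is implemented: streams are in-memory lists, a checkpoint is a mapping from stream name to offset, and `use_checkpoint=False` assumes that processing anew simply ignores the saved offsets.

```python
from __future__ import annotations


def start_offsets(
    streams: dict[str, list[str]], checkpoint: dict[str, int], use_checkpoint: bool
) -> dict[str, int]:
    """Choose where each stream is read from when a streaming query (re)starts."""
    saved = checkpoint if use_checkpoint else {}
    # A stream with a saved offset resumes from it. A stream without one
    # (for example, one newly added to the query text) starts at its current
    # end, so only messages that arrive later are read.
    return {name: saved.get(name, len(msgs)) for name, msgs in streams.items()}


def read_available(streams: dict[str, list[str]], offsets: dict[str, int]) -> dict[str, list[str]]:
    """Read every message past each offset and advance the offsets in place."""
    batch = {}
    for name, msgs in streams.items():
        batch[name] = msgs[offsets[name]:]
        offsets[name] = len(msgs)  # the updated offsets act as the next checkpoint
    return batch


streams = {"a": ["m1", "m2", "m3"]}
checkpoint = {"a": 1}   # "m1" was processed before the pause
streams["b"] = ["x1"]   # a new stream is added to the query
offsets = start_offsets(streams, checkpoint, use_checkpoint=True)
print(read_available(streams, offsets))  # {'a': ['m2', 'm3'], 'b': []}
streams["b"].append("x2")
print(read_available(streams, offsets))  # {'a': [], 'b': ['x2']}
```

Restarting from the checkpoint resumes stream `a` after the message it had already processed, while the newly added stream `b` yields nothing until a new message arrives in it.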
ClickHouse® is a registered trademark of ClickHouse, Inc.