© 2026 Direct Cursus Technology L.L.C.
In this article:

  • Make the necessary preparations
  • Analyze the Object Storage data
    • Connect to the analytical data source
    • Run the query
    • Check the result
  • Analyze the streaming data from Data Streams
    • Create a data stream
    • Set up data generation
    • Run the query
    • Check the result
  • See also

Unified streaming and batch data analysis

Written by
Yandex Cloud
Updated at July 1, 2026
In this example, we will query analytical and streaming data to calculate taxi fares in specific locations.

We will use the same SQL query for both data types; only the connections (bucket or stream) and the data bindings differ.

The data for batch processing is pre-loaded as Parquet files in a Yandex Object Storage bucket. A generator will write the streaming data to a dedicated Yandex Data Streams stream.

In both cases, we will use a reference table stored in Object Storage to filter query data.
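
The filtering role of the reference table can be pictured with a small Python sketch. It assumes that each location ID appears at most once in the reference table, so an inner join on PULocationID behaves like a membership filter. The sample rows and the function name filter_by_reference are made up for illustration and are not part of the tutorial dataset.

```python
def filter_by_reference(rides, reference_ids):
    """Keep only rides whose PULocationID appears in the reference set.

    Mirrors an inner join on PULocationID under the assumption that the
    reference table holds each location ID at most once.
    """
    allowed = set(reference_ids)
    return [ride for ride in rides if ride["PULocationID"] in allowed]


# Hypothetical sample rows; the real data lives in Object Storage or a stream.
rides = [
    {"PULocationID": "120", "total_amount": 7.54},
    {"PULocationID": "999", "total_amount": 12.00},
    {"PULocationID": "125", "total_amount": 30.80},
]
reference_ids = ["120", "125"]

filtered = filter_by_reference(rides, reference_ids)
print([ride["PULocationID"] for ride in filtered])
```

In this sketch the ride with ID "999" is dropped because that ID is missing from the reference list.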

To run this example:

  1. Make the necessary preparations.
  2. Analyze the Object Storage data.
  3. Analyze the streaming data from Data Streams.

Note

Yandex Cloud provides the New York City taxi trips dataset as is. Yandex Cloud makes no express or implied representations, warranties, or conditions pertaining to your use of the specified dataset. To the extent permitted by your local law, Yandex Cloud shall not be liable for any loss or damage, including direct, indirect, consequential, special, incidental, or punitive, resulting from your use of the dataset.

NYC Taxi and Limousine Commission (TLC):

The data was collected and provided to the NYC Taxi and Limousine Commission (TLC) by technology providers authorized under the Taxicab & Livery Passenger Enhancement Programs (TPEP/LPEP). The taxi trip data was not created by the TLC, and the TLC makes no representations whatsoever about the accuracy of this data.

Please review the dataset’s original source and its terms of use.

Make the necessary preparations

  1. Log in to the management console. If you have not signed up yet, navigate to the management console and follow the sign-up instructions.
  2. On the Yandex Cloud Billing page, make sure you have an ACTIVE or TRIAL_ACTIVE billing account. If you do not have a billing account yet, create one.
  3. If you do not have a folder yet, create one.
  4. We will connect to the data stream using a service account. Create a service account named datastream-connection-account and assign it the ydb.editor role.
  5. Data streams use Yandex Managed Service for YDB. You will need to create a serverless database.

Analyze the Object Storage data

Connect to the analytical data source

  1. In the management console, select the folder where you want to create a connection.

  2. Navigate to Yandex Query.

  3. In the left-hand panel, select Tutorial.

  4. Under Create infrastructure for tutorial, click Create connection.

    The connection creation page opens. Review the default settings and leave them unchanged.

  5. Click Create.

    The data binding creation page opens. Review the default settings and leave them unchanged.

  6. Click Create.

Run the query

  1. In the query editor within the Query interface, click New analytics query.

  2. Enter the query text in the text field:

    $data =
    SELECT
        *
    FROM
        `tutorial-analytics`;
    
    $locations =
    SELECT
        PULocationID
    FROM
        `tutorial-analytics`.`nyc_taxi_sample/example_locations.csv`
    WITH
    (
        format=csv_with_names,
        SCHEMA
        (
            PULocationID String
        )
    );
    
    $time =
    SELECT
        HOP_END() AS time,
        rides.PULocationID AS PULocationID,
        SUM(total_amount) AS total_amount
    FROM $data AS rides
    INNER JOIN $locations AS locations
        ON rides.PULocationID=locations.PULocationID
    GROUP BY
        HOP(CAST(tpep_pickup_datetime AS Timestamp?), "PT1M", "PT1M", "PT1M"),
        rides.PULocationID;
    
    SELECT
        *
    FROM
        $time;
    
  3. Click Run.
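
As a rough illustration of what the GROUP BY HOP(...) clause computes when the hop and the interval are both "PT1M", the Python sketch below sums fares per one-minute window and location. It assumes half-open windows aligned to whole minutes, with the window end playing the role of HOP_END(). The sample rides and the function names are hypothetical, and the sketch does not model the last HOP argument.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=1)  # corresponds to "PT1M"


def window_end(ts):
    """Return the end of the one-minute window that contains ts.

    Assumes half-open windows [start, start + 1 minute) aligned to whole minutes.
    """
    start = ts.replace(second=0, microsecond=0)
    return start + WINDOW


def sum_fares_per_window(rides):
    """Sum total_amount per (window end, PULocationID)."""
    totals = defaultdict(float)
    for ride in rides:
        key = (window_end(ride["tpep_pickup_datetime"]), ride["PULocationID"])
        totals[key] += ride["total_amount"]
    return dict(totals)


# Hypothetical rides, already filtered by the reference table.
utc = timezone.utc
rides = [
    {"tpep_pickup_datetime": datetime(2018, 1, 1, 0, 12, 10, tzinfo=utc),
     "PULocationID": "120", "total_amount": 20.0},
    {"tpep_pickup_datetime": datetime(2018, 1, 1, 0, 12, 50, tzinfo=utc),
     "PULocationID": "120", "total_amount": 28.5},
    {"tpep_pickup_datetime": datetime(2018, 1, 1, 0, 13, 5, tzinfo=utc),
     "PULocationID": "120", "total_amount": 9.5},
]

result = sum_fares_per_window(rides)
for (end, location), total in sorted(result.items()):
    print(end.isoformat(), location, total)
```

The first two rides fall into the same window and are summed together; the third ride starts a new window.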

Check the result

Once executed, the analytical query returns the total taxi fare (total_amount) for each location (PULocationID) listed in the reference table, summed over one-minute windows.

#  time                         PULocationID  total_amount
1  2017-12-31T22:24:00.000000Z  120           7.54
2  2018-01-01T00:13:00.000000Z  120           48.8
3  2018-01-01T03:25:00.000000Z  120           30.8
4  2018-01-01T11:29:00.000000Z  120           32.88
5  2018-01-01T15:13:00.000000Z  120           9.8
6  2018-01-01T22:03:00.000000Z  120           14.8
7  2018-01-02T19:28:00.000000Z  120           7.3
8  2018-01-03T10:17:00.000000Z  120           81.3

Analyze the streaming data from Data Streams

Create a data stream

  1. In the management console, select the folder where you need to create a data stream.
  2. Navigate to Data Streams.
  3. Click Create stream.
  4. Specify the Yandex Managed Service for YDB database created earlier.
  5. Name the data stream: yellow-taxi.
  6. Click Create.

Set up data generation

  1. Create a connection:

    1. In the management console, select the folder where you want to create a connection.
    2. Navigate to Yandex Query.
    3. In the left-hand panel, select Tutorial.
    4. Navigate to Streaming.
    5. Under Create infrastructure for tutorial, click Create connection.
    6. In the window that opens, under Connection type parameters, select the database and service account you created earlier.
    7. Click Create.
  2. Create a data binding:

    1. After you create the connection, the data binding creation page opens.
    2. Under Binding parameters, select the yellow-taxi stream you created earlier.
    3. Click Create.

The generator will start writing data to the yellow-taxi stream. You can control the generator using the Stop and Start buttons.

Run the query

  1. In the query editor within the Query interface, click New streaming query.

  2. Enter the query text in the text field:

    $data =
    SELECT
        *
    FROM bindings.`tutorial-streaming`;
    
    $locations =
    SELECT
        PULocationID
    FROM
        `tutorial-analytics`.`nyc_taxi_sample/example_locations.csv`
    WITH
    (
        format=csv_with_names,
        SCHEMA
        (
            PULocationID String
        )
    );
    
    $time =
    SELECT
        HOP_END() AS time,
        rides.PULocationID AS PULocationID,
        SUM(total_amount) AS total_amount
    FROM $data AS rides
    INNER JOIN $locations AS locations
        ON rides.PULocationID=locations.PULocationID
    GROUP BY
        HOP(CAST(tpep_pickup_datetime AS Timestamp?), "PT1M", "PT1M", "PT1M"),
        rides.PULocationID;
    
    SELECT
        *
    FROM
        $time;
    
  3. Click Run.
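
Unlike the batch case, a streaming query never sees the end of its input, so window totals have to be released while data keeps arriving. The Python sketch below is a toy model of that idea, not the algorithm the Query engine actually uses: it assumes a window becomes final once the latest event time seen is at least DELAY past the window end, and it ignores events that arrive after their window was released. The class name StreamingWindowSum and the sample events are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=1)  # window length and hop, "PT1M"
DELAY = timedelta(minutes=1)   # assumed time a window stays open for late events


class StreamingWindowSum:
    """Toy model of per-minute fare sums over an unbounded stream of rides."""

    def __init__(self):
        self.open_windows = defaultdict(float)  # (window_end, location) -> sum
        self.watermark = None                   # latest event time seen so far

    def add(self, event_time, location, amount):
        """Consume one event and return the windows that became final."""
        start = event_time.replace(second=0, microsecond=0)
        self.open_windows[(start + WINDOW, location)] += amount
        if self.watermark is None or event_time > self.watermark:
            self.watermark = event_time
        # Collect the keys first, then remove them, so the dict is not
        # modified while it is being iterated.
        closed = sorted(
            key for key in self.open_windows
            if key[0] + DELAY <= self.watermark
        )
        return [(end, loc, self.open_windows.pop((end, loc))) for end, loc in closed]


utc = timezone.utc
agg = StreamingWindowSum()
emitted = []
emitted += agg.add(datetime(2022, 2, 15, 12, 2, 10, tzinfo=utc), "125", 10.0)
emitted += agg.add(datetime(2022, 2, 15, 12, 2, 40, tzinfo=utc), "125", 5.5)
emitted += agg.add(datetime(2022, 2, 15, 12, 4, 5, tzinfo=utc), "129", 7.0)
print(emitted)
```

In this model the first two events produce no output; the third event moves the latest event time far enough to release the window ending at 12:03 for location "125", while the window for "129" stays open.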

Check the result

Once launched, the query returns the total fare (total_amount) of taxi rides in the selected locations (PULocationID) for each one-minute window processed since the query started.

#  PULocationID  time                         total_amount
1  125           2022-02-15T12:03:00.000000Z  1275.4084
2  129           2022-02-15T12:03:00.000000Z  1073.0449
3  126           2022-02-15T12:03:00.000000Z  202.85883
4  121           2022-02-15T12:03:00.000000Z  636.8784
5  124           2022-02-15T12:03:00.000000Z  923.87805
6  127           2022-02-15T12:04:00.000000Z  2105.3125
...

See also

  • HOP operator and window parameters in streaming data processing
  • Aggregate functions in YQL
  • SQL syntax
  • Batch processing
  • Streaming data analysis
