Yandex Cloud
Search
Contact UsGet started
  • Blog
  • Pricing
  • Documentation
  • All Services
  • System Status
    • Featured
    • Infrastructure & Network
    • Data Platform
    • Containers
    • Developer tools
    • Serverless
    • Security
    • Monitoring & Resources
    • ML & AI
    • Business tools
  • All Solutions
    • By industry
    • By use case
    • Economics and Pricing
    • Security
    • Technical Support
    • Customer Stories
    • Start testing with double trial credits
    • Cloud credits to scale your IT product
    • Gateway to Russia
    • Cloud for Startups
    • Education and Science
    • Yandex Cloud Partner program
  • Blog
  • Pricing
  • Documentation
© 2025 Direct Cursus Technology L.L.C.
Yandex Query
    • Overview
    • Batch processing
    • Streaming processing
    • Unified analysis of streaming and analytical data
  • Access management
  • Pricing policy
  • Integration
  • Audit Trails events
  • FAQ

In this article:

  • Get started
  • Analyze the data from Object Storage
  • Connect to analytical data
  • Run a query
  • Review the result
  • Analyze the Data Streams streaming data
  • Create a data stream
  • Set up data generation
  • Run the query
  • Review the result
  • See also
  1. Getting started
  2. Unified analysis of streaming and analytical data

Unified analysis of streaming and analytical data

Written by
Yandex Cloud
Updated at March 6, 2025
  • Get started
  • Analyze the data from Object Storage
    • Connect to analytical data
    • Run a query
    • Review the result
  • Analyze the Data Streams streaming data
    • Create a data stream
    • Set up data generation
    • Run the query
    • Review the result
  • See also

In this example, you will calculate the cost of taxi rides in specific locations using a single query against analytical and streaming data.

The text of the SQL query used to process both data types is the same: only the connections and data bindings for the bucket and stream are different.

Data for analytical processing was placed in the Yandex Object Storage bucket, in Parquet files. Stream data will be written by a generator to a dedicated Yandex Data Streams stream.

In both cases, we use a reference stored in Object Storage to filter our query data.

To run this example:

  1. Get ready.
  2. Analyze the data from Object Storage.
  3. Analyze the streaming data from Data Streams.

Note

Yandex Cloud provides the New York City taxi trips dataset as is. Yandex Cloud makes no representations, express or implied, warranties, or conditions pertaining to your use of the specified dataset. To the extent allowed by your local laws, Yandex Cloud shall not be liable for any loss or damage, including direct, consequential, special, indirect, incidental, or exemplary, resulting from your use of the dataset.

NYC Taxi and Limousine Commission (TLC):

The data was collected and provided to the NYC Taxi and Limousine Commission (TLC) by technology providers authorized under the Taxicab & Livery Passenger Enhancement Programs (TPEP/LPEP). The taxi trip data is not generated by the TLC, and the TLC makes no representations whatsoever about the accuracy of this data.

Take a look at the dataset source and its use policy.

Get startedGet started

  1. Log in or sign up to the management console. If not signed up yet, navigate to the management console and follow the on-screen instructions.
  2. On the Yandex Cloud Billing page, make sure you have a billing account linked and its status is ACTIVE or TRIAL_ACTIVE. If you do not have a billing account yet, create one.
  3. If you do not have a folder yet, create one.
  4. We will connect to our data stream using a service account. Create a service account named datastream-connection-account with the ydb.editor role.
  5. Data streams use Yandex Managed Service for YDB. You will need to create a serverless database.

Analyze the data from Object StorageAnalyze the data from Object Storage

Connect to analytical dataConnect to analytical data

  1. In the management console, select the folder where you want to create a connection.

  2. In the list of services, select Yandex Query.

  3. In the left-hand panel, select Tutorial.

  4. Under Create infrastructure for tutorial, click Create connection.

    A new connection creation page will open. View the default parameter values, but do not edit them.

  5. Click Create.

    The data binding page will open. View the default parameter values, but do not edit them.

  6. Click Create.

Run a queryRun a query

  1. In the query editor in the Query interface, click New analytics query.

  2. Enter the query text in the text field:

    $data =
    SELECT
        *
    FROM
        `tutorial-analytics`;
    
    $locations =
    SELECT
        PULocationID
    FROM
        `tutorial-analytics`.`nyc_taxi_sample/example_locations.csv`
    WITH
    (
        format=csv_with_names,
        SCHEMA
        (
            PULocationID String
        )
    );
    
    $time =
    SELECT
        HOP_END() AS time,
        rides.PULocationID AS PULocationID,
        SUM(total_amount) AS total_amount
    FROM $data AS rides
    INNER JOIN $locations AS locations
        ON rides.PULocationID=locations.PULocationID
    GROUP BY
        HOP(CAST(tpep_pickup_datetime AS Timestamp?), "PT1M", "PT1M", "PT1M"),
        rides.PULocationID;
    
    SELECT
        *
    FROM
        $time;
    
  3. Click Run.

Review the resultReview the result

Once the analytical query is complete, you will see the result: distribution of taxi ride costs in specific locations.

# time PULocationID total_amount
1 2017-12-31T22:24:00.000000Z 120 7.54
2 2018-01-01T00:13:00.000000Z 120 48.8
3 2018-01-01T03:25:00.000000Z 120 30.8
4 2018-01-01T11:29:00.000000Z 120 32.88
5 2018-01-01T15:13:00.000000Z 120 9.8
6 2018-01-01T22:03:00.000000Z 120 14.8
7 2018-01-02T19:28:00.000000Z 120 7.3
8 2018-01-03T10:17:00.000000Z 120 81.3

Analyze the Data Streams streaming dataAnalyze the Data Streams streaming data

Create a data streamCreate a data stream

  1. In the management console, select the folder where you need to create a data stream.
  2. Select Data Streams.
  3. Click Create stream.
  4. Specify the Yandex Managed Service for YDB database created previously.
  5. Enter the name of the stream: yellow-taxi.
  6. Click Create.

Set up data generationSet up data generation

  1. Create a connection.

    1. In the management console, select the folder where you want to create a connection.
    2. In the list of services, select Yandex Query.
    3. In the left-hand panel, select Tutorial.
    4. Go to Streaming.
    5. Under Create infrastructure for tutorial, click Create connection.
    6. In the window that opens, under Connection type parameters, select the database and service account that you created previously.
    7. Click Create.
  2. Create a data binding:

    1. A page for creating a data binding will open.
    2. Under Binding parameters, select the yellow-taxi stream created previously.
    3. Click Create.

Data generation to the yellow-taxi stream will start. Use the Stop and Start buttons to control the data generator.

Run the queryRun the query

  1. In the query editor in the Query interface, click New streaming query.

  2. Enter the query text in the text field:

    $data =
    SELECT
        *
    FROM bindings.`tutorial-streaming`;
    
    $locations =
    SELECT
        PULocationID
    FROM
        `tutorial-analytics`.`nyc_taxi_sample/example_locations.csv`
    WITH
    (
        format=csv_with_names,
        SCHEMA
        (
            PULocationID String
        )
    );
    
    $time =
    SELECT
        HOP_END() AS time,
        rides.PULocationID AS PULocationID,
        SUM(total_amount) AS total_amount
    FROM $data AS rides
    INNER JOIN $locations AS locations
        ON rides.PULocationID=locations.PULocationID
    GROUP BY
        HOP(cast(tpep_pickup_datetime AS Timestamp?), "PT1M", "PT1M", "PT1M"),
        rides.PULocationID;
    
    SELECT
        *
    FROM
        $time;
    
  3. Click Run.

Review the resultReview the result

Once you run the query to the streaming data, you will see the result with the total cost of rides (total_amount) in specific PULocationID locations taken after running the query.

# PULocationID time total_amount
1 125 2022-02-15T12:03:00.000000Z 1275.4084
2 129 2022-02-15T12:03:00.000000Z 1073.0449
3 126 2022-02-15T12:03:00.000000Z 202.85883
4 121 2022-02-15T12:03:00.000000Z 636.8784
5 124 2022-02-15T12:03:00.000000Z 923.87805
6 127 2022-02-15T12:04:00.000000Z 2105.3125
...

See alsoSee also

  • HOP. Window parameters in streamed data processing
  • Aggregate functions. YQL syntax
  • SQL expression format
  • Batch processing
  • Streaming data analysis

Was the article helpful?

Previous
Streaming processing
Next
Overview
© 2025 Direct Cursus Technology L.L.C.