Processing of data streams from Yandex Data Streams

Written by
Yandex Cloud
Updated at March 6, 2025

In this example, you will process a stream of New York City taxi ride data. A generator writes the data for the example to a dedicated Yandex Data Streams stream.

As a result, you will get the total cost of the first ten rides received after stream processing begins.

To run this example:

  1. Get started.
  2. Create a data stream.
  3. Set up data generation.
  4. Run the query.
  5. Review the result.

Note

Yandex Cloud provides the New York City taxi trips dataset as is. Yandex Cloud makes no representations, express or implied, warranties, or conditions pertaining to your use of the specified dataset. To the extent allowed by your local laws, Yandex Cloud shall not be liable for any loss or damage, including direct, consequential, special, indirect, incidental, or exemplary, resulting from your use of the dataset.

NYC Taxi and Limousine Commission (TLC):

The data was collected and provided to the NYC Taxi and Limousine Commission (TLC) by technology providers authorized under the Taxicab & Livery Passenger Enhancement Programs (TPEP/LPEP). The taxi trip data is not generated by the TLC, and the TLC makes no representations whatsoever about the accuracy of this data.

Take a look at the dataset source and its use policy.

Get started

  1. Log in to the management console or sign up. If you have not signed up yet, navigate to the management console and follow the on-screen instructions.
  2. On the Yandex Cloud Billing page, make sure you have a billing account linked and its status is ACTIVE or TRIAL_ACTIVE. If you do not have a billing account yet, create one.
  3. If you do not have a folder yet, create one.
  4. You will connect to the data stream using a service account. Create a service account named datastream-connection-account with the ydb.editor role.
  5. Data streams use Yandex Managed Service for YDB, so you need to create a serverless database.

Create a data stream

  1. In the management console, select the folder where you need to create a data stream.
  2. Select Data Streams.
  3. Click Create stream.
  4. Specify the Yandex Managed Service for YDB database created previously.
  5. Enter the name of the stream: yellow-taxi.
  6. Click Create.

Set up data generation

  1. Create a connection.

    1. In the management console, select the folder where you want to create a connection.
    2. In the list of services, select Yandex Query.
    3. In the left-hand panel, select Tutorial.
    4. Go to Streaming.
    5. Under Create infrastructure for tutorial, click Create connection.
    6. In the window that opens, under Connection type parameters, select the database and service account that you created previously.
    7. Click Create.
  2. Create a data binding.

    1. After you create the connection, the page for creating a data binding opens.
    2. Under Binding parameters, select the yellow-taxi stream created previously.
    3. Click Create.

Data generation to the yellow-taxi stream will start. Use the Stop and Start buttons to control the data generator.

Run the query

  1. In the query editor in the Query interface, click New streaming query.

  2. Enter the query text in the text field:

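    -- Take the first 10 records that arrive from the stream binding.
    $data =
    SELECT
        *
    FROM
        bindings.`tutorial-streaming`
    LIMIT 10;

    -- Group those records into one-minute windows by pickup time
    -- (HOP arguments: time extractor, hop, interval, delay)
    -- and return the ride count and total cost for each window.
    SELECT
        HOP_END() AS time,
        COUNT(*) AS ride_count,
        SUM(total_amount) AS total_amount
    FROM
        $data
    GROUP BY
        HOP(CAST(tpep_pickup_datetime AS Timestamp), "PT1M", "PT1M", "PT1M");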
    
  3. Click Run.
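
To see what the query computes, the following Python sketch emulates its arithmetic on an in-memory list: it keeps the first 10 records, assigns each to a one-minute window by pickup time, and returns the ride count and total cost per window. This is only an illustration, not how Yandex Query executes the query: it ignores the HOP delay argument and all streaming behavior. The field names tpep_pickup_datetime and total_amount come from the query; the helper names and the sample rides are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=1)  # corresponds to the "PT1M" duration in the query
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_end(ts: datetime, size: timedelta = WINDOW) -> datetime:
    """Return the end of the fixed-size window [start, end) that contains ts."""
    index = (ts - EPOCH) // size  # whole windows elapsed since the epoch
    return EPOCH + (index + 1) * size


def first_rides_by_window(rides, limit=10):
    """Emulate LIMIT followed by a one-minute window aggregate."""
    totals = defaultdict(lambda: {"ride_count": 0, "total_amount": 0.0})
    for ride in rides[:limit]:
        bucket = totals[window_end(ride["tpep_pickup_datetime"])]
        bucket["ride_count"] += 1
        bucket["total_amount"] += ride["total_amount"]
    return dict(totals)


# Hypothetical sample: 12 rides picked up 5 seconds apart, 10.0 each.
start = datetime(2022, 11, 28, 16, 4, 0, tzinfo=timezone.utc)
rides = [
    {"tpep_pickup_datetime": start + timedelta(seconds=5 * i), "total_amount": 10.0}
    for i in range(12)
]

result = first_rides_by_window(rides)
for end, row in sorted(result.items()):
    print(end.isoformat(), row["ride_count"], row["total_amount"])
```

With this sample, only the first 10 of the 12 rides are counted, and all of them fall into the window that ends at 16:05:00 UTC.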

Review the result

Once the query completes, the result shows the number of rides (ride_count) and the total cost (total_amount) of the first 10 rides received after the query started.

#   time                          ride_count   total_amount
1   2022-11-28T16:05:00.000000Z   10           5675.542679843059
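
As a small worked example of reading this row, the Python sketch below parses the time value and derives the average cost per ride. It assumes the window is one minute long (the "PT1M" interval from the query) and ends at the time reported by HOP_END(); the variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the result row above.
hop_end = "2022-11-28T16:05:00.000000Z"
ride_count = 10
total_amount = 5675.542679843059

# Parse the UTC timestamp; the trailing "Z" is matched literally.
window_end = datetime.strptime(hop_end, "%Y-%m-%dT%H:%M:%S.%fZ").replace(
    tzinfo=timezone.utc
)

# Assumption: the window spans one minute ("PT1M") and ends at HOP_END().
window_start = window_end - timedelta(minutes=1)

average_per_ride = total_amount / ride_count

print(window_start.isoformat(), window_end.isoformat(), round(average_per_ride, 2))
```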

See also

  • HOP. Window parameters in streamed data processing
  • Aggregate functions. YQL syntax
  • SQL expression format
  • Streaming data analysis

© 2025 Direct Cursus Technology L.L.C.