Processing data streams from Yandex Data Streams
In this example, you will process a data stream on New York City taxi rides. Data for the example will be written by a generator to a dedicated Yandex Data Streams stream.
The result is the total cost of the first ten rides received after stream processing starts.
To run this example:
- Create a data stream.
- Set up data generation.
- Run the query.
- Review the result.
Note
Yandex Cloud provides the New York City taxi trips dataset as is. Yandex Cloud makes no representations, express or implied, warranties, or conditions pertaining to your use of the specified dataset. To the extent allowed by your local laws, Yandex Cloud shall not be liable for any loss or damage, including direct, consequential, special, indirect, incidental, or exemplary, resulting from your use of the dataset.
NYC Taxi and Limousine Commission (TLC):
The data was collected and provided to the NYC Taxi and Limousine Commission (TLC) by technology providers authorized under the Taxicab & Livery Passenger Enhancement Programs (TPEP/LPEP). The taxi trip data is not generated by the TLC, and the TLC makes no representations whatsoever about the accuracy of this data.
Take a look at the dataset source
Get started
- Log in or sign up to the management console. If you are not signed up yet, navigate to the management console and follow the instructions.
- On the Yandex Cloud Billing page, make sure you have a billing account linked and it has the ACTIVE or TRIAL_ACTIVE status. If you do not yet have a billing account, create one.
- If you do not have a folder yet, create one.
- The connection to the data stream uses a service account. Create a service account named datastream-connection-account with the ydb.editor role.
- Data streams use Yandex Managed Service for YDB. Create a serverless database.
Create a data stream
- In the management console, select the folder where you need to create a data stream.
- Select Data Streams.
- Click Create stream.
- Specify the Yandex Managed Service for YDB database created previously.
- Enter the name of the stream: yellow-taxi.
- Click Create.
Set up data generation
- Create a connection:
  - In the management console, select the folder where you want to create a connection.
  - In the list of services, select Yandex Query.
  - In the left-hand panel, select Tutorial.
  - Go to Streaming.
  - Under Create infrastructure for tutorial, click Create connection.
  - In the window that opens, under Connection type parameters, select the database and service account that you created previously.
  - Click Create.
- Create a data binding:
  - A page for creating a data binding will open.
  - Under Binding parameters, select the yellow-taxi stream created previously.
  - Click Create.

Data generation to the yellow-taxi stream will start. Use the Stop and Start buttons to control the data generator.
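To make the generated data easier to picture, here is a minimal Python sketch of a ride generator. It is not the tutorial's generator: the function name `generate_rides`, the JSON encoding, the one-event-per-second pacing, and the amount range are all assumptions made for illustration. Only the field names `tpep_pickup_datetime` and `total_amount` come from the query used later in this example.

```python
import json
import random
from datetime import datetime, timedelta, timezone


def generate_rides(start, count, seed=0):
    """Yield hypothetical ride events as JSON strings, one per simulated second."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same events
    for i in range(count):
        pickup = start + timedelta(seconds=i)
        event = {
            # Field names taken from the tutorial query; the values are made up.
            "tpep_pickup_datetime": pickup.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "total_amount": round(rng.uniform(5.0, 80.0), 2),
        }
        yield json.dumps(event)


# Print three sample events starting at an arbitrary moment.
sample_start = datetime(2022, 11, 28, 16, 4, 0, tzinfo=timezone.utc)
for line in generate_rides(sample_start, 3):
    print(line)
```

The real stream most likely carries more columns than these two; the sketch keeps only what the query reads.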
Run the query
- In the query editor in the Query interface, click New streaming query.
- Enter the query text in the text field:

    $data =
    SELECT
        *
    FROM
        bindings.`tutorial-streaming`
    LIMIT 10;

    SELECT
        HOP_END() AS time,
        COUNT(*) AS ride_count,
        SUM(total_amount) AS total_amount
    FROM
        $data
    GROUP BY
        HOP(CAST(tpep_pickup_datetime AS Timestamp), "PT1M", "PT1M", "PT1M");

- Click Run.
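The query takes the first ten records from the stream and groups them into one-minute windows by pickup time. The Python sketch below imitates that logic on an in-memory list, as an aid to reading the query. It assumes that equal hop and interval values ("PT1M") produce non-overlapping one-minute windows and that HOP_END() returns the end of the window. It ignores the delay argument and everything else the streaming engine does. The names `window_end` and `aggregate_first_rides` and the sample rides are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from itertools import islice

WINDOW = timedelta(minutes=1)  # mirrors the "PT1M" hop and interval in the query


def window_end(ts):
    """Return the end of the one-minute window that contains ts."""
    start = ts.replace(second=0, microsecond=0)
    return start + WINDOW


def aggregate_first_rides(rides, limit=10):
    """Take the first `limit` rides, then count and sum them per window."""
    totals = defaultdict(lambda: {"ride_count": 0, "total_amount": 0.0})
    for ride in islice(rides, limit):  # LIMIT 10
        row = totals[window_end(ride["tpep_pickup_datetime"])]  # GROUP BY HOP(...)
        row["ride_count"] += 1  # COUNT(*)
        row["total_amount"] += ride["total_amount"]  # SUM(total_amount)
    return dict(totals)


# Made-up input: twelve rides five seconds apart; only the first ten are used.
base = datetime(2022, 11, 28, 16, 4, 0, tzinfo=timezone.utc)
rides = [
    {"tpep_pickup_datetime": base + timedelta(seconds=5 * i), "total_amount": 10.0 + i}
    for i in range(12)
]
result = aggregate_first_rides(rides)
for end, row in sorted(result.items()):
    print(end.isoformat(), row["ride_count"], row["total_amount"])
```

If the first ten rides span a minute boundary, the sketch returns one row per window, each with its own count and sum.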
Review the result
Once the query completes, you will see the total cost (total_amount) of the first 10 rides received after the query started.
| # | time | ride_count | total_amount |
|---|---|---|---|
| 1 | 2022-11-28T16:05:00.000000Z | 10 | 5675.542679843059 |
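As a small aid to reading the result row, the Python sketch below parses the time value and derives two numbers from it: the window start and the average amount per ride. It assumes that time holds the end of a one-minute window, as HOP_END() and the "PT1M" arguments suggest. The variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the result table above.
row = {
    "time": "2022-11-28T16:05:00.000000Z",
    "ride_count": 10,
    "total_amount": 5675.542679843059,
}

# Parse the UTC timestamp; the trailing "Z" is matched literally.
end = datetime.strptime(row["time"], "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
start = end - timedelta(minutes=1)  # assumed one-minute window
average_amount = row["total_amount"] / row["ride_count"]

print("window:", start.isoformat(), "to", end.isoformat())
print("average amount per ride:", round(average_amount, 2))
```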