Visualizing Yandex Object Storage data in Yandex DataLens
By integrating Yandex Query with Yandex DataLens, you can visualize data stored in Yandex Object Storage. DataLens generates a YQL query, while Yandex Query runs the query and returns the results, which are visualized using charts.
For the solution architecture, see below.
As an example, let's visualize how the time of day affects the number and average cost of New York City yellow taxi trips. The data was pre-uploaded to a public Yandex Object Storage bucket named yq-sample-data
, the nyc_taxi_csv
folder.
Note
Yandex Cloud provides the New York City taxi trips dataset as is. Yandex Cloud makes no representations, express or implied, warranties, or conditions pertaining to your use of the specified dataset. To the extent allowed by your local laws, Yandex Cloud shall not be liable for any loss or damage, including direct, consequential, special, indirect, incidental, or exemplary, resulting from your use of the dataset.
NYC Taxi and Limousine Commission (TLC):
The data was collected and provided to the NYC Taxi and Limousine Commission (TLC) by technology providers authorized under the Taxicab & Livery Passenger Enhancement Programs (TPEP/LPEP). The taxi trip data is not generated by the TLC, and the TLC makes no representations whatsoever about the accuracy of this data.
Take a look at the Dataset source
To visualize and explore data, set up your cloud and follow the steps below:
- Connect to data in Object Storage.
- Create a connection in Yandex DataLens.
- Configure the dataset fields.
- Configure visualization.
Getting started
Sign up for Yandex Cloud and create a billing account:
- Go to the management console
and log in to Yandex Cloud or create an account if you do not have one yet. - On the Yandex Cloud Billing
page, make sure you have a billing account linked and it has theACTIVE
orTRIAL_ACTIVE
status. If you do not have a billing account, create one.
If you have an active billing account, you can go to the cloud page
Learn more about clouds and folders.
Note
To create a DataLens connection to Yandex Query, you need a service account with the editor
role for the folder where the Object Storage bucket connection will be created.
Connect to data in Object Storage
-
In the management console
, select the folder where you want to create a connection. -
In the list of services, select Yandex Query.
-
In the left-hand panel, select
Tutorial. -
Click Create connection. In the Create connection section, all parameter fields are pre-filled. Enter a description (optional) and click Create. This will take you to the screen for creating a data binding.
-
Enter the data binding description (optional) and click Create. Appropriate training resources will be created.
Create a connection in Yandex DataLens
To create a Yandex Query connection:
-
Go to the DataLens home page
. -
In the left-hand panel, select
Connections and click Create connection. -
Select a Yandex Query connection.
-
Specify the connection parameters:
- Cloud and folder: Select the folder where your service account is located.
- Service account: Select an existing service account or create a new one.
- Cache TTL in seconds: Specify the cache time-to-live or leave the default value.
- Raw SQL level: Select Allow subqueries in datasets and queries from charts.
-
Click Create connection.
-
Enter a connection name, e.g.,
yq-connect
, and click Create.
Configure the dataset fields
-
In the top-right corner of the page where you created the connection, click Create dataset.
-
In the window that opens, enter the query text provided below and click Create.
SELECT * FROM bindings.`tutorial-analytics`
If everything is set up correctly, the Preview section in the lower part of the screen will contain the following data:
-
Create a calculated field with the pick-up time data:
-
Go to the Fields tab and click Add field.
-
In Field name, specify
hour_trip
. -
In the formula line, enter:
HOUR([tpep_pickup_datetime])
-
Click Create. The new
hour_trip
field will appear in the list of dataset fields.
-
-
In the Aggregation column, select the
Average
aggregation type for the total_amount field. -
In the top-right corner, click Save. Enter the dataset name,
yellow-taxi-dataset
, and click Create. -
When the dataset is saved, click Create chart in the top-right corner.
Configure visualization
To visualize and analyze the data, use a combined chart.
-
For the visualization type, choose Combined chart.
-
Drag the
hour_trip
dimension to the X section. In the X section, the dimension will be common for all layers. -
For the first layer, choose Bar chart.
-
Drag the
total_amount
measure to the Y section. You will see a bar chart in the visualization area. -
Add a layer. To do this, click the
icon to the right of the first layer name. -
For the second layer, choose Line chart.
-
Add a field to the chart:
-
At the top left, click
and select Field. -
In the window that opens:
-
Under Field name, enter
count_trip
. -
In the formula line, enter
COUNT()
.
-
-
Click Create.
-
-
Drag the new
count_trip
measure to the Y2 section. You will see a line chart on top of the first chart in the visualization area. -
In the top-right corner, click Save. Enter
yellow-taxi-combo-chart
as the chart name and click Save.
The chart you created shows how the time of day affects the number of trips (line chart) and the average trip cost (bar chart).
In the same way, you can create additional charts and dashboards or share the results with your teammates.
Visualizing Yandex Object Storage data with Yandex Query: