Web analytics with funnels and cohorts calculated based on Yandex Metrica data
- Getting started
- 1. Connect ClickHouse® and DataSphere
- 2. Retrieve and upload data to ClickHouse®
- 3. Connect DataLens and create charts
- 4. Create and configure a dashboard in DataLens
- 5. Build conversion funnels
- 6. Perform cohort analysis
- 6.1. DataSphere. Perform cohort analysis
- 6.2. DataLens. Create a dataset and a chart with cohort visualization
- 6.3. DataLens. Configure a chart with cohort visualization
- 6.4. DataLens. Create a chart with retention
- 6.5. DataLens. Add charts to a new dashboard tab
- 6.6. DataLens. Create charts
- 6.7. DataLens. Add charts to the dashboard
- How to delete the resources you created
Warning
The Serverlesss mode will be discontinued on March 1, 2024.
In this tutorial, you will learn how to build conversion funnels, run cohort analysis, calculate the Retention rate for the user base in Yandex DataSphere, and visualize the data in Yandex DataLens.
Yandex Metrica data is used as the data source.
- Connect ClickHouse® and DataSphere:
- Retrieve and upload data to ClickHouse®:
- Connect DataLens and create charts:
- Create and configure a dashboard in DataLens:
- Build conversion funnels:
- Perform cohort analysis:
- DataSphere. Perform cohort analysis.
- DataLens. Create a dataset and a chart with cohort visualization.
- DataLens. Configure a chart with cohort visualization.
- DataLens. Create a chart with retention.
- DataLens. Add charts to a new dashboard tab.
- DataLens. Create charts.
- DataLens. Add charts to the dashboard.
If you no longer need the resources you created, delete them.
Getting started
Before getting started, register in Yandex Cloud, set up a community, and link your billing account to it.
- On the DataSphere home page
, click Try for free and select an account to log in with: Yandex ID or your working account in the identity federation (SSO). - Select the Yandex Cloud Organization organization you are going to use in Yandex Cloud.
- Create a community.
- Link your billing account to the DataSphere community you are going to work in. Make sure that you have a billing account linked and its status is
ACTIVE
orTRIAL_ACTIVE
. If you do not have a billing account yet, create one in the DataSphere interface.
Tip
To make sure Yandex DataLens and Yandex DataSphere can run within the Yandex Cloud network, create their instances in the same organization.
Required paid resources
The cost of the infrastructure deployment includes:
- Fee for the cluster computing resources and storage (see Managed Service for ClickHouse® pricing).
- Fee for the computation time (see DataSphere pricing).
- Fee for the outbound traffic (see Virtual Private Cloud pricing).
1. Connect ClickHouse® and DataSphere
1.1. Connect ClickHouse®
- In the management console
, select a folder to create a ClickHouse® cluster in. - Select Managed Service for ClickHouse.
- In the window that opens, click Create ClickHouse cluster.
- Specify the settings for a ClickHouse® cluster:
-
Under Basic parameters, specify a name for the cluster.
-
Under Resources, select the
Intel Cascade Lake
platform, theburstable
type, and theb2.medium
host type.Warning
We do not recommend using
burstable
VM configurations in production environments. This tutorial uses them as an example. For production solutions, usestandard
ormemory-optimized
configurations. -
Under Size of storage, keep the value of
10 GB
. -
Under Hosts, click
. Enable the Public access option and click Save. -
Under DBMS settings, disable user management via SQL, enter the username, password, and database name (e.g.,
metrica_data
). -
Under Service settings, enable the following options:
- DataLens access
- Access from the management console
- Access from Metrica and AppMetrica
- Serverless access
-
Click Create cluster.
-
1.2. Connect DataSphere
-
Open the DataSphere home page
. -
In the left-hand panel, select
Communities. -
Select the community to create a project in.
-
On the community page, click
Create project. -
In the window that opens, enter a name and description (optional) for the project. The naming requirements are as follows:
- The name must be from 3 to 63 characters long.
- It may contain lowercase Latin letters, numbers, and hyphens.
- The first character must be a letter and the last character cannot be a hyphen.
-
Click Create.
-
Click Open project in JupyterLab.
This is the JupyterLab development environment, and you are going to use it to complete the next steps.
1.3. Clone the repository to DataSphere
- In the Git menu, select Clone.
- In the window that opens, specify the URI of the
repositoryhttps://github.com/zhdanchik/yandex_metrika_cloud_case.git
and click CLONE. - Click OK.
2. Retrieve and upload data to ClickHouse®
If you do not yet have a Yandex Metrica tag or it has not accumulated enough data, or if you want to be sure that you get a result by completing all steps in the tutorial, go to step 2.3 (skip steps 2.1 and 2.2).
If you have a Yandex Metrica tag and can access it, go to step 2.1 and 2.2 (skip step 2.3). We recommend walking through these steps if you are an experienced user because the logic of calculating funnels and cohorts depends on the data itself, and you may need to tweak the scripts.
2.1. Yandex Metrica. Create an app and get an access token
-
To work with the API, get your OAuth token
. -
Create an app:
- Go to https://oauth.yandex.ru/client/new
. - Enter a name for the service.
- Go to Platforms → Web services. In the Redirect URI field, enter
https://oauth.yandex.com/verification_code
. - Under Access to data, enter
metrika
and select Access to statistics and ability to view all counter settings (metrika:read). - Click Create app.
- In the window that opens, a description of the application will appear. Save the ClientID of your app.
- Go to https://oauth.yandex.ru/client/new
-
Click
https://oauth.yandex.ru/authorize?response_type=token&client_id=<app_ID>
. Paste the ClientID of your app as<app_ID>
. -
Click Log in as.
-
Save the received access token.
2.2. DataSphere. Upload data via the Yandex Metrica Logs API
-
In the DataSphere project, in the root of the working directory, create a text file. To do this, click Text File in the workspace.
-
Name the file
.yatoken.txt
and paste the obtained access token to the file contents. Save the changes and close the file. -
Open the yandex_metrika_cloud_case folder → 1a. get_data_via_logs_api.ipynb notebook.
-
Use your Yandex Metrica tag ID as the value of the
COUNTER_ID
variable. You can find your Yandex Metrica tag ID on the My tags page. -
Enter the start date for the analyzed period in the
START_DATE
variable. -
Enter the end date for the analyzed period in the
END_DATE
variable.Warning
The date range will NOT include the end date. For example, to get data up to December 5, 2022, paste
2022-12-06
into theEND_DATE
variable -
Complete all steps (cells with code) in the 1a. get_data_via_logs_api.ipynb notebook.
If you could not get data for the demo tag from the Logs API, you can download it via Yandex Disk.
2.3. DataSphere. Download the test tag data via Yandex Disk
Note
Skip this section if you are using your own tag data.
- Open the yandex_metrika_cloud_case folder → 1b. get_data_via_yadisk.ipynb notebook.
- Complete all steps (cells with the code) in the 1b. get_data_via_yadisk.ipynb notebook.
2.4. ClickHouse®. Get the cluster IP address
- In the management console
, go to the created ClickHouse® cluster. Wait until the cluster status changes toAlive
. Then open the cluster by clicking it. - Select
Hosts from the list on the left. - On the Overview tab, copy the host name.
2.5. DataSphere. Upload the data to ClickHouse®
-
Open the yandex_metrika_cloud_case folder → 2. upload_data_to_ClickHouse®.ipynb notebook:
- Paste the copied host name into the
CH_HOST_NAME
variable. - In the
CH_USER
variable, insert the name of the user you specified when creating your ClickHouse® cluster. - In the
CH_DB_NAME
variable, insert the database name you specified when creating your ClickHouse® cluster.
- Paste the copied host name into the
-
In the root directory, create a new text file named
.chpass.txt
. -
In the
.chpass.txt
file, paste the user password you specified when creating your ClickHouse® cluster. Save and close the file. -
Complete all the steps (the cells with the code) in the notebook.
3. Connect DataLens and create charts
3.1. Connect to DataLens
- In the management console
, open the page of the ClickHouse® cluster you created. - On the left side of the window, select
DataLens. - Click Create connection.
3.2. Create a connection to ClickHouse® in DataLens
-
Fill in the connection settings:
- Select a cluster from the Cluster drop-down list or create a new one. If the cluster is missing in the list, click Specify manually, then specify the ClickHouse® cluster name.
- Select a ClickHouse® host from the Host name drop-down list.
- Select the username.
- Enter the password and click Check connection.
-
When the connection check succeeds, click Create connection. In the window that opens, enter the connection name and click Create.
3.3. Create a dataset based on the connection
- In the top-right corner, click Create dataset.
- Select the
metrica_data.hits
table as a source. To do this, drag the table from the list on the left to the editing area. - Open the Fields tab.
- In the top-right corner, click
Add field. - To calculate the number of hits, create a calculated field: name it
Hits
, enter1
in the workspace, and click Create. - For the
Hits
field, select the Amount value in the Aggregation column. - Rename the
Browser
field toBrowser
. - In the top-right corner, click Save.
- Name the dataset
ch_metrica_data_hits
and click Create.
3.4. Create a stacked area chart
- In the top-right corner, click Create chart.
- In the window that opens, drag the following fields to the chart section:
EventDate
, to the X section.Browser
, to the Colors section.Hits
, to the Y section.
- Change the chart type from Column chart to Stacked area chart.
- Click Save.
- In the window that opens, enter
ch_metrica_data_hits_area
as the chart name and click Save.
3.5. Create a pivot table chart
- In the top-right corner, click
→ Save as copy. - For the chart copy, enter
ch_metrica_data_hits_table
as the new name and click Save. - Select Pivot table as the new chart type.
- Add or drag the following fields to the chart area:
Browser
, to the Rows section.Hits
, to the Sorting section.
- Click Save.
4. Create and configure a dashboard in DataLens
4.1. Create a dashboard
-
In the left-hand panel, select
Dashboards and click Create dashboard. -
Add the first chart to the dashboard. To do this, in the top-right corner, click Add
→ Chart:- From the Chart drop-down list, select
ch_metrica_data_hits_area
. - In the Name field, enter Hits by browser as the chart name and click Add.
- From the Chart drop-down list, select
-
Similarly, add the
ch_metrica_data_hits_table
chart named Hits by browser for period. -
Move the charts and resize them on the dashboard:
- Drag the table chart to the right of the diagram chart.
- To change the vertical dimensions of the charts, drag them by the bottom-right corner.
-
Save the dashboard:
- In the top-right corner, click Save.
- Enter
ch_metrica_data
as the dashboard name and click Create.
4.2. Set up a dashboard
- Add filtering to select a specific browser. To do this, in the top-right corner, click Add
→ Chart. - You can add the selector to a field from any dataset. From the Dataset list, select the
ch_metrica_data_hits
dataset you created. - In the Field list, select
Browser
. - Enable the Multiple choice option.
- In the Default value field, select browsers:
android_browser
chrome
chromemobile
firefox
opera
safari
safari_mobile
samsung_internet
yandex_browser
yandexsearch
- In the Name field, enter a name for the selector and enable the option.
- Click Add.
- Drag the selector to the top of the dashboard and stretch it horizontally.
- In the top-right corner, click Save.
5. Build conversion funnels
5.1. DataSphere. Build funnels
- Open the DataSphere home page
. - Open the 3. funnels.ipynb notebook. Specify the host, the user, and the DB name.
- Run the cells and evaluate the analysis results.
In ClickHouse®, themetrica_data.funnels_by_bro
table will be created, where funnels are calculated by browser.
5.2. DataLens. Funnels by browser. Create a dataset
Create a new dataset based on the new table and the connection to ClickHouse®:
- Open the DataLens
home page (or click DataLens in the left-hand panel) and click Create dataset. - Go to the Connections section and click
Add. - From the list of connections, select the connection name that you created in Step 3.2.
- Drag the new
metrica_data.funnels_by_bro
table to the editing area. - Open the Fields tab:
- Rename the
step X
fields toStep X
, where X is the step number. - Specify the Sum value in the Aggregation column for the
Step X
fields and click Save.
- Rename the
- Name the dataset
ch_metrica_data_funnels_by_bro
and click Create.
5.3. DataLens. Funnels by browser. Create a chart
Create a chart based on the ch_metrica_data_funnels_by_bro
dataset:
- Click Create chart.
- Select the Pivot table chart type.
- Drag the fields to the chart sections:
Browser
, to the Rows section.Step X
, to the Measures section, where X is the step sequence number.Step 1
, to the Sorting section.
- Click Save.
- Enter
ch_metrica_data_funnels_by_bro_table
as the chart name and click Save.
5.4. DataLens. Funnels by browser. Add a chart to your dashboard
- Go to the created dashboard from the dashboards
page. - Add a new chart. In the top-right corner, click Edit:
- Click Add
→ Chart. - From the Chart drop-down list, select the
ch_metrica_data_funnels_by_bro_table
chart. - In the Name field, enter
Funnels by browser
as the chart name and click Add.
- Click Add
- Place the new chart to the right of the existing two. Stretch the chart so that it matches the others vertically and reaches the right border of the page.
- Click Save.
5.5. DataLens. Funnels by browser. Set up a dashboard
Configure relationships so that the selector affects the new chart from another dataset:
- Click Edit → Links.
- In the window that opens, select the
Browser
selector from the list. - On the page with the other dashboard elements, scroll down to the
Funnels by browser
chart, and click on the list with the link. - Select the link type: Outgoing link.
- From each list, select the fields for the
Browser
link. Click Add. - Click Save.
- In the top-left corner, click
→ Rename. - Enter
Supermarket.ru — funnel and cohort analysis
as the name. Click Done.
6. Perform cohort analysis
6.1. DataSphere. Perform cohort analysis
- Open the 4. cohorts.ipynb notebook. Specify the host, the user, and the DB name.
- Run the cells and evaluate the analysis results.
In ClickHouse®, the metrica_data.retention_users
table will be created with all the data required to render visualization in DataLens.
6.2. DataLens. Create a dataset and a chart with cohort visualization
Create a new dataset based on the new table and the connection to ClickHouse®:
- Open the DataLens
home page and click Create dataset. - In the Connections section, click Create dataset and then click
Add. - From the list, select the connection you created.
- Drag the new
metrica_data.retention_users
table to the workspace to connect to it. - Open the Fields tab and create a calculated field named
week_num
, which is equal to([date]-[min_date])/7
.
This field will indicate the number of weeks from the user's first visit. - Click Create.
- For the
visits
,purchases
, andrevenue
fields, select the Sum value in the Aggregation column. - Rename the fields to
Visits
,Purchases
, andRevenue
, respectively. - Save the dataset:
- Name the dataset
ch_metrica_data_users_visits
. - Click Create.
- Name the dataset
- Create a new chart based on the dataset:
- Change the chart type to Pivot table.
- Drag the
week_num
field to the Columns section. - Drag the
min_date
field to the Rows section. - Drag the
Visits
field to the Measures section.
6.3. DataLens. Configure a chart with cohort visualization
Filter out incomplete weeks of June 29, 2020 and September 28, 2020:
- Drag the
min_date
field to the Filters section.- In the window that opens, select the start and end dates of the date range for filtering:
- Start date:
June 06, 2020
- End date:
September 27, 2020
- Start date:
- Click Apply filter.
- In the window that opens, select the start and end dates of the date range for filtering:
- Format the numbers in the values of the
week_num
field by removing the decimal places. To do this, in the Rows section, in theweek_num
field, click the icon. In the window that opens, set the following configuration:- Set the Decimal place measure to
0
. - Set the Show delimiter measure to
Hide
. - Click Apply.
- Set the Decimal place measure to
- To color the table, add the
Visits
field to the Colors section and click the icon. In the window that opens, configure the colors:- Select Gradient type:
3 point
. - Select Color:
Orange-Violet-Blue
. - Enable Set threshold values and specify the values:
100
,1000
, and5000
. - Click Apply.
- Select Gradient type:
- Click Save.
- Name the chart
ch_metrica_data_users_visits_cohorts_abs
and click Save.
6.4. DataLens. Create a chart with retention
Create a chart with retention based on the ch_metrica_data_users_visits_cohorts_abs
chart. You can open the chart from the dashboard or find it in the chart list
- In the top-right corner, click
→ Save as copy. - Enter
ch_metrica_data_users_visits_cohorts_rel
as the name of the chart and click Save. - Create a calculated field to calculate the retention rate relative to the first week:
- On the left side of the screen, click
above the list of dataset fields and select Field. - Name the field
Visits from the first week
. - Paste the following formula:
SUM([Visits])/RMAX(SUM([Visits]) among [week_num])
. - Click Create.
- On the left side of the screen, click
- Drag the
Visits from the first week
field to the Measures section. - Drag the
Visits from the first week
field to the Colors section in place of theVisits
field. - Select the format for
Visits from the first week
. To do this, click the icon under Measures in theVisits from the first week
field. In the window that opens, set the following configuration:- Set Format to
Percent
. - Click Apply.
- Set Format to
- Edit the threshold values for the measure colors. Under Colors, click the
icon. In the window that opens, enable Set threshold values, then specify the threshold values of0.01
,0.025
, and0.1
and click Apply. - Click Save.
6.5. DataLens. Add charts to a new dashboard tab
- In the left-hand panel, click
Dashboards and open the dashboard. - Click Edit → Tabs.
- Rename the existing tab
Overview + Funnels
. - Add a new tab and name it
Cohorts
. Click Save. - Go to the new
Cohort
tab:- Add the
ch_metrica_data_users_visits_cohorts_abs
chart to the dashboard. - In the Name field, specify
Visits by cohort (absolute)
.
- Add the
- To add a new tab, click Add on the left:
- In the new tab, add the
ch_metrica_data_users_visits_cohorts_rel
chart. - Enter
Visits by cohort (relative)
as the name. - Click Add.
- Click Save.
- In the new tab, add the
Now you have a chart with two switchable tabs.
6.6. DataLens. Create charts
Create a new chart based on the ch_metrica_data_users_visits_cohorts_abs
chart. You can open the chart from the dashboard or find it in the chart list
- In the top-right corner, click
→ Save as copy. - Enter
ch_metrica_data_users_revenue_cohorts_abs
as the chart name and click Save. - Drag the
Revenue
field to the Measures and Colors sections on top of theVisits
field. - In the
Revenue
section, click the icon. Change the field formatting:- Select
1
decimal place. - Select the
Millions, M
scale. - Change the color thresholds for the new field to
500000
,1500000
, and10000000
.
- Select
- Save the chart.
Create another chart based on the ch_metrica_data_users_visits_cohorts_rel
chart:
- In the top-right corner, click
→ Save as copy. - Enter
ch_metrica_data_users_revenue_cohorts_rel
as the name of the chart and click Save. - Change the
Visits from the first week
field:- Rename the field
Revenue from the first week
. - Change the formula to
SUM([Revenue])/RMAX(SUM([Revenue]) among [week_num])
. - Change the color thresholds for the new field to
0.01
,0.2
, and0.3
.
- Rename the field
- Save the chart.
6.7. DataLens. Add charts to the dashboard
Add charts with cohort visualization to the dashboard:
- Click Edit.
- Click Add
→ Chart. - Select
ch_metrica_data_users_revenue_cohorts_abs
from the chart list. - Enter
Revenue by cohort (absolute)
as the name. - Use the
Add button to create a new tab:- In the new tab, select
ch_metrica_data_users_revenue_cohorts_rel
from the chart list. - Enter
Revenue by cohort (relative)
as the name. - In the top-right corner, click Save.
- In the new tab, select
- Arrange the charts side by side.
How to delete the resources you created
To stop paying for the resources you created, delete the cluster.
ClickHouse® is a registered trademark of ClickHouse, Inc