AppMetrica: data export, post-processing, and visualization
- Prepare your cloud
- 1. Connect ClickHouse® and DataSphere
- 2. Retrieve and upload data to ClickHouse®
- 3. DataSphere. You will compare products by breadth and frequency of coverage
- 4. Connect DataLens and create charts
- 5. Create and configure a dashboard in DataLens
- 6. Customer journey. Create a QL chart and a Sankey chart
In this scenario, you will analyze user behavior in a mobile app based on AppMetrica data:
- Process the data using Python scripts in Jupyter Notebooks in Yandex DataSphere.
- Build charts and dashboards in Yandex DataLens.
- Compare products by breadth and frequency of coverage.
Customer journey is a sequence of user actions. User behavior analysis helps you find out how people use your product: which pages they visit, which functions they use, and where they run into problems. This information makes it easier for you to find the right solutions to develop your product.
As a data source, you'll use sampled and anonymized data from the auto.ru
Data architecture scheme
Tip
The script uses a file with pre-exported AppMetrica data so you can run the script without accessing the mobile app and AppMetrica.
For your own tasks, we recommend directly exporting data
First prepare the cloud and then explore and visualize the data step-by-step:
- Connect ClickHouse® and DataSphere
- Retrieve and upload data to ClickHouse®
- DataSphere. You will compare products by breadth and frequency of coverage
- Connect DataLens and create charts
- Create and configure a dashboard in DataLens
- Customer journey. Create a QL chart and a Sankey chart
Prepare your cloud
Sign up for Yandex Cloud and create a billing account:
- Go to the management console
and log in to Yandex Cloud or create an account if you do not have one yet. - On the Yandex Cloud Billing
page, make sure you have a billing account linked and it has theACTIVE
orTRIAL_ACTIVE
status. If you do not have a billing account, create one.
If you have an active billing account, you can go to the cloud page
Learn more about clouds and folders.
1. Connect ClickHouse® and DataSphere
1.1. Connect ClickHouse®
-
In the management console
, select Managed Service for ClickHouse® from the list on the left. -
Click Create cluster.
-
Specify the settings for a ClickHouse® cluster.
-
Basic parameters: Enter
appmetrica_analysis
as the cluster name. -
Host class: Select burstable as the virtual machine type and b2.medium as the host type.
Warning
We do not recommend using burstable VM configurations in production environments. This tutorial uses them as an example. For production solutions, use standard or memory-optimized configurations.
-
Storage size: Leave the value at 10 GB.
-
Database: Enter
autoru_appmetrica
database name, as well as its username and password. Memorize these credentials. -
Hosts: Click the
icon. Enable Public access and click Save. -
Advanced settings: Enable 4 options:
- Access from DataLens
- Access from the management console
- Access from Yandex Metrica and AppMetrica
- Access from Serverless
-
After configuring all settings, click Create cluster.
-
1.2. Connect DataSphere
- Go to the management console
. - Select DataSphere from the list on the left.
- Click Create project.
- Enter
appmetrica_analysis
as the project name, and click Create. - Open the project. To do this, in the line with the project name, click
→ Open.
1.3. Clone the repository to DataSphere
-
In the top-left corner, click Git Clone:
. -
In the window that opens, enter the repository URI
https://github.com/firstsvet/yandex_appmetrika_cloud_case
, then click CLONE.
2. Retrieve and upload data to ClickHouse®
If you do not have a Yandex Metrica tag, it has not accumulated enough data, or if you want to make sure that you will get a result by completing all the guide steps, go to step 2.1 (skip step 2.2).
If you have the AppMetrica app and access to it, go to step 2.2: this is recommended for experienced users who might need to edit scripts (skip step 2.1).
2.1. DataSphere. Download the test app data via Yandex Disk
Note
Skip this step if you are using your own app data.
-
In the menu on the left, open the yandex_appmetrika_cloud_case folder → notebook 1.upload_data_from_yadisk.ipynb.
-
Complete all the steps (cells with code) in the notebook 1.upload_data_from_yadisk.ipynb.
To run the step, click the number to the left of the cell, then the run button at the top. The number will change to
[*]
. After the number appears again, run the next step.
2.2. Export the data from AppMetrica
To set up the connection and export data from your app, see Export data to Yandex Cloud
2.3. ClickHouse®. Get the cluster IP address
-
Go to the ClickHouse®
appmetrica_analysis
cluster that you created in step 1.1. Wait until the cluster status changes to Alive. Then open the cluster by clicking it. -
Select Hosts from the list on the left.
-
On the Overview tab, go to the Hostname column. To copy a hostname, point to the right of the hostname and click the copy icon.
2.4. DataSphere. Upload the data to ClickHouse®
-
Open the yandex_appmetrika_cloud_case folder → notebook 2. upload_data_to_ClickHouse®.ipynb.
-
Paste data in the variables:
-
In the yandex_appmetrika_cloud_case folder, create a new text file named
chpass.txt
. -
Enter the password of the logged-in user in the chpass.txt file. Save and close the file.
-
Complete all the steps (the cells with the code) in the notebook.
3. DataSphere. You will compare products by breadth and frequency of coverage
-
Open the yandex_appmetrika_cloud_case folder → Case_1.ipynb notebook.
-
Paste data in the variables:
-
Complete all the steps (the cells with the code) in the notebook.
-
View the intermediate results.
4. Connect DataLens and create charts
4.1. Connect to DataLens
- In the management console
, open the page of the ClickHouse® cluster you created. - On the left side of the window, select
DataLens. - Click Create connection.
4.2. Create a connection to ClickHouse® in DataLens
-
Fill in the connection settings:
-
Enter the name:
AppMetrica_workshop
. -
Select a ClickHouse® host from the Hostname drop-down list.
-
Select the username and enter the password from step 1.1.
-
Enable Allow subqueries in datasets and queries from charts.
-
Click Check connection.
-
-
When the connection check succeeds, click Create connection. In the window that opens, enter the connection name and click Create.
4.3. Create a dataset based on the connection
-
In the top-right corner, click Create dataset.
-
Select the
autoru_appmetrica.auto_data
table as the source. To do this, drag the table from the list on the left to the editing area. -
Open the Fields tab.
-
Create the users calculated field:
- In the top-right corner, click Add field.
- At the top left, enter the
users
field name. - Paste the
countd([appmetrica_device_id])
formula in the area to the right. - Click Create.
-
Repeat the previous step for other fields:
- reach, using the
COUNTD([appmetrica_device_id])/COUNTD([appmetrica_device_id] FIXED)
formula. - events, using the
COUNT([session_id])
formula. - events per user, using the
[events]/[users]
formula.
- reach, using the
-
In the top-right corner, click Save.
-
Name the dataset
autoru_backend_data
and click Create.
4.4. Create a chart: scatter chart
-
In the top-right corner, click Create chart.
-
Select Scatter chart as the type.
-
Drag the fields to the chart section:
- Drag the reach measure to the X section.
- Drag the events per user measure to the Y section.
- Drag the mark dimension to the Points section.
- Drag the event_name dimension to the Colors section.
-
In the top-right corner, click Save.
-
In the window that opens, enter
Coverage and events
as the chart name and click Save.
4.5. Create a chart: table
-
Select the Table type.
-
Drag the fields to the chart section:
- Drag the mark measure to the Columns section.
- Drag the users measure to the Columns section.
- Drag the users measure to the Sorting section.
-
In the top-right corner, click the button to the right of Save, then click Save as.
-
In the window that opens, enter
Table by car brands
as the chart name, then click Save.
5. Create and configure a dashboard in DataLens
-
Open the DataLens homepage
and click Create dashboard. -
Add a chart to your dashboard.
- In the top-right corner, click Add → Chart.
- From the Chart drop-down list, select Table by car brands. The Name field will be populated automatically.
- Click Add.
-
Repeat the previous step for the Coverage and events chart.
-
Add and configure a selector.
- In the top-right corner, click Add → Selector.
- In the Dataset list, select autoru_backend_data.
- In the Field list, select event name.
- In the Default value list, select any option.
- Click Add.
-
Position the charts and selector on the dashboard. To resize an element, drag it by the bottom-right corner.
-
Save the dashboard:
- In the top-right corner, click Save.
- Enter
auto.ru app
as the dashboard name, then click Create.
Try to change event name in the selector to another value to see how the dashboard changes.
6. Customer journey. Create a QL chart and a Sankey chart
6.1. Create a QL chart in DataLens
Use QL charts to delve into event sequences and experiment in DataLens.
-
Open the DataLens home page
and select Connections in the menu on the left. -
Select the
AppMetrica_workshop
connection that you created in step 4.2. -
At the top right, click Create QL chart.
-
Enter the query:
SELECT uniqExact(t.appmetrica_device_id) as counts, events_seq, if(events_seq like '%Call%', 'Call', if(events_seq like '%Message%', 'Message', 'Contact failed')) as contact FROM ( SELECT appmetrica_device_id, num_steps, arrayStringConcat(filt_events, ' -> ') as events_seq FROM (SELECT appmetrica_device_id, groupArray(event_name) as events, count(event_name) as cnt_events, groupArray(datetime) as times, arrayEnumerate(events) as indexes, arrayDifference(arrayMap(x -> toUInt64(x), times)) as times_diffs, arrayFilter(e, i -> (i = 1) or (events[i - 1] != events[i]) or (times_diffs[i] >= 1800), events, indexes) as filt_events, length(filt_events) as num_steps FROM (SELECT appmetrica_device_id, datetime, event_name FROM autoru_appmetrica.raw_appmetrica_auto_data ORDER BY appmetrica_device_id, datetime) GROUP BY appmetrica_device_id HAVING cnt_events <= 30)) as t where t.num_steps<10 GROUP BY t.events_seq HAVING counts>10 ORDER BY counts desc
-
Click Start.
-
Select the Bar chart type.
-
In the top-right corner, click Save, then enter the chart name:
Event chains
. -
Add the QL chart to the dashboard.
- In the menu on the left, click Dashboards.
- Select the auto.ru app dashboard from the list.
- Click Edit at the top right.
- Click Add → Chart.
- In the Chart list, select Event chains and click Add.
- Customize the dashboard layout and click Save.
6.2. Create a Sankey diagram in DataSphere
-
Go to the management console
. -
Select DataSphere from the list on the left.
-
Open the yandex_appmetrika_cloud_case folder → Case_2.ipynb notebook.
-
Complete all the steps (cells with code) in the notebook Case_2.ipynb.
-
You will get an interactive Sankey diagram that shows user behavior scenarios. You can move the chart blocks and save the result as an image.
ClickHouse® is a registered trademark of ClickHouse, Inc