AppMetrica: data export, post-processing, and visualization

Written by
Yandex Cloud
Updated on May 7, 2025

In this scenario, you will analyze user behavior in a mobile app based on AppMetrica data:

  • Process the data using Python scripts in Jupyter Notebooks in Yandex DataSphere.
  • Build charts and dashboards in Yandex DataLens.
  • Compare products by breadth and frequency of coverage.

A customer journey is a sequence of user actions. Analyzing user behavior shows how people use your product: which pages they visit, which features they use, and where they run into problems. These insights help you decide how to develop the product.

As a data source, you'll use sampled and anonymized data from the auto.ru mobile app, exported from AppMetrica.

Figure: Data architecture scheme

Tip

The script uses a file with pre-exported AppMetrica data so you can run the script without accessing the mobile app and AppMetrica.

For your own tasks, we recommend directly exporting data from AppMetrica to ClickHouse®.

First, prepare your cloud, then explore and visualize the data step by step:

  1. Connect ClickHouse® and DataSphere
    1. Connect ClickHouse®
    2. Connect DataSphere
    3. Clone the repository to DataSphere
  2. Retrieve and upload data to ClickHouse®
    1. DataSphere. Download the test app data via Yandex Disk
    2. Export the data from AppMetrica
    3. ClickHouse®. Get the cluster's IP address
    4. DataSphere. Upload the data to ClickHouse®
  3. DataSphere. Compare products by breadth and frequency of coverage
  4. Connect DataLens and create charts
    1. Connect to DataLens
    2. Create a connection to ClickHouse® in DataLens
    3. Create a dataset based on the connection
    4. Create a chart: scatter chart
    5. Create a chart: table
  5. Create and configure a dashboard in DataLens
  6. Customer journey. Create a QL chart and a Sankey chart
    1. Create a QL chart in DataLens
    2. Create a Sankey diagram in DataSphere

Prepare your cloud

Sign up in Yandex Cloud and create a billing account:

  1. Navigate to the management console and log in to Yandex Cloud or register a new account.
  2. On the Yandex Cloud Billing page, make sure you have a billing account linked and it has the ACTIVE or TRIAL_ACTIVE status. If you do not have a billing account, create one and link a cloud to it.

If you have an active billing account, you can navigate to the cloud page to create or select a folder for your infrastructure to operate in.

Learn more about clouds and folders.

1. Connect ClickHouse® and DataSphere

1.1. Connect ClickHouse®

  1. In the management console, select Managed Service for ClickHouse® from the list on the left.

  2. Click Create cluster.

  3. Specify the settings for a ClickHouse® cluster.

    1. Basic parameters: Enter appmetrica_analysis as the cluster name.

    2. Host class: Select burstable as the virtual machine type and b2.medium as the host type.

      Warning

      We do not recommend using burstable VM configurations in production environments. This tutorial uses them as an example. For production solutions, use standard or memory-optimized configurations.

    3. Storage size: Leave the value at 10 GB.

    4. Database: Enter autoru_appmetrica as the database name, then specify a username and password. Make a note of these credentials: you will need them in later steps.

    5. Hosts: Click the icon. Enable Public access and click Save.

    6. Advanced settings: Enable the following four options:

      • Access from DataLens
      • Access from the management console
      • Access from Yandex Metrica and AppMetrica
      • Access from Serverless
    7. After configuring all settings, click Create cluster.

1.2. Connect DataSphere

  1. Go to the management console.
  2. Select DataSphere from the list on the left.
  3. Click Create project.
  4. Enter appmetrica_analysis as the project name, and click Create.
  5. Open the project. To do this, in the line with the project name, click → Open.

1.3. Clone the repository to DataSphere

  1. In the top-left corner, click Git Clone.

  2. In the window that opens, enter the repository URI https://github.com/firstsvet/yandex_appmetrika_cloud_case, then click CLONE.

2. Retrieve and upload data to ClickHouse®

If you do not have a Yandex Metrica tag, if it has not accumulated enough data, or if you want to be sure of getting a result by following every step of this guide, go to step 2.1 and skip step 2.2.

If you have an app in AppMetrica and access to it, go to step 2.2 and skip step 2.1. This path is recommended for experienced users, who may need to edit the scripts.

2.1. DataSphere. Download the test app data via Yandex Disk

Note

Skip this step if you are using your own app data.

  1. In the menu on the left, open the yandex_appmetrika_cloud_case folder → notebook 1.upload_data_from_yadisk.ipynb.

  2. Complete all the steps (cells with code) in the notebook 1.upload_data_from_yadisk.ipynb.

    To run a step, click the number to the left of the cell, then click the run button at the top. While the cell is running, the number changes to [*]. Once the number reappears, run the next step.

2.2. Export the data from AppMetrica

To set up the connection and export data from your app, see Export data to Yandex Cloud.

2.3. ClickHouse®. Get the cluster IP address

  1. Go to the ClickHouse® appmetrica_analysis cluster that you created in step 1.1. Wait until the cluster status changes to Alive. Then open the cluster by clicking it.

  2. Select Hosts from the list on the left.

  3. On the Overview tab, go to the Hostname column. To copy a hostname, point to the right of the hostname and click the copy icon.

2.4. DataSphere. Upload the data to ClickHouse®

  1. Open the yandex_appmetrika_cloud_case folder → notebook 2. upload_data_to_ClickHouse.ipynb.

  2. Set the following variables in the notebook:

    • CH_HOST_NAME: the host name from step 2.3.

    • CH_USER: the username from step 1.1.

    • CH_DB_NAME: the database name from step 1.1.

  3. In the yandex_appmetrika_cloud_case folder, create a new text file named chpass.txt.

  4. Enter the password of the database user from step 1.1 in the chpass.txt file. Save and close the file.

  5. Complete all the steps (the cells with the code) in the notebook.
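
The notebook's own code is not reproduced here. As a rough sketch of what steps 2 to 4 prepare, the snippet below reads the password from chpass.txt and assembles a URL and headers for ClickHouse's HTTP interface. The helper names read_password and build_request are hypothetical, and port 8443 is an assumption based on the HTTPS interface of a publicly accessible Managed Service for ClickHouse® host.

```python
from pathlib import Path
from urllib.parse import urlencode

# Placeholders: substitute the values from steps 1.1 and 2.3.
CH_HOST_NAME = "<host name from step 2.3>"
CH_USER = "<username from step 1.1>"
CH_DB_NAME = "autoru_appmetrica"


def read_password(path="chpass.txt"):
    """Return the password stored in the file, without surrounding whitespace."""
    return Path(path).read_text(encoding="utf-8").strip()


def build_request(host, user, password, database, query):
    """Assemble a URL and headers for ClickHouse's HTTP interface.

    Hypothetical helper: it only builds the request parts and sends nothing.
    Port 8443 is an assumption (HTTPS interface of a managed cluster).
    """
    params = urlencode({"database": database, "query": query})
    url = f"https://{host}:8443/?{params}"
    headers = {"X-ClickHouse-User": user, "X-ClickHouse-Key": password}
    return url, headers
```

Keeping the password in a separate file means it never appears in the notebook itself. Actually sending the request would also need an HTTP client and, typically, the CA certificate that the cluster's TLS certificate is signed with.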

3. DataSphere. Compare products by breadth and frequency of coverage

  1. Open the yandex_appmetrika_cloud_case folder → Case_1.ipynb notebook.

  2. Set the following variables in the notebook:

    • CH_HOST_NAME: the host name from step 2.3.
    • CH_USER: the username from step 1.1.
    • CH_DB_NAME: the database name from step 1.1.
  3. Complete all the steps (the cells with the code) in the notebook.

  4. View the intermediate results.

4. Connect DataLens and create charts

4.1. Connect to DataLens

  1. In the management console, open the page of the ClickHouse® cluster you created.
  2. On the left side of the window, select DataLens.
  3. Click Create connection.

4.2. Create a connection to ClickHouse® in DataLens

  1. Fill in the connection settings:

    1. Enter the name: AppMetrica_workshop.

    2. Select a ClickHouse® host from the Hostname drop-down list.

    3. Select the username and enter the password from step 1.1.

    4. Enable Allow subqueries in datasets and queries from charts.

    5. Click Check connection.

  2. When the connection check succeeds, click Create connection. In the window that opens, enter the connection name and click Create.

4.3. Create a dataset based on the connection

  1. In the top-right corner, click Create dataset.

  2. Select the autoru_appmetrica.auto_data table as the source. To do this, drag the table from the list on the left to the editing area.

  3. Open the Fields tab.

  4. Create the users calculated field:

    1. In the top-right corner, click Add field.
    2. At the top left, enter the users field name.
    3. Paste the countd([appmetrica_device_id]) formula in the area to the right.
    4. Click Create.

  5. Repeat the previous step for the other fields:

    • reach, using the COUNTD([appmetrica_device_id])/COUNTD([appmetrica_device_id] FIXED) formula.
    • events, using the COUNT([session_id]) formula.
    • events per user, using the [events]/[users] formula.
  6. In the top-right corner, click Save.

  7. Name the dataset autoru_backend_data and click Create.
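
To make the calculated fields more concrete, here is a minimal sketch in plain Python that computes the same four measures per car brand on a toy table. The sample rows and the metrics_by_group function are made up for illustration. The sketch assumes that COUNTD([appmetrica_device_id] FIXED) counts distinct devices across the whole dataset regardless of the chart's grouping, and that COUNT skips empty values.

```python
from collections import defaultdict

# Toy rows standing in for the table: (appmetrica_device_id, session_id, mark).
ROWS = [
    ("d1", "s1", "BMW"),
    ("d1", "s2", "BMW"),
    ("d2", "s3", "BMW"),
    ("d2", "s4", "Audi"),
    ("d3", "s5", "Audi"),
    ("d4", "s6", "Kia"),
]


def metrics_by_group(rows):
    """Compute users, reach, events, and events per user for each mark."""
    # Denominator of reach: distinct devices in the whole dataset.
    all_devices = {device for device, _, _ in rows}
    devices = defaultdict(set)
    events = defaultdict(int)
    for device, session, mark in rows:
        devices[mark].add(device)
        if session is not None:  # count only rows with a session_id
            events[mark] += 1
    result = {}
    for mark, ids in devices.items():
        users = len(ids)
        result[mark] = {
            "users": users,
            "reach": users / len(all_devices),
            "events": events[mark],
            "events per user": events[mark] / users,
        }
    return result


for mark, values in metrics_by_group(ROWS).items():
    print(mark, values)
```

In this toy data, BMW is used by two of the four devices, so its reach is 0.5, and its three events give 1.5 events per user. These two measures are the "breadth" and "frequency" that the scatter chart in the next section compares.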

4.4. Create a chart: scatter chart

  1. In the top-right corner, click Create chart.

  2. Select Scatter chart as the type.

  3. Drag the fields to the chart section:

    • Drag the reach measure to the X section.
    • Drag the events per user measure to the Y section.
    • Drag the mark dimension to the Points section.
    • Drag the event_name dimension to the Colors section.

  4. In the top-right corner, click Save.

  5. In the window that opens, enter Coverage and events as the chart name and click Save.

4.5. Create a chart: table

  1. Select the Table type.

  2. Drag the fields to the chart section:

    • Drag the mark dimension to the Columns section.
    • Drag the users measure to the Columns section.
    • Drag the users measure to the Sorting section.

  3. In the top-right corner, click the button to the right of Save, then click Save as.

  4. In the window that opens, enter Table by car brands as the chart name, then click Save.

5. Create and configure a dashboard in DataLens

  1. Open the DataLens homepage and click Create dashboard.

  2. Add a chart to your dashboard.

    1. In the top-right corner, click Add → Chart.
    2. From the Chart drop-down list, select Table by car brands. The Name field will be populated automatically.
    3. Click Add.

  3. Repeat the previous step for the Coverage and events chart.

  4. Add and configure a selector.

    1. In the top-right corner, click Add → Selector.
    2. In the Dataset list, select autoru_backend_data.
    3. In the Field list, select event_name.
    4. In the Default value list, select any option.
    5. Click Add.
  5. Position the charts and selector on the dashboard. To resize an element, drag it by the bottom-right corner.

  6. Save the dashboard:

    1. In the top-right corner, click Save.
    2. Enter auto.ru app as the dashboard name, then click Create.

Try changing the event_name value in the selector to see how the dashboard changes.

6. Customer journey. Create a QL chart and a Sankey chart

6.1. Create a QL chart in DataLens

Use QL charts to delve into event sequences and experiment in DataLens.

  1. Open the DataLens home page and select Connections in the menu on the left.

  2. Select the AppMetrica_workshop connection that you created in step 4.2.

  3. At the top right, click Create QL chart.

  4. Enter the query:

    -- Count devices per event chain and label each chain by its contact outcome.
    SELECT
        uniqExact(t.appmetrica_device_id) AS counts,
        events_seq,
        if(events_seq LIKE '%Call%', 'Call',
            if(events_seq LIKE '%Message%', 'Message', 'Contact failed')) AS contact
    FROM
        (SELECT
            appmetrica_device_id,
            num_steps,
            arrayStringConcat(filt_events, ' -> ') AS events_seq
        FROM
            (SELECT
                appmetrica_device_id,
                groupArray(event_name) AS events,
                count(event_name) AS cnt_events,
                groupArray(datetime) AS times,
                arrayEnumerate(events) AS indexes,
                arrayDifference(arrayMap(x -> toUInt64(x), times)) AS times_diffs,
                -- Keep an event if it is the first one, differs from the previous one,
                -- or comes at least 1800 seconds (30 minutes) after the previous one.
                arrayFilter((e, i) -> (i = 1) OR (events[i - 1] != events[i]) OR (times_diffs[i] >= 1800),
                            events, indexes) AS filt_events,
                length(filt_events) AS num_steps
            FROM
                (SELECT
                    appmetrica_device_id,
                    datetime,
                    event_name
                FROM autoru_appmetrica.raw_appmetrica_auto_data
                ORDER BY appmetrica_device_id,
                    datetime)
            GROUP BY appmetrica_device_id
            -- Ignore devices with more than 30 recorded events.
            HAVING cnt_events <= 30)) AS t
    -- Keep only chains shorter than 10 steps that occur on more than 10 devices.
    WHERE t.num_steps < 10
    GROUP BY t.events_seq
    HAVING counts > 10
    ORDER BY counts DESC
    
  5. Click Start.

  6. Select the Bar chart type.

  7. In the top-right corner, click Save, then enter the chart name: Event chains.

  8. Add the QL chart to the dashboard.

    1. In the menu on the left, click Dashboards.
    2. Select the auto.ru app dashboard from the list.
    3. Click Edit at the top right.
    4. Click Add → Chart.
    5. In the Chart list, select Event chains and click Add.
    6. Customize the dashboard layout and click Save.
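
The arrayFilter step of the query in step 4 is the least obvious part, so here is a minimal Python sketch of the same logic for a single device. It assumes the events are already sorted by time; the function names, event names, and timestamps are invented for illustration and are not taken from the dataset.

```python
def collapse_events(events, timestamps, gap_seconds=1800):
    """Mirror the query's arrayFilter step on one device's time-ordered events.

    An event is kept if it is the first one, differs from the previous
    event, or follows the previous event after at least gap_seconds.
    """
    kept = []
    for i, event in enumerate(events):
        is_first = i == 0
        changed = not is_first and events[i - 1] != event
        long_gap = not is_first and timestamps[i] - timestamps[i - 1] >= gap_seconds
        if is_first or changed or long_gap:
            kept.append(event)
    return kept


def contact_type(events_seq):
    """Classify a chain the way the query's nested if() does."""
    if "Call" in events_seq:
        return "Call"
    if "Message" in events_seq:
        return "Message"
    return "Contact failed"


# Made-up event names and Unix timestamps, for illustration only.
events = ["Search", "Search", "Card", "Card", "Call"]
timestamps = [0, 10, 20, 2000, 2010]
events_seq = " -> ".join(collapse_events(events, timestamps))
print(events_seq, "|", contact_type(events_seq))
```

Here the second Search is dropped because it repeats the previous event ten seconds later, while the second Card is kept because 1980 seconds have passed since the first one. The resulting chain is Search -> Card -> Card -> Call, classified as Call.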

6.2. Create a Sankey diagram in DataSphere

  1. Go to the management console.

  2. Select DataSphere from the list on the left.

  3. Open the yandex_appmetrika_cloud_case folder → Case_2.ipynb notebook.

  4. Complete all the steps (cells with code) in the notebook Case_2.ipynb.

  5. You will get an interactive Sankey diagram that shows user behavior scenarios. You can move the chart blocks and save the result as an image.
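
The code of Case_2.ipynb is not reproduced here, so the following is only a sketch of the data preparation that a Sankey diagram generally needs: a list of links, each with a source, a target, and a value. The sankey_links function, the step-index labels, and the sample chains are assumptions made for illustration, not the notebook's implementation.

```python
from collections import Counter


def sankey_links(chains):
    """Count transitions between consecutive steps across all chains.

    Each node label includes its step index, so the same event at
    different positions becomes a different node and the links always
    point from one step to the next.
    """
    links = Counter()
    for chain in chains:
        for step, (src, dst) in enumerate(zip(chain, chain[1:])):
            links[(f"{step}: {src}", f"{step + 1}: {dst}")] += 1
    return links


# Made-up event chains, one per device.
chains = [
    ["Search", "Card", "Call"],
    ["Search", "Card"],
    ["Search", "Call"],
]
for (source, target), value in sankey_links(chains).items():
    print(source, "->", target, value)
```

A plotting library can then draw one band per link, with the band width proportional to the link's value.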

ClickHouse® is a registered trademark of ClickHouse, Inc.

© 2025 Direct Cursus Technology L.L.C.