Joining Yandex DataLens data
Yandex DataLens uses a connection to retrieve data from a source (DB, CSV, Yandex Metrica, and so on). You can create datasets, charts, and selectors from the connection data. If the source has multiple tables, you can combine them to build the required set of data. You can link data from different datasets at the chart level or through selector links.
Data joining methods
You can use different data joining methods:
Dataset level
To join data at the dataset level, you can add tables to the workspace or write an SQL query.
Warning
You cannot join data from different sources at a single dataset level.
Adding tables
You can join data through the dataset creation interface by dragging tables to the workspace and configuring links between them using the JOIN operator. Check out this scenario for a use case of joining data by adding tables.
Note
When you merge data from multiple tables, a dataset will get duplicates of the fields used to create links. You can remove them manually from the list of dataset fields. You can also describe the dataset using an SQL query and exclude duplicate fields from the final set.
SQL query
In a dataset, you can add an ad-hoc SQL query to the data source. When accessing a data source, the query code is run as a subquery. You can use the query output as the final dataset or join it with other source tables via the interface.
Chart level
DataLens enables you to join data at the chart level. To join data at the chart level, you can use an SQL chart or multi-dataset charts.
QL chart
QL charts are charts created from a connection if there is a database at the other end of the connection. Running a SQL query does not create a separate dataset object; instead, it generates one on the fly and displays it in the preview panel. For more information, see Creating a QL chart.
Multi-dataset charts
Multi-dataset charts are charts visualizing data from different datasets. Queries for each dataset are processed independently of each other. For more information, see Creating a multi-dataset chart.
Selector link level
You can add a selector to a dashboard to modify query output in its associated widgets:
- On the dashboard, selectors and charts built from a single dataset get linked automatically.
- Selectors and charts built from different datasets can be linked manually using aliases.
Before creating a link, make sure the field used by the selector as a filter is included in the dataset the chart is built from. Otherwise, the link will not work. For more information, see Creating an alias.
Optimizing data when joining tables
In some charts based on a dataset with joined tables, you can only use fields from a single table. In this case, DataLens optimizes a query to a source. The JOIN
operator is not used, and the query returns data only from one table without filtering data based on others. This allows reducing the amount of requested data and the query execution time. However, data returned by an optimized query may differ from that you expect.
Optimization is used under the following conditions:
- The Optimize link option is enabled in the table link settings.
- A chart only uses fields of one of the joined tables.
- There are no fields from the other tables in any of the chart sections.
- Fields from the other tables are not used in the chart's calculated fields.
Note
Optimization will not work if a dataset is described using an SQL query to the source.
Let's look at examples of query optimization using different datasets for a source with tables:
Employees
id | name | department_id |
---|---|---|
1 | Ivanov | 2 |
2 | Petrov | 4 |
3 | Sidorov | 1 |
4 | Stepanov | 1 |
5 | Sokolov | |
6 | Orlova | 3 |
7 | Shishkina | 3 |
8 | Semyonov | |
9 | Antonova | 3 |
10 | Sergeev | 4 |
Departments
id | name |
---|---|
1 | Logistics |
2 | IT |
3 | Accounting |
4 | Security |
Bonuses
employee_id | bonus |
---|---|
1 | 35,000 |
2 | 40,000 |
5 | 28,000 |
7 | 30,000 |
9 | 50,000 |
Qualification
employee_id | category |
---|---|
2 | Category 1 |
4 | Category 1 |
5 | Category 2 |
6 | Category 3 |
7 | Category 3 |
8 | Category 2 |
10 | Category 1 |
Examples
INNER JOIN of two tables
A dataset is based on the Employees
and Departments
tables joined with INNER JOIN
As a result of the Join, the dataset only contains rows that are common for the two tables. Let's create a chart containing fields from the two tables.
Now let's create a chart that will only contain fields from the Employees
table. In this case, DataLens does not use JOIN
and only works with this table. The chart displays all values from the Employees
table, not just those overlapping with the Departments
table.
LEFT/RIGHT JOIN of two tables
A dataset is based on the Bonuses
and Employees
tables joined with LEFT JOINBonuses
table is used in full, while only those values which are present in the Bonuses
table are selected from the Employees
table.
Let's create a chart containing fields from the two tables.
Next, let's only leave the Employees
table fields in the chart. In this case, the chart will include all values from this table (without filtering data by the Bonuses
table).
Optimization for RIGHT JOIN(https://en.wikipedia.org/wiki/Join_(SQL)#Right_outer_join) works the same way.
INNER JOIN of three tables
A dataset is based on three tables:
- The first table (
Departments
) is joined with the second table (Employees
) usingINNER JOIN
. - The second table (
Employees
) is joined with the third table (Bonuses
) usingINNER JOIN
.
Let's create a chart containing fields from all the tables.
Next, let's only leave the Employees
table fields in the chart. In this case, the chart will include all values from this table (without filtering data by the Bonuses
and Departments
tables).
Let's add to the chart only the fields from the first (Departments
) and third (Bonuses
) tables. These tables are not linked explicitly. However, each of them is linked to the Employees
table. Therefore, DataLens does not optimize a query to the source. In this case, the chart will include values from all the three tables based on filtering results.
INNER JOIN of one table with the other two
A dataset is based on three tables:
- The first table (
Employees
) is joined with the second table (Departments
) usingINNER JOIN
. - The second table (
Employees
) is joined with the third table (Bonuses
) usingINNER JOIN
.
Let's create a chart containing fields from all the tables.
Next, let's only leave the Employees
table fields in the chart. In this case, the chart will include all values from this table (without filtering data by the Departments
and Bonuses
tables).
Let's add to the chart only the fields from the first (Employees
) and third (Bonuses
) tables. In this case, the chart will feature values that are common for these tables (without filtering data by the Departments
table).
INNER JOIN of four tables
A dataset is based on four tables:
- The first table (
Qualification
) is joined with the second table (Bonuses
) usingINNER JOIN
. - The first table (
Qualification
) is joined with the third table (Employees
) usingINNER JOIN
. - The third table (
Employees
) is joined with the fourth table (Departments
) usingINNER JOIN
.
Let's create a chart containing fields from all the tables.
Next, let's only leave the Qualification
table fields in the chart. In this case, the chart will include all values from this table (without filtering data by the other tables).
If we only use a pair of joined tables (the first and the second one, the first and the third, or the third and the fourth) in the chart, it will show common values from these tables (without filtering data by other tables). For example, let's add to the chart only the fields from the first (Qualification
) and second (Bonuses
) tables.