Yandex Cloud
Search
Discuss with expertTry it for free
  • Customer Stories
  • Documentation
  • Blog
  • All Services
  • System Status
  • Marketplace
    • Featured
    • Infrastructure & Network
    • Data Platform
    • AI for business
    • Security
    • DevOps tools
    • Serverless
    • Monitoring & Resources
  • All Solutions
    • By industry
    • By use case
    • Economics and Pricing
    • Security
    • Technical Support
    • Start testing with double trial credits
    • Cloud credits to scale your IT product
    • Gateway to Russia
    • Cloud for Startups
    • Center for Technologies and Society
    • Yandex Cloud Partner program
    • Price calculator
    • Pricing plans
  • Customer Stories
  • Documentation
  • Blog
© 2026 Direct Cursus Technology L.L.C.
Yandex DataLens
  • DataLens neuroanalyst
    • Overview
    • Versioning
    • Caching
    • Combining data
    • Adding connection info
      • Creating a ClickHouse® connection
      • Creating a PostgreSQL connection
      • Creating a MySQL® connection
      • Creating a YDB connection
        • Connection
        • Features of JOIN
        • Recommendations
      • Creating a Greenplum® connection
      • Creating a SQL Server connection
      • Creating an Oracle Database connection
      • Creating a StarRocks connection
      • Creating a Prometheus connection
      • Creating a Snowflake connection
      • Creating a Trino connection
  • Gallery in DataLens
  • Audit Trails events

In this article:

  • Distributing datasets across connections
  • Creating quick dashboards
  • Caching parameters and the number of charts on the dashboard page
  • CHYT recommendations
  • Useful links
  1. Connections
  2. Databases
  3. Creating a YTsaurus CHYT connection
  4. Recommendations

Tips for creating a CHYT connection in Yandex DataLens

Written by
Yandex Cloud
Updated at June 2, 2026
  • Distributing datasets across connections
  • Creating quick dashboards
  • Caching parameters and the number of charts on the dashboard page
  • CHYT recommendations
  • Useful links

Distributing datasets across connectionsDistributing datasets across connections

  • The CHYT connection includes the clique data and the token of the user running the query. For teams and departments, create a connection for each clique in the cluster using a robot's token.
  • If a connection uses a user token, limit the permissions for this connection by the Execution access. Thus, other users will be able to view charts created on top of this connection, but will not be able to create new datasets.

Creating quick dashboardsCreating quick dashboards

  • Try to pre-aggregate data as much as possible. For example, if data is in milliseconds and you need to build analytics by days, pre-aggregate the data to days.
  • Cut down your on-the-fly calculations. For example, you can calculate the date by the datetime field, but filtering by this field will be slower.
  • Try to avoid the JOIN operator as it slows down queries.
  • Try to process data on an SSD, if these CHYT recommendations do not work. On SSDs, tables are processed faster.

Caching parameters and the number of charts on the dashboard pageCaching parameters and the number of charts on the dashboard page

  • Once you open the dashboard, individual queries are sent across all selectors and charts. Multiple queries are sent one by one. The current page and one more page (web page) are sent. All charts are not loaded at the end of the dashboard. So, to accelerate the loading, optimize the clique's data and resources.
  • We do not recommend creating multiple cliques.
  • The default cache lifetime is five minutes. If data is updated less often, e.g., once a day, extend the lifetime to one hour.

CHYT recommendationsCHYT recommendations

Typically, query processing rate decreases when reading (not only when reading directly from the disk, but also when decompressing, converting data from YTsaurus to CH format, and so on). If this is the case, increasing the number of instances does not always help, so we recommend changing the data storage format:

  • To avoid reading excessive columns, make sure the tables have the optimize_for=scan attribute.
  • If you use a set of filters to read data, make sure to sort your tables. If you use a sort key to filter your data, redundant chunks will be filtered before reading data from them.

Note

In some cases, when you sort a table, filtering is not efficient. The column has the string type and the DateTime <-> String conversion is not monotonous and has no one-to-one correspondence. The Int <-> DateTime conversion is monotonous, but generally, you cannot use such an optimization with a string representation. For example, 2020-01-01 00:00:00 and 2020-01-01T00:00:00 correctly represent the same time in ClickHouse®, but when sorting based on a string representation, the 2020-01-01 00:00:01 value may appear between them, hence, the String -> DateTime conversion is not monotonous, and this optimization cannot be used.

  • Do not use erasure coding. This type of code is designed for "cold" data, as it increases the replication workload, so reading such tables is not efficient. Queries against a table without erasure_codec run much faster than queries against the same table with erasure_codec.

Warning

However, replacing the erasure_codec attribute with optimize_for will not change the data format. To force the change, run merge with the force_transform=%true option.

  • Try to avoid dynamic tables. Reads from a dynamic table require reading the versions and the entire key. In this case, readers are less parallelized, which increases the cost of conversion to the ClickHouse® format, and you need to merge rows with the same keys. As a result, reading from dynamic tables is much slower. The exact figure depends on the table and its access pattern, but the difference can be as high as tenfold.
  • Avoid unreasonably heavy queries, e.g., sorting 100 GB of SELECT DISTINCT or GROUP BY data by a column having millions of distinct values or using a JOIN on big tables. Such queries will be processed more slowly.
  • Accelerate reads by increasing replication_factor on the table.

When building a dataset on a range of tables, DataLens uses the concatYtTablesRange CHYT function to output the most generic schema of all the tables. The final set includes the columns that are present in all the tables and have the same type in each of them. To output the data on all the columns of all the tables from the range, add the missing columns to older tables using alter-table.

Useful linksUseful links

  • CHYT performance
  • Visualizing data from CHYT
  • Managing access to DataLens

ClickHouse® is a registered trademark of ClickHouse, Inc.

Was the article helpful?

Previous
Features of JOIN
Next
Creating a Greenplum® connection
© 2026 Direct Cursus Technology L.L.C.