© 2025 Direct Cursus Technology L.L.C.

Recommendations for creating a CHYT connection

Written by
Yandex Cloud
Updated at December 4, 2024
  • Distributing datasets across connections
  • Creating quick dashboards
  • Caching parameters and the number of charts on the dashboard page
  • Recommendations from CHYT
  • Useful links

Distributing datasets across connections

  • A CHYT connection stores the clique details and the token of the user on whose behalf queries run. For teams and departments, create a separate connection for each clique in the cluster using a robot's token.
  • If a connection uses a private token, restrict access to this connection to the Execute permission. Other users will then be able to view the charts built on top of this connection but will not be able to create new datasets.

Creating quick dashboards

  • Pre-aggregate the data as much as possible. For example, if the data has millisecond granularity and you need analytics by day, pre-aggregate it by day.
  • Minimize on-the-fly calculations. For example, you can calculate the date from a datetime field at query time, but filtering by such a calculated field will be slower.
  • Avoid JOIN where possible. This operator slows down queries.
  • If the CHYT recommendations did not help, move the data to SSD storage. Tables on SSDs are processed faster.
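
As a minimal sketch of the pre-aggregation advice above, the following Python snippet collapses millisecond timestamps into per-day counts before the data reaches the dashboard. The function name `aggregate_by_day` and the sample timestamps are hypothetical and only illustrate the idea; in practice the aggregation would run in your own data pipeline.

```python
from collections import Counter
from datetime import datetime, timezone


def aggregate_by_day(events_ms):
    """Count events per UTC calendar day from millisecond timestamps."""
    counts = Counter()
    for ts_ms in events_ms:
        # Drop the millisecond part, then truncate to the calendar day.
        day = datetime.fromtimestamp(ts_ms // 1000, tz=timezone.utc).date()
        counts[day.isoformat()] += 1
    return dict(sorted(counts.items()))


# Hypothetical sample data: three events over two days.
events = [
    1_704_067_200_000,  # 2024-01-01 00:00:00.000 UTC
    1_704_070_800_500,  # 2024-01-01 01:00:00.500 UTC
    1_704_153_600_000,  # 2024-01-02 00:00:00.000 UTC
]
daily = aggregate_by_day(events)
print(daily)
```

A dashboard built on the daily table then scans one row per day instead of one row per event.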

Caching parameters and the number of charts on the dashboard page

  • When you open a dashboard, a separate query is sent for each selector and chart. The queries are sent one by one. Charts are requested for the current page (screen) and one more page after it, so charts at the end of the dashboard are not loaded right away. To speed up loading, optimize the data and the clique's resources.
  • We do not recommend creating multiple cliques.
  • The cache lifetime is 5 minutes by default. If the data is updated less often, for example once a day, increase the lifetime to 1 hour.

Recommendations from CHYT

Typically, the bottleneck in query processing is reading the data. This includes not only reading directly from the disk but also decompressing the data, converting it from the YTsaurus format to the ClickHouse® format, and so on. In this case, increasing the number of instances does not always help, so we recommend changing the data storage format:

  • To avoid reading excessive columns, make sure the tables have the optimize_for=scan attribute.
  • If you read the data with a fixed set of filters, sort your tables. When you filter by the sort key, chunks that cannot contain matching rows are skipped before any data is read from them.
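
The effect of filtering by a sort key can be illustrated with a simplified model. The sketch below assumes each chunk of a sorted table is described by the minimum and maximum key it holds. The `Chunk` type and the `chunks_to_read` function are hypothetical and do not reflect the actual CHYT implementation.

```python
from typing import NamedTuple


class Chunk(NamedTuple):
    """Hypothetical chunk descriptor: name plus the key range it covers."""
    name: str
    min_key: int
    max_key: int


def chunks_to_read(chunks, lo, hi):
    """Return names of chunks whose key range overlaps the filter [lo, hi]."""
    return [c.name for c in chunks if c.max_key >= lo and c.min_key <= hi]


# In a sorted table, key ranges of chunks do not interleave.
chunks = [
    Chunk("chunk-0", 0, 99),
    Chunk("chunk-1", 100, 199),
    Chunk("chunk-2", 200, 299),
]
selected = chunks_to_read(chunks, 120, 180)
print(selected)
```

In an unsorted table every chunk may hold any key, so no chunk can be skipped on the basis of its key range alone.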

Note

In some cases, filtering is not efficient even if the table is sorted. This happens, for example, when the column has the string type: the DateTime <-> String conversion is not monotonic and is not a one-to-one correspondence. The Int <-> DateTime conversion is monotonic, but in general you cannot rely on this optimization with a string representation. For example, 2020-01-01 00:00:00 and 2020-01-01T00:00:00 both correctly represent the same time in ClickHouse®, but when sorting by the string representation, the 2020-01-01 00:00:01 value may end up between them. Hence, the String -> DateTime conversion is not monotonic, and this optimization cannot be used.
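
The example from the note can be reproduced with a few lines of Python. This is only an illustration that uses the standard library `datetime.fromisoformat` parser as a stand-in for the String -> DateTime conversion; it is not the conversion ClickHouse® itself performs.

```python
from datetime import datetime

values = [
    "2020-01-01T00:00:00",
    "2020-01-01 00:00:01",
    "2020-01-01 00:00:00",
]

# Sort as strings: the space character sorts before "T".
by_string = sorted(values)

# Parse the values in string order and check whether times are non-decreasing.
parsed = [datetime.fromisoformat(v) for v in by_string]
is_monotonic = all(a <= b for a, b in zip(parsed, parsed[1:]))

print(by_string)
print(is_monotonic)
```

The later timestamp lands between two spellings of the same earlier moment, so the order of strings does not determine the order of the times they represent.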

  • Do not use erasure coding. This type of encoding is designed for "cold" data and increases the replication workload, so reading such tables is inefficient. Queries to a table without erasure_codec run much faster than queries to the same table with erasure_codec.

Warning

However, changing the erasure_codec and optimize_for attributes on a table does not by itself change the format of the data already stored. To force the change, run merge with the force_transform=%true option.

  • Try to avoid dynamic tables. Any reading from a dynamic table requires reading the versions and the entire key. In this case, the readers are less parallelized, which increases the cost of conversion to ClickHouse® format, and you need to merge the rows with the same keys. As a result, reading from dynamic tables is much slower. The exact figure depends on the table and the pattern used to access it, but the difference can be as high as tenfold.
  • Avoid unreasonably heavy queries. Examples include sorting 100 GB of data, running SELECT DISTINCT or GROUP BY on a column with millions of distinct values, or using JOIN on large tables. Such queries are always slow.
  • Accelerate reads by increasing replication_factor on the table.

When building a dataset on a range of tables, DataLens uses the concatYtTablesRange CHYT function. The function outputs the schema common to all the tables: the result includes only the columns that are present in every table and have the same type in each of them. To output the data from all the columns of all the tables in the range, add the missing columns to older tables using alter-table.
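
The common-schema rule described above can be sketched as follows. The `common_schema` helper, the table schemas, and the type names are hypothetical and serve only to illustrate which columns survive; this is not the implementation of concatYtTablesRange.

```python
def common_schema(schemas):
    """Keep columns present in every schema with an identical type."""
    if not schemas:
        return {}
    first, *rest = schemas
    return {
        column: col_type
        for column, col_type in first.items()
        if all(other.get(column) == col_type for other in rest)
    }


# Hypothetical schemas of three tables from one range.
table_2022 = {"user_id": "int64", "amount": "double"}
table_2023 = {"user_id": "int64", "amount": "double", "region": "string"}
table_2024 = {"user_id": "int64", "amount": "int64", "region": "string"}

result = common_schema([table_2022, table_2023, table_2024])
print(result)
```

Here `region` is dropped because the oldest table lacks it, and `amount` is dropped because its type differs between tables. Adding `region` to the oldest table would bring that column back into the result.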

Useful links

  • CHYT performance
  • Visualizing data from CHYT
  • Managing access to DataLens
