Yandex Cloud
Search
Discuss with expertTry it for free
  • Customer Stories
  • Documentation
  • Blog
  • All Services
  • System Status
  • Marketplace
    • Featured
    • Infrastructure & Network
    • Data Platform
    • AI for business
    • Security
    • DevOps tools
    • Serverless
    • Monitoring & Resources
  • All Solutions
    • By industry
    • By use case
    • Economics and Pricing
    • Security
    • Technical Support
    • Start testing with double trial credits
    • Cloud credits to scale your IT product
    • Gateway to Russia
    • Cloud for Startups
    • Center for Technologies and Society
    • Yandex Cloud Partner program
    • Price calculator
    • Pricing plans
  • Customer Stories
  • Documentation
  • Blog
© 2026 Direct Cursus Technology L.L.C.
Yandex DataLens
  • DataLens neuroanalyst
    • Overview
    • Versioning
    • Caching
    • Combining data
    • Adding connection info
      • Creating a ClickHouse® connection
      • Creating a PostgreSQL connection
      • Creating a MySQL® connection
      • Creating a YDB connection
        • Connection
        • Features of JOIN
        • Recommendations
      • Creating a Greenplum® connection
      • Creating a SQL Server connection
      • Creating an Oracle Database connection
      • Creating a StarRocks connection
      • Creating a Prometheus connection
      • Creating a Snowflake connection
      • Creating a Trino connection
  • Gallery in DataLens
  • Audit Trails events

In this article:

  • Storing table data
  • Simple SELECT query in CHYT
  • Types of queries with a JOIN in ClickHouse®
  • Features of queries with a JOIN in CHYT
  1. Connections
  2. Databases
  3. Creating a YTsaurus CHYT connection
  4. Features of JOIN

JOIN in Yandex DataLens

Written by
Yandex Cloud
Updated at June 2, 2026
  • Storing table data
  • Simple SELECT query in CHYT
  • Types of queries with a JOIN in ClickHouse®
  • Features of queries with a JOIN in CHYT

You can join data via the dataset creation interface by dragging tables to the workspace and configuring links between them using the JOIN operator. For YTsaurus tables, a join like this causes an error due to the table storage structure and default query logic in YTsaurus. To avoid this error, use an SQL query to the source.

Storing table dataStoring table data

The table storage schema in YTsaurus fundamentally differs from that in ClickHouse®:

  • In YTsaurus, the basic data storage primitive is a static table with rows grouped into chunks that are randomly distributed across a cluster. By default, the data is distributed across shards based on a sharding key. Static tables are highly inefficient for point data reads.

    Note

    In ClickHouse®, sharding ensures locality of rows with a single sharding expression value per VM.

  • YTsaurus supports sorted tables: a table schema may factor in its row sorting by a column prefix. These columns are called key columns. This is a more efficient storage primitive: a dynamic table.

    Note

    In YTsaurus, sorting ensures locality of rows with a single key value per chunk (or in a set of successive chunks).

Simple SELECT query in CHYTSimple SELECT query in CHYT

A SELECT query without a JOIN is basically a streaming query. It reads data from a single source and performs row processing. You can apply it to function and aggregation fields.

In CHYT, this query is implemented as follows:

  1. A clique instance gets a query. The instance is called the query coordinator, as it determines the subsequent execution plan.
  2. The coordinator sends an rewritten query to each instance over the internal TCP. In this query, the input table name is replaced with an encoded sequence of chunk slice, where chunk slice is a certain range within a chunk. (The range is set using border row numbers or keys.)
  3. Each instance runs the rewritten query and returns its result to the coordinator for its final aggregation (if required) and for sending a response to the user.

Types of queries with a JOIN in ClickHouse®Types of queries with a JOIN in ClickHouse®

Same as a simple query, a SELECT query with a JOIN is sent to a clique instance called the query coordinator. The query coordinator determines the execution plan, while the load distribution between instances depends on how lhs JOIN rhs USING/ON is run:

  • In ClickHouse®, a distributed local JOIN is run by default. If tables are sharded in the same way, a pair of matching keys cannot end up on different VMs. Therefore, lhs and rhs are interpreted on each instance as corresponding local tables. In this case, the query the coordinator rewrites into fragments can be run independently on each instance.

    Note

    For this method, tables must have the same sharding schema.

  • A GLOBAL JOIN is executed if you use the GLOBAL keyword with JOIN. The query coordinator fully executes and materializes the right-hand side, rhs. Next, its serialized representation is sent along with the query to the instances. They use this representation to retrieve the right-hand side in their memory. For more information, see this guide.

    Note

    • This method is efficient if rhs is relatively small and there are relatively few instances. Otherwise, you may lack resources when running rhs on the coordinator or when the coordinator sends a serialized representation with a subquery across the network.
    • This method does not require meeting any additional conditions for table data storage or sharding consistency.
  • JOIN via subqueries: ClickHouse® allows enclosing lhs and/or rhs in brackets, which significantly affects the execution plan:

    Enclosing `lhs` in brackets
    Enclosing `rhs` in brackets
    1. The coordinator executes the LHS independently.
    2. The RHS is executed independently.
    3. The RHS is placed in-memory in a hash table.
    4. A JOIN is only fully executed on the coordinator.
    1. The coordinator executes the LHS independently as if the query is SELECT lhs.

    2. The coordinator sends its queries to instances leaving JOIN (rhs) as is.

    3. Each instance executes rhs independently.

      Note

      This may significantly increase the load, since each instance will materialize the RHS independently. Therefore, the ClickHouse® security algorithm prohibits this behavior by default returning the Double-distributed IN/JOIN subqueries is denied error.

Features of queries with a JOIN in CHYTFeatures of queries with a JOIN in CHYT

A SELECT query with a JOIN in CHYT runs as a Sorted JOIN by default. It uses the same column sorting schema. The standard lhs JOIN rhs USING/ON statement is used.

In this case, additional restrictions are imposed on lhs and rhs:

  • The lhs and rhs must be sorted tables.
  • The USING/ON clause must only use sorted columns.

For example:

  • Let's assume that lhs is sorted by the l1, l2, ..., ln columns, and rhs, by the r1, r2, ..., rm columns.
  • The JOIN ON clause should look like a set of l1 = r1 , ..., lk = rk equations for a certain k (the equations can be listed in any order).
  • This can be represented as a set of equations in the ON clause and as a set of general key columns in the USING statement, but not as a set of equations in the WHERE clause.

If these conditions are met, the query coordinator generates pairs of matching ranges from lhs and rhs and distributes them across the instances in subqueries.

Otherwise, an error is returned. In this case, you should either use GLOBAL JOIN or enclose rhs in a subquery.

ClickHouse® is a registered trademark of ClickHouse, Inc.

Was the article helpful?

Previous
Connection
Next
Recommendations
© 2026 Direct Cursus Technology L.L.C.