Cache invalidation in Yandex DataLens
DataLens caches query results from data sources to accelerate chart and dashboard rendering. By default, the cache refreshes only after its TTL expires. This may become a constraint if you need real-time visibility, as your data will update regularly but infrequently.
Here is how cache invalidation solves this issue: the system runs an occasional invalidation query against the data source to check if the data has changed. If the query returns a different result, the cache refreshes immediately without waiting for TTL expiration.
Invalidation stages
The invalidation query must be optimized to run fast on the target database.
The system runs the invalidation query at a configured interval (throttling), while the primary data query runs only when data actually changes or the cache TTL expires:
- The user configures the invalidation query in the dataset using either SQL or a formula with filters. You can only define one such query for a dataset, provided the dataset has been saved at least once.
- Upon each data query, the system checks the invalidation cache first.
- If the invalidation cache is stale, i.e., the throttling interval has elapsed, the system runs the invalidation query against the database.
- The query returns a short string, such as
"2024-01-15T10:30:00". - This result is appended to the key of the main data cache.
- When the query returns a different output or the cache TTL expires, the cache key changes, and fresh data is fetched.
Typical use cases
- Table with an
updated_atcolumn: Use theSELECT MAX(updated_at)::text FROM ordersinvalidation query to track the latest update timestamp. - Table with an auto-incrementing ID: Use the
SELECT MAX(id)::text FROM eventsquery to detect new records. - Versioned table: Use the
SELECT version::text FROM metadata WHERE table_name = 'orders'query to monitor the data version.
Throttling and coordination
Throttling defines the minimum interval (N seconds) between invalidation queries. For example, with a value of 60, the query runs at most once per minute, even if the data is queried 1 000 times within that window. Until this interval elapses, subsequent queries reuse the cached invalidation query result.
Invalidation modes
The following modes are available:
-
None: Disables cache invalidation. The cache refreshes only upon TTL expiration. This is the default behavior.
-
SQL: The user provides a custom SQL query to run directly against the data source.
Requirements:
- The connection must support subqueries.
- The SQL query must not be empty.
- The result must meet these criteria.
-
Formula: The user defines a DataLens formula (same as with calculated fields) and optional filters. The system compiles and executes the formula through the standard DataLens query pipeline.
Requirements:
- The formula cannot be empty.
- The formula must return a string value.
- The result must meet these criteria.
The
Formulamode is convenient because it leverages native DataLens tools (formulas and filters), so you do not need to know the SQL dialect of your specific database.
Requirements to the invalidation query result
The invalidation query (SQL or formula) must return a result that meets the following conditions:
- The query must return exactly one row and one column.
- The value must be a string.
- The string length cannot exceed
100characters.
Setting up cache validation
To configure cache validation in a dataset:
-
In your dataset's connection settings, go to Caching settings, enable Allow cache validation in datasets and set the cache refresh interval: max every
Nseconds. Acceptable values range from30to86 400seconds.Note
For the
SQLinvalidation mode, enable Raw SQL level. -
Open the dataset you want to configure cache validation for.
-
Go to the Cache validation tab.
-
Select an invalidation mode and configure validation settings:
NoneFormulaSQLNo invalidation. The cache refreshes only upon TTL expiration. This is the default behavior.
-
Next to the Formula field, click
Add and enter a DataLens formula following these requirements:- The formula cannot be empty.
- The formula must return a string value.
- The result must meet these criteria.
-
Optionally, at the bottom left, click Check result. You can check the result later.
-
Click Save.
-
Optionally, next to the Filtering field, click
Add and define the filtering conditions.
-
Next to the SQL query field, click
Add and enter the query text by following these requirements:- The connection must support subqueries.
- The SQL query must not be empty.
- The result must meet these criteria.
-
Optionally, at the bottom left, click Check result or skip this step to check the result later.
-
Click Save.
-
-
Click Check result (unavailable in None mode) and verify that the output meets the requirements.
-
In the top-right corner of the dataset edit window, click Save.
Invalidation testing
Invalidation query testing does not affect the main cache and ignores throttling. It is intended strictly for debugging to verify that the query runs correctly and returns the expected result.
This feature is only available to users with edit permissions for the dataset to prevent data leakage via RLS.
To run an invalidation test:
- Open the dataset you want to test.
- Go to the Cache validation tab.
- Click Check result and make sure the result meets these requirements.
Last invalidation result
Viewing the last validation result does not trigger new database queries. The available data includes the most recent query output and query timestamp.
This option is only available to users with edit permissions for the dataset.
The result may be either the returned string or an execution error. An error may also indicate an empty result, e.g., if the throttling interval has elapsed but the query has not run.
To view the last validation result:
- Open the dataset you want to test.
- Go to the Cache validation tab. Next to the Last result field, you can see the date and time of the most recent check. If the check failed, you will see
Failed with errornext to Last result. - Click Open to view the check result, or Show error to see the error message.
Error handling
The invalidation system follows the graceful degradation design principle: invalidation errors never block main data queries.
If invalidation is misconfigured, charts and dashboards will continue to render normally, and the cache will refresh based on its TTL as if invalidation is disabled. Error indicators will flag the affected charts and datasets, and full error details will be available in the Last result dataset field.
Limitations
- Invalidation query timeout is
20seconds. - The maximum result length is
100characters. - The result must contain exactly
1row. - The result must contain exactly
1column. - The result data type must be a string.