Connecting external dictionaries in Managed Service for ClickHouse®
You can add external dictionaries to your cluster and remove them. Read more about dictionaries in the ClickHouse® documentation
Managed Service for ClickHouse® supports several types of dictionary sources:
- ClickHouse®
- HTTP(s)
- MongoDB
- MySQL®
- PostgreSQL
You can manage dictionaries either via SQL or via cloud interfaces. We recommend SQL.
Warning
Changing dictionary settings will restart ClickHouse® servers on the cluster hosts.
Getting a list of dictionaries
- In the management console
, go to the folder page and select Managed Service for ClickHouse. - Click the cluster name and open the Dictionaries tab.
If you do not have the Yandex Cloud command line interface yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name
or --folder-id
parameter.
To get a list of external dictionaries in a ClickHouse® cluster:
-
View a description of the CLI command for getting detailed cluster information:
yc managed-clickhouse cluster get --help
-
Run this command:
yc managed-clickhouse cluster get <cluster_name>
The added dictionaries are displayed in the dictionaries:
section of the command output.
To get a list of dictionaries, use the get REST API method for the Cluster resource or the ClusterService/Get gRPC API call.
Creating a dictionary
Warning
If the dictionary was created in the console, it cannot be managed via SQL.
- In the management console
, go to the folder page and select Managed Service for ClickHouse. - Click the cluster name and open the Dictionaries tab.
- In the top-right corner, click Create dictionary.
- Specify dictionary settings and click Save.
Warning
If the dictionary is added via the CLI, it cannot be managed via SQL.
If you do not have the Yandex Cloud command line interface yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name
or --folder-id
parameter.
To create an external dictionary in a ClickHouse® cluster:
-
View a description of the CLI command for adding dictionaries:
yc managed-clickhouse cluster add-external-dictionary --help
-
Run the add dictionary command and specify dictionary settings:
yc managed-clickhouse cluster add-external-dictionary \ --name=<ClickHouse®>_cluster_name \ --dict-name=<dictionary_name> \ ...
Warning
If the dictionary is added via the API, it cannot be managed via SQL.
To create a dictionary, use the createExternalDictionary REST API method for the Cluster resource or the ClusterService/CreateExternalDictionary gRPC API call.
Warning
If the dictionary is added via SQL, management using the console, the CLI, and the API is not available for it.
-
Connect to the required database of the Managed Service for ClickHouse® cluster using
clickhouse-client
. -
Run the DDL query
and specify dictionary settings:CREATE DICTIONARY <dictionary_name>( <data_columns> ) PRIMARY KEY <name_of_column_with_keys> SOURCE(<source>(<source_configuration>)) LIFETIME(<update_interval>) LAYOUT(<memory_storage_method>());
Deleting a dictionary
- In the management console
, go to the folder page and select Managed Service for ClickHouse. - Click the cluster name and open the Dictionaries tab.
- Click
next to the dictionary you want to delete and select Delete dictionary.
If you do not have the Yandex Cloud command line interface yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name
or --folder-id
parameter.
To remove an external dictionary:
-
View a description of the CLI command for removing a dictionary:
yc managed-clickhouse cluster remove-external-dictionary --help
-
Run the command to remove a dictionary:
yc managed-clickhouse cluster remove-external-dictionary \ --name=<cluster_name> \ --dict-name=<dictionary_name>
To delete a dictionary, use the deleteExternalDictionary REST API method for the Cluster resource or the ClusterService/DeleteExternalDictionary gRPC API call.
Dictionary settings
SQL
<dictionary_name>
: Name of the new dictionary.<data_columns>
: List of columns with dictionary entries and their type.PRIMARY KEY
: Dictionary key column name.SOURCE
: Source and its parameters.LIFETIME
: Dictionary update frequency.LAYOUT
: Memory layout type for the dictionary. Supported methods:flat
,hashed
,cache
,range_hashed
,complex_key_hashed
, andcomplex_key_cache
.
For more information about the settings, see the ClickHouse® documentation
Management console
- Name: Name of a new dictionary.
Source
- Host: Source host name. The host must be in the same network as the ClickHouse® cluster.
- Port: Port for connecting to the source.
- User: Name of source database user.
- Password: Password to access the source database.
- Database: Source database name.
- Table: Source table name.
- Selection condition: Condition for selecting rows to generate a dictionary from. For example, the
id=10
selection condition is the same as theWHERE id=10
SQL command. - (Optional) Check dictionary status: SQL query to check for changes in a dictionary. ClickHouse® will update the dictionary only if the result of this query changes.
For more information, see the ClickHouse® documentation .
- URL: HTTP(s) source URL.
- File format: File format
for the HTTP(s) source. Read more about formats in the ClickHouse® documentation .
- Host: Source host name. The host must be in the same network as the ClickHouse® cluster.
- Port: Port for connecting to the source.
- User: Name of source database user.
- Password: Password to access the source database.
- Database: Source database name.
- Collection: Name of the collection for the source.
- Replicas: List of MySQL® replicas to use as the dictionary source.
For replicas, you can set general connection settings or set up a port, username and password. - Port: Port for connecting to the source.
- User: Name of source database user.
- Password: Password to access the source database.
- Database: Source database name.
- Table: Source table name.
- Selection condition: Condition for selecting rows to generate a dictionary from. For example, the
id=10
selection condition is the same as theWHERE id=10
SQL command. - (Optional) Check dictionary status: SQL query to check for changes in a dictionary. ClickHouse® will update the dictionary only if the result of this query changes.
For more information, see the ClickHouse® documentation .
- Hosts: Names of a PostgreSQL host and its replicas that will be used as dictionary sources. The hosts must be in the same network as the ClickHouse® cluster.
- Port: Port for connecting to the source.
- User: Name of source database user.
- Password: Password to access the source database.
- Database: Source database name.
- Table: Source table name.
- (Optional) Check dictionary status: SQL query to check for changes in a dictionary. ClickHouse® will update the dictionary only if the result of this query changes.
For more information, see the ClickHouse® documentation . - SSL mode: Mode for establishing a secure SSL TCP/IP connection to the PostgreSQL database.
For more information, see the PostgreSQL documentation .
For more information about dictionary sources and their connection parameters, see the ClickHouse® documentation
Layout
-
Memory storage: Memory layout type for the dictionary. Supported methods:
flat
,hashed
,cache
,range_hashed
,complex_key_hashed
, andcomplex_key_cache
. For more information about how to store dictionaries in memory, see the ClickHouse® documentation . -
Cache size: Number of cache cells for the
cache
andcomplex_key_cache
methods. For more information about the cache, see the ClickHouse® documentation . -
Numeric key: Dictionary key column name. The key column must be the UInt64 data type. It is used for the
flat
,hashed
,cache
, andrange_hashed
methods. For more information about keys, see the ClickHouse® documentation . -
Data column: List of columns with dictionary data:
- Name: Column name.
- Data type: Column data type.
- (Optional) Default value: Default value for an empty element. When loading a dictionary, all empty elements are replaced with this value. You cannot put
NULL
in this field. - (Optional) Expression Expression
ClickHouse® applies to the column value. - Hierarchical: Hierarchical support flag.
- Injective: Injective
id
→attribute
mapping flag.
For more information about column parameters, see the ClickHouse® documentation
Update rate
-
Period: Set how often the dictionary updates:
-
fixed: Fixed period between dictionary updates:
- Period duration: Update interval for dictionary data in seconds.
-
variable: Time range for ClickHouse® to randomly select the time for update. This is necessary for distributing the load on the dictionary source when upgrading on a large number of servers:
- Minimum: Minimum interval between dictionary updates in seconds.
- Maximum: Maximum interval between dictionary updates in seconds.
-
For more information about updating dictionaries, see the ClickHouse® documentation
CLI
--dict-name
: Name of a new dictionary.
-
--clickhouse-source
: ClickHouse® source settings:db
: Source database name.table
: Source table name.host
: Source host name. The host must be in the same network as the ClickHouse® cluster.port
: Port for connecting to the source.user
: Name of source database user.password
: Password to access the source database.where
: Condition for selecting rows to generate a dictionary from. For example, theid=10
selection condition is the same as theWHERE id=10
SQL command.
--http-source-url
: HTTP(s) source URL.--http-source-format
: File format for the HTTP(s) source. Read more about formats in the ClickHouse® documentation .
-
--mongodb-source
: MongoDB source settings:db
: Source database name.connection
: Name of the collection for the MongoDB source.host
: Source host name. The host must be in the same network as the ClickHouse® cluster.port
: Port for connecting to the source.user
: Name of source database user.password
: Password to access the source database.
-
--mysql-source
: MySQL® source settings:db
: Source database name.table
: Source table name.port
: Port for connecting to the source.user
: Name of source database user.password
: Password to access the source database.where
: Condition for selecting rows to generate a dictionary from. For example, theid=10
selection condition is the same as theWHERE id=10
SQL command.
-
--mysql-replica
: Settings of MySQL® source replicas:host
: Replica host name.priority
: Replica priority. During a connection attempt, ClickHouse® reads from replicas based on their priority. The lower the number, the higher the priority.port
: Port for connecting to the replica.user
: Name of the database user.password
: Password to access the source database.
-
--mysql-invalidate-query
: SQL query to check for changes in a dictionary. ClickHouse® will update the dictionary only if the result of this query changes.
-
--postgresql-source
: PostgreSQL source settings:db
: Source database name.table
: Source table name.port
: Port for connecting to the source.user
: Name of source database user.password
: Password to access the source database.ssl-mode
: Mode for establishing a secure SSL TCP/IP connection to the PostgreSQL database. Acceptable values:disable
,allow
,prefer
,verify-ca
, andverify-full
.
-
--postgresql-source-hosts
: Names of a PostgreSQL host and its replicas that will be used as dictionary sources. The hosts must be in the same network as the ClickHouse® cluster. -
--postgresql-invalidate-query
: SQL query to check for changes in a dictionary. ClickHouse® will update the dictionary only if the result of this query changes.
-
--structure-id
: Dictionary key column name. The key column must be the UInt64 data type. It is used for theflat
,hashed
,cache
, andrange_hashed
methods. For more information about keys, see the ClickHouse® documentation . -
--structure-key
: List of columns with dictionary data:name
: Column name.type
: Column data type.null-value
: Default value for an empty element. When loading a dictionary, all empty elements are replaced with this value. You cannot putNULL
in this field.expression
: Expression ClickHouse® applies to the column value.hierarchical
: Hierarchical support flag.injective
: Injectiveid
→attribute
mapping flag.
-
--structure-attribute
: Description of the fields available for database queries:name
: Column name.type
: Column data type.null-value
: Default value for an empty element. When loading a dictionary, all empty elements are replaced with this value. You cannot putNULL
in this field.expression
: Expression ClickHouse® applies to the column value.hierarchical
: Hierarchical support flag.injective
: Injectiveid
→attribute
mapping flag.
-
--fixed-lifetime
: Fixed interval between dictionary updates in seconds. -
--lifetime-range
: Time range for ClickHouse® to randomly select the time for update. This is necessary for distributing the load on the dictionary source when upgrading on a large number of servers.min
: Minimum interval between dictionary updates in seconds.max
: Maximum interval between dictionary updates in seconds.
-
--layout-type
: Memory layout type for the dictionary. Supported methods:flat
,hashed
,cache
,range_hashed
,complex_key_hashed
, andcomplex_key_cache
. For more information about how to store dictionaries in memory, see the ClickHouse® documentation . -
--layout-size-in-cells
: Number of cache cells for thecache
andcomplex_key_cache
methods. For more information about the cache, see the ClickHouse® documentation .
API
For more information about the settings, see the description of the createExternalDictionary API method.
Examples
Add a dictionary with the following test characteristics:
-
Cluster:
mych
. -
Name:
mychdict
. -
Key column name:
id
. -
Fields available for database queries:
id
withUInt64
type.field1
withString
type.
-
Fixed period between dictionary updates: 300 seconds.
-
Memory layout type for the dictionary:
cache
. -
PostgreSQL source:
db1
database.- Table name:
table
. - Port for connection:
5432
. - Database user name:
user1
. - Password for access to the source database:
user1user1
. - Mode of secure SSL TCP/IP connection to the database:
verify-full
.
-
Host name:
rc1b-05vjbfhf********.mdb.yandexcloud.net
.
Run the following command:
yc managed-clickhouse cluster add-external-dictionary \
--name=mych \
--dict-name=mychdict \
--structure-id=id \
--structure-attribute name=id,`
`type=UInt64,`
`name=field1,`
`type=String \
--fixed-lifetime=300 \
--layout-type=cache \
--postgresql-source db=db1,`
`table=table,`
`port=5432,`
`user=user1,`
`password=user1user1,`
`ssl-mode=verify-full \
--postgresql-source-hosts=rc1b-05vjbfhf********.mdb.yandexcloud.net
ClickHouse® is a registered trademark of ClickHouse, Inc