# Connecting to Yandex Managed Service for ClickHouse®
With a directed acyclic graph (DAG), you can configure a connection to a database in a Yandex Managed Service for ClickHouse® cluster. Data for connecting to the DB is stored in Yandex Lockbox and automatically inserted into the graph.
## Getting started
- Create a Managed Service for ClickHouse® cluster with the following parameters:
  - DB name: `default-bd`
  - Username: `admin`
  - Password: `admin-password`

  > **Warning**
  >
  > You cannot create a database named `default`.
- Create a Yandex Object Storage bucket to store the DAG file in.
- Configure the Managed Service for Apache Airflow™ cluster:
  - Enable the Use Lockbox Secret Backend option, which lets you store Apache Airflow™ configuration data, variables, and connection parameters as Yandex Lockbox secrets.
  - Under Dependencies, add the `airflow-clickhouse-plugin` pip package.
  - Under DAG file storage, select the Object Storage bucket you created earlier. Your DAG file will be fetched from it.
- Assign the `lockbox.payloadViewer` role to your service account.

  There is no need to assign the `lockbox.payloadViewer` role for the whole folder: it is enough to assign it for a specific Yandex Lockbox secret once you create it.
## Create a Yandex Lockbox secret
For the Apache Airflow™ cluster to work correctly, your Yandex Lockbox secret's name must have this format: `airflow/<artifact_type>/<artifact_ID>`, where:
- `<artifact_type>`: Defines what data will be stored in the secret. The allowed values are:
  - `connections`: Connections.
  - `variables`: Variables.
  - `config`: Configuration data.
- `<artifact_ID>`: ID to use to access the artifact in Apache Airflow™.
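As an illustration of the naming convention above, here is a small sketch that splits a secret name into its artifact type and artifact ID. The `parse_secret_name` helper is hypothetical, not part of Apache Airflow™ or Yandex Cloud:

```python
# Allowed artifact types per the naming convention airflow/<artifact_type>/<artifact_ID>.
ALLOWED_TYPES = {"connections", "variables", "config"}


def parse_secret_name(name: str) -> tuple[str, str]:
    """Split a secret name into (artifact_type, artifact_ID), validating both parts."""
    prefix, artifact_type, artifact_id = name.split("/", 2)
    if prefix != "airflow" or artifact_type not in ALLOWED_TYPES:
        raise ValueError(f"not a valid Airflow secret name: {name!r}")
    return artifact_type, artifact_id


print(parse_secret_name("airflow/connections/ch"))  # ('connections', 'ch')
```

For the secret created below, the artifact ID `ch` is what the DAG later references as its connection ID.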
Create a Yandex Lockbox secret with the following parameters:
- Name: `airflow/connections/ch`.
- Secret type: Custom.
- Key: `conn`.
- Value: Select Text and specify the following contents:

  ```json
  {
    "conn_type": "clickhouse",
    "host": "<ClickHouse®_cluster_host_FQDN>",
    "port": 9440,
    "schema": "default-bd",
    "login": "admin",
    "password": "admin-password",
    "extra": {
      "secure": "True"
    }
  }
  ```
For more information on how to get the FQDN of a ClickHouse® cluster host, see FQDNs of ClickHouse® hosts.
The secret will store the data to connect to the database in the Managed Service for ClickHouse® cluster.
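If you prefer to assemble the secret value programmatically (for example, to pass it to the CLI or Terraform), the connection JSON can be built like this. The host value is a placeholder you must substitute with your cluster host's FQDN:

```python
import json

conn = {
    "conn_type": "clickhouse",
    "host": "<ClickHouse®_cluster_host_FQDN>",  # placeholder: substitute your host FQDN
    "port": 9440,                               # ClickHouse® secure (TLS) native-protocol port
    "schema": "default-bd",
    "login": "admin",
    "password": "admin-password",
    "extra": {"secure": "True"},                # tell the driver to use an encrypted connection
}

# Serialize to the JSON text that goes into the secret's `conn` key.
payload = json.dumps(conn, indent=2)
print(payload)
```

Storing the whole connection as a single JSON document under one key is what the Lockbox secret backend expects for `connections` artifacts.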
## Prepare the DAG file and run the graph
- Create a local file named `clickhouse.py` and paste the following script into it:

  ```python
  from airflow.decorators import dag, task
  from airflow_clickhouse_plugin.hooks.clickhouse import ClickHouseHook


  @dag(schedule=None)
  def clickhouse():
      @task
      def query_clickhouse():
          # The connection ID "ch" matches the <artifact_ID> part
          # of the airflow/connections/ch secret name.
          ch_hook = ClickHouseHook(clickhouse_conn_id="ch")
          result = ch_hook.execute("SELECT 1;")
          print(f"query result: {result}")

      query_clickhouse()


  clickhouse()
  ```
- Upload the `clickhouse.py` DAG file to the bucket you created earlier.
- Make sure a new graph named `clickhouse` has appeared in the DAGs section.

  It may take a few minutes to load a DAG file from the bucket.
- To run the graph, click the launch button in the line with its name.
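Besides the web interface, a DAG run can be started through Apache Airflow™'s stable REST API (`POST /api/v1/dags/{dag_id}/dagRuns`). The sketch below only builds the request; the webserver URL and credentials are placeholders you must substitute:

```python
import json

# Placeholder: replace with your Apache Airflow™ webserver URL.
base_url = "https://<airflow_webserver_host>"
dag_id = "clickhouse"

# The stable REST API starts a DAG run with POST /api/v1/dags/{dag_id}/dagRuns.
endpoint = f"{base_url}/api/v1/dags/{dag_id}/dagRuns"
body = json.dumps({"conf": {}})  # an empty conf starts a run with default parameters

print(endpoint)
# To actually send the request, e.g. with the requests library:
#   requests.post(endpoint, data=body, auth=("<user>", "<password>"),
#                 headers={"Content-Type": "application/json"})
```

This is convenient when triggering the graph from CI pipelines or scripts rather than the UI.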
## Check the result
To check the result in the Apache Airflow™ web interface:

- In the DAGs section, open the `clickhouse` graph.
- Go to the Graph section.
- Select query_clickhouse.
- Go to Logs.
- Make sure the logs contain the `query result: [(1,)]` line. This means the query was successful.

Alternatively:

- In the DAGs section, click `clickhouse`.
- Go to Tasks.
- Select query_clickhouse.
- Go to Task Instances.
- Select the task instance. The Logs section will open.
- Make sure the logs contain the `query result: [(1,)]` line. This means the query was successful.
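The expected log line follows from the driver's result format: the underlying ClickHouse® driver returns query results as a list of row tuples, so `SELECT 1` yields one row with one column, `[(1,)]`. A minimal sketch of how the DAG's print statement produces that line, with the query result simulated:

```python
# Simulated result: the driver returns rows as a list of tuples,
# so SELECT 1 produces a single one-element row.
result = [(1,)]

# Same f-string as in the DAG's query_clickhouse task.
log_line = f"query result: {result}"
print(log_line)  # query result: [(1,)]
```

If your query returned several rows or columns, the logged list would contain one tuple per row instead.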