Working with data in Yandex Managed Service for ClickHouse®
Yandex Query is an interactive service for serverless data analysis. It lets you process data from different storage systems without creating a dedicated cluster. The service supports Yandex Object Storage, Yandex Managed Service for PostgreSQL, and Yandex Managed Service for ClickHouse® as data sources.
In this tutorial, you will connect to a Managed Service for ClickHouse® database and run queries against it with Query from a JupyterLab notebook.
- Prepare your infrastructure.
- Get started in Query.
- Create a Managed Service for ClickHouse® cluster.
- Connect to the Managed Service for ClickHouse® data.
If you no longer need the resources you created, delete them.
Getting started
Before getting started, register in Yandex Cloud, set up a community, and link your billing account to it.
- On the DataSphere home page, click Try for free and select an account to log in with: Yandex ID or your working account in the identity federation (SSO).
- Select the Yandex Cloud Organization organization you are going to use in Yandex Cloud.
- Create a community.
- Link your billing account to the DataSphere community you are going to work in. Make sure that you have a billing account linked and its status is ACTIVE or TRIAL_ACTIVE. If you do not have a billing account yet, create one in the DataSphere interface.
Required paid resources
The cost of the infrastructure for working with Managed Service for ClickHouse® data includes:
- Fee for DataSphere computing resource usage.
- Fee for a running Managed Service for ClickHouse® cluster.
- Fee for the amount of data read when running Query queries.
Prepare the infrastructure
Log in to the Yandex Cloud management console.
If you have an active billing account, you can create or select a folder to deploy your infrastructure in on the cloud page.
Note
If you use an identity federation to access Yandex Cloud, billing details might be unavailable to you. In this case, contact your Yandex Cloud organization administrator.
Create a folder
- In the management console, select a cloud and click Create folder.
- Give your folder a name, e.g., data-folder.
- Click Create.
Create a service account for the DataSphere project
- Go to the data-folder folder.
- In the Service accounts tab, click Create service account.
- Enter a name for the service account, e.g., yq-sa.
- Click Add role and assign the following roles to the service account:
  - datasphere.community-project.editor: To run DataSphere computations.
  - yq.editor: To send Query queries.
  - managed-clickhouse.viewer: To view the contents of the Managed Service for ClickHouse® cluster.
- Click Create.
Add the service account to a project
To enable the service account to run a DataSphere project, add it to the list of project members.
- Select the relevant project in your community or on the DataSphere home page in the Recent projects tab.
- In the Members tab, click Add member.
- Select the yq-sa account and click Add.
Create an authorized key for a service account
To allow the service account to send Query queries, create an authorized key.
Note
Authorized keys do not expire, but you can always get new authorized keys and authenticate again if something goes wrong.
- In the management console, go to data-folder.
- At the top of the screen, go to the Service accounts tab.
- Select the yq-sa service account.
- Click Create new key in the top panel and select Create authorized key.
- Select the encryption algorithm and click Create.
- Click Download file with keys.
Create a secret
To make the authorized key available in the notebook, create a secret with the contents of the authorized key file.
- Select the relevant project in your community or on the DataSphere home page in the Recent projects tab.
- Under Project resources, click Secret.
- Click Create.
- In the Name field, enter the secret name: yq_access_key.
- In the Value field, paste the full contents of the downloaded file with the authorized key.
- Click Create.
Create a notebook
Queries to the Managed Service for ClickHouse® database through Query will be sent from the notebook.
- Select the relevant project in your community or on the DataSphere home page in the Recent projects tab.
- Click Open project in JupyterLab and wait for the loading to complete.
- In the top panel, click File and select New ⟶ Notebook.
- Select a kernel and click Select.
Getting started with Query
The yandex_query_magic package provides Jupyter magic commands for sending queries to Query. Install it and paste the following code into the cells of the notebook you created, e.g., yq-storage.ipynb.
- Open the DataSphere project:
  - Select the relevant project in your community or on the DataSphere home page in the Recent projects tab.
  - Click Open project in JupyterLab and wait for the loading to complete.
  - Open the notebook tab.
- Install the yandex_query_magic package:
  %pip install yandex_query_magic
- Once the installation is complete, from the top panel, select Kernel ⟶ Restart kernel...
- Load the extension:
  %load_ext yandex_query_magic
- Configure the connection by specifying the data-folder ID and the name of the authorized key secret:
  %yq_settings --folder-id <folder_ID> --env-auth yq_access_key
- Run a test query to Query:
  %yq select "Hello, world!"
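If the test query fails with an authentication error, it can help to check from a notebook cell that the secret is visible and parses as an authorized key. The sketch below assumes that DataSphere exposes project secrets to the notebook as environment variables named after the secret, and that the authorized key file contains the id, service_account_id, and private_key fields. The helpers validate_authorized_key and check_secret are hypothetical and are not part of yandex_query_magic.

```python
import json
import os

# Assumption: these fields are present in a Yandex Cloud authorized key file.
REQUIRED_FIELDS = ("id", "service_account_id", "private_key")


def validate_authorized_key(raw: str) -> dict:
    """Hypothetical helper: parse the key JSON and check the expected fields."""
    try:
        key = json.loads(raw)
    except json.JSONDecodeError as err:
        raise ValueError(f"Secret is not valid JSON: {err}") from err
    if not isinstance(key, dict):
        raise ValueError("Secret must contain a JSON object")
    # Treat missing or empty values as errors.
    missing = [field for field in REQUIRED_FIELDS if not key.get(field)]
    if missing:
        raise ValueError(f"Authorized key is missing fields: {', '.join(missing)}")
    return key


def check_secret(secret_name: str = "yq_access_key") -> dict:
    """Hypothetical helper: read the secret from an environment variable
    (assumed to be how DataSphere exposes secrets) and validate it."""
    raw = os.environ.get(secret_name)
    if raw is None:
        raise KeyError(f"Environment variable {secret_name!r} is not set; check the secret name")
    return validate_authorized_key(raw)
```

For example, running key = check_secret() and printing only key["service_account_id"] confirms which service account the key belongs to without exposing the private key in the notebook output.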
Create a Managed Service for ClickHouse® cluster
Any running Managed Service for ClickHouse® cluster with the Yandex Query access option enabled is suitable for sending queries.
- In the management console, select data-folder.
- Select Managed Service for ClickHouse.
- Click Create cluster.
- In the Cluster name field, enter a name for the cluster, e.g., clickhouse.
- Under DBMS settings:
  - In the User management via SQL field, select Enabled from the drop-down list.
  - Specify Username and Password.
- Under Service settings:
  - Select the yq-sa service account.
  - Enable the Yandex Query access and Access from the management console options.
- You can leave the default values for the other settings.
- Click Create cluster.
Create a table
In this step, you will create a test table with the numbers from 0 to 99.
- In the management console, open the clickhouse cluster page and go to the SQL tab.
- Enter the Username and Password you specified when creating the cluster.
- In the input window on the right, paste the following SQL queries:
  -- Test table with a single integer column, sorted by that column
  CREATE TABLE test (col1 int) ENGINE = MergeTree ORDER BY col1;
  -- numbers(100) generates the values 0 to 99
  INSERT INTO test SELECT * FROM numbers(100);
- Click Execute.
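To confirm that the INSERT filled the table as expected, you can run SELECT count(), min(col1), max(col1), sum(col1) FROM test in the SQL tab and compare the result with the values that numbers(100) should produce. The Python sketch below computes those expected values; expected_summary is a hypothetical helper, and the sum uses the arithmetic series formula n(n-1)/2 for the integers 0 to n-1.

```python
def expected_summary(n: int) -> dict:
    """Hypothetical helper: aggregates expected for a table filled from
    ClickHouse numbers(n), which yields the integers 0 .. n-1 (n must be > 0)."""
    values = range(n)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        # Arithmetic series: 0 + 1 + ... + (n - 1) = n * (n - 1) / 2
        "sum": n * (n - 1) // 2,
    }


print(expected_summary(100))
# {'count': 100, 'min': 0, 'max': 99, 'sum': 4950}
```

If the query in the SQL tab returns different values, the table was most likely filled more than once or the INSERT did not run.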
Connect to data in Managed Service for ClickHouse®
To create a Query connection:
- In the management console, select data-folder.
- In the list of services, select Yandex Query.
- In the left-hand panel, select Connections.
- Click Create new.
- Enter a name for the connection, e.g., clickhouse.
- Select the Managed Service for ClickHouse connection type.
- Under Connection type parameters:
  - Cluster: Select the previously created clickhouse cluster.
  - Service account: Select the yq-sa service account.
  - Enter the Login and Password you specified when creating the cluster.
- Click Create.
To check the connection, query the table data from a notebook cell:
%yq SELECT * FROM clickhouse.test
How to delete the resources you created
To stop paying for the resources you created:
ClickHouse® is a registered trademark of ClickHouse, Inc.