Managing data format schemas in Managed Service for ClickHouse®
Managed Service for ClickHouse® lets you INSERT and SELECT data in different formats. Most of those formats are self-descriptive: they already contain a format schema that describes acceptable data types, their order, and their representation in the format. This lets you, for example, insert data directly from a file.
Note
Format schema describes the format of data input or output and the data schema describes the structure and layout of the ClickHouse® databases and tables that store this data. These concepts are not interchangeable.
The Cap'n Proto and Protobuf formats are not self-descriptive: they require a separate format schema.
You can add one or more such format schemas to a Managed Service for ClickHouse® cluster and use them to input and output data in the relevant formats.
Warning
To use the format schemas you added, insert the data into Managed Service for ClickHouse® using the HTTP interface.
For more information about data formats, see the ClickHouse® documentation.
You can find examples of working with the Cap'n Proto and Protobuf formats when inserting data into a cluster in the Adding data to a cluster section.
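As a rough illustration of the HTTP-interface insert mentioned above, the sketch below builds the query text and the request URL for a Protobuf insert. The host placeholder, port 8443, the `db1.users` table, the `schema-protobuf` schema name, and the `User` message type are all assumptions for illustration. It also assumes that the schema name added to the cluster can be passed in the ClickHouse `format_schema` setting as `<schema_name>:<message_type>`; check the Adding data to a cluster section for the exact form.

```shell
#!/bin/sh
# Hypothetical helpers; names and values are illustrative, not part of the service.

# Build the INSERT query text.
# $1 = table, $2 = format (Protobuf or CapnProto),
# $3 = format schema name added to the cluster, $4 = message type in the schema.
build_insert_query() {
  printf "INSERT INTO %s SETTINGS format_schema='%s:%s' FORMAT %s" "$1" "$3" "$4" "$2"
}

# Percent-encode the few characters this query uses that are unsafe in a URL.
encode_query() {
  printf '%s' "$1" | sed -e 's/%/%25/g' -e 's/ /%20/g' -e "s/'/%27/g" -e 's/:/%3A/g' -e 's/=/%3D/g'
}

# Build the HTTP interface URL. $1 = host FQDN (placeholder), $2 = query text.
build_insert_url() {
  printf 'https://%s:8443/?query=%s' "$1" "$(encode_query "$2")"
}

# Example (not executed here): send a binary file as the request body.
# curl --data-binary @message.bin \
#   --header "X-ClickHouse-User: <user_name>" \
#   --header "X-ClickHouse-Key: <password>" \
#   "$(build_insert_url "<host_FQDN>" "$(build_insert_query db1.users Protobuf schema-protobuf User)")"
```

Keeping the query builder separate from the URL builder makes it easy to print and inspect the query before anything is sent to the cluster.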
Before connecting the format schema
Managed Service for ClickHouse® only works with readable data format schemas imported to Yandex Object Storage. Before connecting the schema to a cluster:
- Prepare a file with a format schema (see the documentation for Cap'n Proto and Protobuf).
- To link your service account to the cluster, make sure your Yandex Cloud account has the iam.serviceAccounts.user role or higher.
- Import the file with the data format schema to Yandex Object Storage.
- Connect the service account to the cluster. You will use this service account to configure permissions to access the schema file.
- Assign the storage.viewer role to the service account.
- In the bucket's ACL, add the READ permission to the service account.
- Get a link to the schema file.
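The upload and link steps above could be scripted roughly as follows. This is a sketch that assumes the AWS CLI is installed and configured with a static access key for Object Storage, and that objects are addressed as `https://storage.yandexcloud.net/<bucket>/<key>`; the bucket name and object key in the example are hypothetical.

```shell
#!/bin/sh
# Assumption: Object Storage is reached through its S3-compatible endpoint.
OBJECT_STORAGE_ENDPOINT="https://storage.yandexcloud.net"

# Print the link to an object. $1 = bucket name, $2 = object key.
schema_link() {
  printf '%s/%s/%s' "$OBJECT_STORAGE_ENDPOINT" "$1" "$2"
}

# Upload a local schema file and print its link.
# $1 = local file, $2 = bucket name, $3 = object key.
# Upload progress goes to stderr so that stdout carries only the link.
upload_schema() {
  aws --endpoint-url="$OBJECT_STORAGE_ENDPOINT" s3 cp "$1" "s3://$2/$3" >&2 || return 1
  schema_link "$2" "$3"
}

# Example (not executed here):
# upload_schema ./user.proto my-schemas-bucket schemas/user.proto
```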
Creating a format schema
- In the management console, go to the folder page and select Managed Service for ClickHouse.
- Click the cluster name and open the Data format schemas tab.
- Click Create schema.
- In the Add schema dialog box, fill out the form by completing the URL field with the previously generated link to the format schema file.
- Click Create.
If you do not have the Yandex Cloud command line interface yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name or --folder-id parameter.
To create a format schema, run this command:
- For Cap'n Proto:
  yc managed-clickhouse format-schema create "<format_schema_name>" \
    --cluster-name="<cluster_name>" \
    --type="capnproto" \
    --uri="<link_to_file_in_Object_Storage>"
- For Protobuf:
  yc managed-clickhouse format-schema create "<format_schema_name>" \
    --cluster-name="<cluster_name>" \
    --type="protobuf" \
    --uri="<link_to_file_in_Object_Storage>"
You can request the cluster name with a list of clusters in the folder.
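Since the two commands differ only in the `--type` value, a small wrapper can pick it from the schema file name. This is a minimal sketch: the helper names are hypothetical, and it assumes the conventional `.capnp` and `.proto` file extensions. It only prints the command so you can review it before running it.

```shell
#!/bin/sh
# Map a schema file name (or link) to the --type value used by the commands above.
schema_type_for_file() {
  case "$1" in
    *.capnp) printf 'capnproto' ;;
    *.proto) printf 'protobuf' ;;
    *) echo "unsupported schema file: $1" >&2; return 1 ;;
  esac
}

# Print (without running) the create command for a schema file.
# $1 = schema name, $2 = cluster name, $3 = link to the file in Object Storage.
print_create_command() {
  schema_type="$(schema_type_for_file "$3")" || return 1
  printf 'yc managed-clickhouse format-schema create "%s" --cluster-name="%s" --type="%s" --uri="%s"\n' \
    "$1" "$2" "$schema_type" "$3"
}

# Example (not executed here):
# print_create_command user-schema my-cluster https://storage.yandexcloud.net/my-bucket/user.proto
```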
- Open the current Terraform configuration file with an infrastructure plan.
  For more information about creating this file, see Creating clusters.
- Add a format_schema block to the Managed Service for ClickHouse® cluster description:
  resource "yandex_mdb_clickhouse_cluster" "<cluster_name>" {
    ...
    format_schema {
      name = "<schema_name>"
      type = "<schema_type>"
      uri  = "<link_to_data_format_schema_file_in_Object_Storage>"
    }
  }
  Where type is the schema type, FORMAT_SCHEMA_TYPE_CAPNPROTO or FORMAT_SCHEMA_TYPE_PROTOBUF.
- Make sure the settings are correct.
  - Using the command line, navigate to the folder that contains the up-to-date Terraform configuration files with an infrastructure plan.
  - Run the command:
    terraform validate
    If there are errors in the configuration files, Terraform will point to them.
- Confirm updating the resources.
  - Run the command to view planned changes:
    terraform plan
    If the resource configuration descriptions are correct, the terminal will display a list of the resources to modify and their parameters. This is a test step. No resources are updated.
  - If you are happy with the planned changes, apply them:
    - Run the command:
      terraform apply
    - Confirm the update of resources.
    - Wait for the operation to complete.
For more information, see the Terraform provider documentation.
Time limits
The Terraform provider sets the following timeouts for Managed Service for ClickHouse® cluster operations:
- Creating a cluster, including by restoring one from a backup: 60 minutes.
- Editing a cluster: 90 minutes.
- Deleting a cluster: 30 minutes.
Operations exceeding the set timeout are interrupted.
How do I change these limits?
Add the timeouts block to the cluster description, for example:
resource "yandex_mdb_clickhouse_cluster" "<cluster_name>" {
  ...
  timeouts {
    create = "1h30m" # 1 hour 30 minutes
    update = "2h"    # 2 hours
    delete = "30m"   # 30 minutes
  }
}
- Get an IAM token for API authentication and put it into the environment variable:
  export IAM_TOKEN="<IAM_token>"
- Use the FormatSchema.Create method and send the following request, e.g., via cURL:
  curl \
    --request POST \
    --header "Authorization: Bearer $IAM_TOKEN" \
    --header "Content-Type: application/json" \
    --url 'https://mdb.api.cloud.yandex.net/managed-clickhouse/v1/clusters/<cluster_ID>/formatSchemas' \
    --data '{
              "formatSchemaName": "<schema_name>",
              "type": "<schema_type>",
              "uri": "<file_link>"
            }'
  Where:
  - formatSchemaName: Schema name.
  - type: Schema type, FORMAT_SCHEMA_TYPE_CAPNPROTO or FORMAT_SCHEMA_TYPE_PROTOBUF.
  - uri: Link to the file with the schema in Object Storage.
  You can get the cluster ID with a list of clusters in the folder.
- View the server response to make sure the request was successful.
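Because the `type` field accepts only the two values listed above, it may help to validate it before sending the request. The following is a minimal sketch with hypothetical helper names; it builds the JSON body by plain string substitution, so it assumes the values contain no quotes or backslashes.

```shell
#!/bin/sh
# Check that the schema type is one of the two values the API accepts.
is_valid_schema_type() {
  case "$1" in
    FORMAT_SCHEMA_TYPE_CAPNPROTO|FORMAT_SCHEMA_TYPE_PROTOBUF) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the JSON body for the Create request.
# $1 = schema name, $2 = schema type, $3 = link to the file.
# Values are inserted as-is, so they must not contain quotes or backslashes.
create_request_body() {
  is_valid_schema_type "$2" || { echo "invalid schema type: $2" >&2; return 1; }
  printf '{"formatSchemaName": "%s", "type": "%s", "uri": "%s"}' "$1" "$2" "$3"
}

# Example (not executed here): pass the body to the cURL request.
# curl ... --data "$(create_request_body user-schema FORMAT_SCHEMA_TYPE_PROTOBUF "<file_link>")"
```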
- Get an IAM token for API authentication and put it into the environment variable:
  export IAM_TOKEN="<IAM_token>"
- Clone the cloudapi repository:
  cd ~/ && git clone --depth=1 https://github.com/yandex-cloud/cloudapi
  Below, we assume the repository contents are stored in the ~/cloudapi/ directory.
- Use the FormatSchemaService.Create call and send the following request, e.g., via gRPCurl:
  grpcurl \
    -format json \
    -import-path ~/cloudapi/ \
    -import-path ~/cloudapi/third_party/googleapis/ \
    -proto ~/cloudapi/yandex/cloud/mdb/clickhouse/v1/format_schema_service.proto \
    -rpc-header "Authorization: Bearer $IAM_TOKEN" \
    -d '{
          "cluster_id": "<cluster_ID>",
          "format_schema_name": "<schema_name>",
          "type": "<schema_type>",
          "uri": "<file_link>"
        }' \
    mdb.api.cloud.yandex.net:443 \
    yandex.cloud.mdb.clickhouse.v1.FormatSchemaService.Create
  Where:
  - format_schema_name: Schema name.
  - type: Schema type, FORMAT_SCHEMA_TYPE_CAPNPROTO or FORMAT_SCHEMA_TYPE_PROTOBUF.
  - uri: Link to the file with the schema in Object Storage.
  You can get the cluster ID with a list of clusters in the folder.
- View the server response to make sure the request was successful.
Changing a format schema
Managed Service for ClickHouse® does not track changes in the format schema file that is in the Yandex Object Storage bucket.
To update the contents of a schema that is already connected to the cluster:
- Upload the file with the updated format schema to Yandex Object Storage.
- Get a link to this file.
- Change the parameters of the format schema that is connected to Managed Service for ClickHouse® by providing a new link to the format schema file.
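Because changes to the file in the bucket are not tracked, one possible workflow is to upload each revision under a distinct object key and then point the schema at the new link. The sketch below only illustrates that idea: the helper names and the revision label are hypothetical, the key is assumed to have a file extension, and the link form `https://storage.yandexcloud.net/<bucket>/<key>` is an assumption. It prints the update command instead of running it.

```shell
#!/bin/sh
# Build an object key that includes a revision label.
# $1 = base key with an extension, such as schemas/user.proto; $2 = revision label, such as v2.
versioned_key() {
  ext="${1##*.}"   # text after the last dot
  stem="${1%.*}"   # text before the last dot
  printf '%s.%s.%s' "$stem" "$2" "$ext"
}

# Print (without running) the update command for a new revision.
# $1 = schema name, $2 = cluster name, $3 = bucket, $4 = base key, $5 = revision label.
print_update_command() {
  key="$(versioned_key "$4" "$5")"
  printf 'yc managed-clickhouse format-schema update "%s" --cluster-name="%s" --uri="https://storage.yandexcloud.net/%s/%s"\n' \
    "$1" "$2" "$3" "$key"
}

# Example (not executed here):
# print_update_command user-schema my-cluster my-bucket schemas/user.proto v2
```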
- In the management console, go to the folder page and select Managed Service for ClickHouse.
- Click the cluster name and open the Data format schemas tab.
- Select the appropriate schema, click the options icon, and select Edit.
If you do not have the Yandex Cloud command line interface yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name or --folder-id parameter.
To change the link to the format schema file in Object Storage, run the command:
yc managed-clickhouse format-schema update "<format_schema_name>" \
  --cluster-name="<cluster_name>" \
  --uri="<new_link_to_file_in_Object_Storage>"
You can request the schema name with a list of format schemas in the cluster and the cluster name with a list of clusters in the folder.
- Open the current Terraform configuration file with an infrastructure plan.
  For more information about creating this file, see Creating clusters.
- In the Managed Service for ClickHouse® cluster description, change the uri parameter value under format_schema:
  resource "yandex_mdb_clickhouse_cluster" "<cluster_name>" {
    ...
    format_schema {
      name = "<schema_name>"
      type = "<schema_type>"
      uri  = "<new_link_to_schema_file_in_Object_Storage>"
    }
  }
- Make sure the settings are correct.
  - Using the command line, navigate to the folder that contains the up-to-date Terraform configuration files with an infrastructure plan.
  - Run the command:
    terraform validate
    If there are errors in the configuration files, Terraform will point to them.
- Confirm updating the resources.
  - Run the command to view planned changes:
    terraform plan
    If the resource configuration descriptions are correct, the terminal will display a list of the resources to modify and their parameters. This is a test step. No resources are updated.
  - If you are happy with the planned changes, apply them:
    - Run the command:
      terraform apply
    - Confirm the update of resources.
    - Wait for the operation to complete.
For more information, see the Terraform provider documentation.
Time limits
The Terraform provider sets the following timeouts for Managed Service for ClickHouse® cluster operations:
- Creating a cluster, including by restoring one from a backup: 60 minutes.
- Editing a cluster: 90 minutes.
- Deleting a cluster: 30 minutes.
Operations exceeding the set timeout are interrupted.
How do I change these limits?
Add the timeouts block to the cluster description, for example:
resource "yandex_mdb_clickhouse_cluster" "<cluster_name>" {
  ...
  timeouts {
    create = "1h30m" # 1 hour 30 minutes
    update = "2h"    # 2 hours
    delete = "30m"   # 30 minutes
  }
}
- Get an IAM token for API authentication and put it into the environment variable:
  export IAM_TOKEN="<IAM_token>"
- Use the FormatSchema.Update method and send the following request, e.g., via cURL:
  Warning
  The API method will assign default values to all the parameters of the object you are modifying unless you explicitly provide them in your request. To avoid this, list the settings you want to change in the updateMask parameter as a single comma-separated string.
  curl \
    --request PATCH \
    --header "Authorization: Bearer $IAM_TOKEN" \
    --header "Content-Type: application/json" \
    --url 'https://mdb.api.cloud.yandex.net/managed-clickhouse/v1/clusters/<cluster_ID>/formatSchemas/<schema_name>' \
    --data '{
              "updateMask": "uri",
              "uri": "<file_link>"
            }'
  Where:
  - updateMask: List of parameters to update as a single string, separated by commas. Here only one parameter is specified: uri.
  - uri: Link to the new file with the schema in Object Storage.
  You can get the cluster ID with a list of clusters in the folder.
- View the server response to make sure the request was successful.
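The comma-separated updateMask string can be assembled from a list of field names. The following is a minimal sketch with hypothetical helper names; for a format schema only `uri` is used, as in the request above, and the body is built by plain string substitution, so the link must not contain quotes or backslashes.

```shell
#!/bin/sh
# Join field names into the comma-separated string expected in updateMask.
join_update_mask() {
  mask=""
  for field in "$@"; do
    if [ -z "$mask" ]; then
      mask="$field"
    else
      mask="$mask,$field"
    fi
  done
  printf '%s' "$mask"
}

# Print the JSON body for an Update request that changes only the link.
# $1 = link to the new schema file.
update_uri_body() {
  printf '{"updateMask": "%s", "uri": "%s"}' "$(join_update_mask uri)" "$1"
}

# Example (not executed here):
# curl ... --request PATCH --data "$(update_uri_body "<file_link>")"
```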
- Get an IAM token for API authentication and put it into the environment variable:
  export IAM_TOKEN="<IAM_token>"
- Clone the cloudapi repository:
  cd ~/ && git clone --depth=1 https://github.com/yandex-cloud/cloudapi
  Below, we assume the repository contents are stored in the ~/cloudapi/ directory.
- Use the FormatSchemaService.Update call and send the following request, e.g., via gRPCurl:
  Warning
  The API method will assign default values to all the parameters of the object you are modifying unless you explicitly provide them in your request. To avoid this, list the settings you want to change in the update_mask parameter as an array of paths[] strings.
  Format for listing settings
  "update_mask": {
    "paths": [
      "<setting_1>",
      "<setting_2>",
      ...
      "<setting_N>"
    ]
  }
  grpcurl \
    -format json \
    -import-path ~/cloudapi/ \
    -import-path ~/cloudapi/third_party/googleapis/ \
    -proto ~/cloudapi/yandex/cloud/mdb/clickhouse/v1/format_schema_service.proto \
    -rpc-header "Authorization: Bearer $IAM_TOKEN" \
    -d '{
          "cluster_id": "<cluster_ID>",
          "format_schema_name": "<schema_name>",
          "update_mask": {
            "paths": ["uri"]
          },
          "uri": "<file_link>"
        }' \
    mdb.api.cloud.yandex.net:443 \
    yandex.cloud.mdb.clickhouse.v1.FormatSchemaService.Update
  Where:
  - format_schema_name: Schema name.
  - update_mask: List of parameters to update as an array of paths[] strings. Here only one parameter is specified: uri.
  - uri: Link to the new schema file in Object Storage.
  You can get the cluster ID with a list of clusters in the folder.
- View the server response to make sure the request was successful.
Disabling a format schema
Note
After disabling a format schema, the corresponding object is kept in the Yandex Object Storage bucket. If you no longer need this format schema object, you can delete it.
- In the management console, go to the folder page and select Managed Service for ClickHouse.
- Click the cluster name and open the Data format schemas tab.
- Select the appropriate schema, click the options icon, and select Delete.
If you do not have the Yandex Cloud command line interface yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name or --folder-id parameter.
To disable a format schema, run the command:
yc managed-clickhouse format-schema delete "<format_schema_name>" \
--cluster-name="<cluster_name>"
You can request the schema name with a list of format schemas in the cluster and the cluster name with a list of clusters in the folder.
- Open the current Terraform configuration file with an infrastructure plan.
  For more information about creating this file, see Creating clusters.
- Delete the format_schema description section for the appropriate data format schema from the Managed Service for ClickHouse® cluster description.
- Make sure the settings are correct.
  - Using the command line, navigate to the folder that contains the up-to-date Terraform configuration files with an infrastructure plan.
  - Run the command:
    terraform validate
    If there are errors in the configuration files, Terraform will point to them.
- Confirm updating the resources.
  - Run the command to view planned changes:
    terraform plan
    If the resource configuration descriptions are correct, the terminal will display a list of the resources to modify and their parameters. This is a test step. No resources are updated.
  - If you are happy with the planned changes, apply them:
    - Run the command:
      terraform apply
    - Confirm the update of resources.
    - Wait for the operation to complete.
For more information, see the Terraform provider documentation.
Time limits
The Terraform provider sets the following timeouts for Managed Service for ClickHouse® cluster operations:
- Creating a cluster, including by restoring one from a backup: 60 minutes.
- Editing a cluster: 90 minutes.
- Deleting a cluster: 30 minutes.
Operations exceeding the set timeout are interrupted.
How do I change these limits?
Add the timeouts block to the cluster description, for example:
resource "yandex_mdb_clickhouse_cluster" "<cluster_name>" {
  ...
  timeouts {
    create = "1h30m" # 1 hour 30 minutes
    update = "2h"    # 2 hours
    delete = "30m"   # 30 minutes
  }
}
- Get an IAM token for API authentication and put it into the environment variable:
  export IAM_TOKEN="<IAM_token>"
- Use the FormatSchema.Delete method and send the following request, e.g., via cURL:
  curl \
    --request DELETE \
    --header "Authorization: Bearer $IAM_TOKEN" \
    --url 'https://mdb.api.cloud.yandex.net/managed-clickhouse/v1/clusters/<cluster_ID>/formatSchemas/<schema_name>'
  You can request the cluster ID with a list of clusters in the folder and the schema name with a list of schemas in the cluster.
- View the server response to make sure the request was successful.
- Get an IAM token for API authentication and put it into the environment variable:
  export IAM_TOKEN="<IAM_token>"
- Clone the cloudapi repository:
  cd ~/ && git clone --depth=1 https://github.com/yandex-cloud/cloudapi
  Below, we assume the repository contents are stored in the ~/cloudapi/ directory.
- Use the FormatSchemaService.Delete call and send the following request, e.g., via gRPCurl:
  grpcurl \
    -format json \
    -import-path ~/cloudapi/ \
    -import-path ~/cloudapi/third_party/googleapis/ \
    -proto ~/cloudapi/yandex/cloud/mdb/clickhouse/v1/format_schema_service.proto \
    -rpc-header "Authorization: Bearer $IAM_TOKEN" \
    -d '{
          "cluster_id": "<cluster_ID>",
          "format_schema_name": "<schema_name>"
        }' \
    mdb.api.cloud.yandex.net:443 \
    yandex.cloud.mdb.clickhouse.v1.FormatSchemaService.Delete
  You can request the cluster ID with a list of clusters in the folder and the schema name with a list of schemas in the cluster.
- View the server response to make sure the request was successful.
Getting a list of format schemas in a cluster
- In the management console, go to the folder page and select Managed Service for ClickHouse.
- Click the cluster name and open the Data format schemas tab.
If you do not have the Yandex Cloud command line interface yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name or --folder-id parameter.
To get a list of format schemas in a cluster, run the command:
yc managed-clickhouse format-schema list --cluster-name="<cluster_name>"
You can request the cluster name with a list of clusters in the folder.
- Get an IAM token for API authentication and put it into the environment variable:
  export IAM_TOKEN="<IAM_token>"
- Use the FormatSchema.List method and send the following request, e.g., via cURL:
  curl \
    --request GET \
    --header "Authorization: Bearer $IAM_TOKEN" \
    --url 'https://mdb.api.cloud.yandex.net/managed-clickhouse/v1/clusters/<cluster_ID>/formatSchemas'
  You can get the cluster ID with a list of clusters in the folder.
- View the server response to make sure the request was successful.
- Get an IAM token for API authentication and put it into the environment variable:
  export IAM_TOKEN="<IAM_token>"
- Clone the cloudapi repository:
  cd ~/ && git clone --depth=1 https://github.com/yandex-cloud/cloudapi
  Below, we assume the repository contents are stored in the ~/cloudapi/ directory.
- Use the FormatSchemaService.List call and send the following request, e.g., via gRPCurl:
  grpcurl \
    -format json \
    -import-path ~/cloudapi/ \
    -import-path ~/cloudapi/third_party/googleapis/ \
    -proto ~/cloudapi/yandex/cloud/mdb/clickhouse/v1/format_schema_service.proto \
    -rpc-header "Authorization: Bearer $IAM_TOKEN" \
    -d '{
          "cluster_id": "<cluster_ID>"
        }' \
    mdb.api.cloud.yandex.net:443 \
    yandex.cloud.mdb.clickhouse.v1.FormatSchemaService.List
  You can get the cluster ID with a list of clusters in the folder.
- View the server response to make sure the request was successful.
Getting detailed information about a format schema
If you do not have the Yandex Cloud command line interface yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name or --folder-id parameter.
To get detailed information about a format schema, run the command:
yc managed-clickhouse format-schema get "<format_schema_name>" \
--cluster-name="<cluster_name>"
You can request the schema name with a list of format schemas in the cluster and the cluster name with a list of clusters in the folder.
- Get an IAM token for API authentication and put it into the environment variable:
  export IAM_TOKEN="<IAM_token>"
- Use the FormatSchema.Get method and send the following request, e.g., via cURL:
  curl \
    --request GET \
    --header "Authorization: Bearer $IAM_TOKEN" \
    --url 'https://mdb.api.cloud.yandex.net/managed-clickhouse/v1/clusters/<cluster_ID>/formatSchemas/<schema_name>'
  You can request the cluster ID with a list of clusters in the folder and the schema name with a list of schemas in the cluster.
- View the server response to make sure the request was successful.
- Get an IAM token for API authentication and put it into the environment variable:
  export IAM_TOKEN="<IAM_token>"
- Clone the cloudapi repository:
  cd ~/ && git clone --depth=1 https://github.com/yandex-cloud/cloudapi
  Below, we assume the repository contents are stored in the ~/cloudapi/ directory.
- Use the FormatSchemaService.Get call and send the following request, e.g., via gRPCurl:
  grpcurl \
    -format json \
    -import-path ~/cloudapi/ \
    -import-path ~/cloudapi/third_party/googleapis/ \
    -proto ~/cloudapi/yandex/cloud/mdb/clickhouse/v1/format_schema_service.proto \
    -rpc-header "Authorization: Bearer $IAM_TOKEN" \
    -d '{
          "cluster_id": "<cluster_ID>",
          "format_schema_name": "<schema_name>"
        }' \
    mdb.api.cloud.yandex.net:443 \
    yandex.cloud.mdb.clickhouse.v1.FormatSchemaService.Get
  You can request the cluster ID with a list of clusters in the folder and the schema name with a list of schemas in the cluster.
- View the server response to make sure the request was successful.
ClickHouse® is a registered trademark of ClickHouse, Inc.