Managing data format schemas in Managed Service for ClickHouse®
Managed Service for ClickHouse® lets you INSERT and SELECT data in different formats. Most of those formats are self-descriptive: they already contain a format schema that describes the acceptable data types, their order, and their representation in the format. This lets you, for example, insert data directly from a file.
Note
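A format schema describes the format of data input or output, while a data schema describes the structure and layout of the ClickHouse® databases and tables that store this data. These concepts are not interchangeable.
The Cap'n Proto and Protobuf formats are not self-descriptive: the data does not carry a format schema, so you have to provide one separately.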
You can add one or more such format schemas to a Managed Service for ClickHouse® cluster and use them to input and output data in the relevant formats.
Warning
To use the format schemas you added, insert the data into Managed Service for ClickHouse® using the HTTP interface.
For more information about data formats, see the ClickHouse® documentation.
You can find examples of using Cap'n Proto and Protobuf formats when inserting data into a cluster in this tutorial.
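As a rough illustration of the warning above, an insert over the HTTP interface that references an added schema could look like the sketch below. It assumes that the schema is referenced through the ClickHouse `format_schema` setting in the form `<schema_name>:<message_type>` and that the cluster host accepts HTTPS on port 8443. The `build_insert_url` helper and all host, table, schema, and message names are hypothetical placeholders, not values from this guide.

```shell
# Hypothetical helper: builds the HTTP-interface URL for an INSERT that
# references a format schema previously added to the cluster.
# Arguments: host, table, format, schema name, message type (all placeholders).
build_insert_url() (
  host=$1 table=$2 format=$3 schema=$4 message=$5
  query="INSERT INTO ${table} FORMAT ${format} SETTINGS format_schema='${schema}:${message}'"
  # Spaces are replaced with "+" so the query can be passed in the URL.
  printf 'https://%s:8443/?query=%s\n' "$host" "$(printf '%s' "$query" | tr ' ' '+')"
)

# Usage sketch (not executed here; user, password, and data file are assumptions):
#   curl --header "X-ClickHouse-User: <user>" \
#        --header "X-ClickHouse-Key: <password>" \
#        --data-binary @message.bin \
#        "$(build_insert_url '<host_FQDN>' 'db1.users' 'Protobuf' '<schema_name>' '<message_type>')"
```

The helper only assembles the URL; the actual request, credentials, and any TLS certificate options depend on how your cluster is set up.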
Getting a list of data format schemas in a cluster
- In the management console, navigate to the folder dashboard and select Managed Service for ClickHouse.
- Click the name of your cluster and open the Data format schemas tab.
If you do not have the Yandex Cloud CLI installed yet, install and initialize it.
By default, the CLI uses the folder specified when creating the profile. To change the default folder, use the yc config set folder-id <folder_ID> command. You can also set a different folder for any specific command using the --folder-name or --folder-id parameter.
To get a list of data format schemas in a cluster, run this command:
yc managed-clickhouse format-schema list --cluster-name="<cluster_name>"
You can get the cluster name from the list of clusters in the folder.
- Get an IAM token for API authentication and place it in an environment variable:
  export IAM_TOKEN="<IAM_token>"
- Use the FormatSchema.List method and send the following request, e.g., via cURL:
  curl \
    --request GET \
    --header "Authorization: Bearer $IAM_TOKEN" \
    --url 'https://mdb.api.cloud.yandex.net/managed-clickhouse/v1/clusters/<cluster_ID>/formatSchemas'
  You can get the cluster ID from the folder’s cluster list.
- Check the server response to make sure your request was successful.
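One possible way to check the server response in a script is to capture the HTTP status code separately from the response body. The sketch below assumes that any 2xx status means success; `format_schemas_url` and `is_success` are hypothetical helper names introduced only for illustration.

```shell
# Hypothetical helper: returns the REST endpoint for a cluster's format schemas,
# or for a single schema when a schema name is given as the second argument.
format_schemas_url() (
  base="https://mdb.api.cloud.yandex.net/managed-clickhouse/v1/clusters/$1/formatSchemas"
  if [ -n "${2:-}" ]; then
    printf '%s/%s\n' "$base" "$2"
  else
    printf '%s\n' "$base"
  fi
)

# Hypothetical helper: treats any 2xx HTTP status code as success.
is_success() {
  case "$1" in
    2[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage sketch (not executed here): save the body and inspect the status code.
#   status=$(curl --silent --output response.json --write-out '%{http_code}' \
#     --header "Authorization: Bearer $IAM_TOKEN" \
#     --url "$(format_schemas_url '<cluster_ID>')")
#   is_success "$status" && cat response.json
```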
- Get an IAM token for API authentication and place it in an environment variable:
  export IAM_TOKEN="<IAM_token>"
- Clone the cloudapi repository:
  cd ~/ && git clone --depth=1 https://github.com/yandex-cloud/cloudapi
  Below, we assume the repository contents are stored in the ~/cloudapi/ directory.
- Use the FormatSchemaService.List call and send the following request, e.g., via gRPCurl:
  grpcurl \
    -format json \
    -import-path ~/cloudapi/ \
    -import-path ~/cloudapi/third_party/googleapis/ \
    -proto ~/cloudapi/yandex/cloud/mdb/clickhouse/v1/format_schema_service.proto \
    -rpc-header "Authorization: Bearer $IAM_TOKEN" \
    -d '{
          "cluster_id": "<cluster_ID>"
        }' \
    mdb.api.cloud.yandex.net:443 \
    yandex.cloud.mdb.clickhouse.v1.FormatSchemaService.List
  You can get the cluster ID from the folder’s cluster list.
- View the server response to make sure your request was successful.
Getting detailed information about a data format schema
If you do not have the Yandex Cloud CLI installed yet, install and initialize it.
By default, the CLI uses the folder specified when creating the profile. To change the default folder, use the yc config set folder-id <folder_ID> command. You can also set a different folder for any specific command using the --folder-name or --folder-id parameter.
To get detailed information about a data format schema, run this command:
yc managed-clickhouse format-schema get "<data_format_schema_name>" \
--cluster-name="<cluster_name>"
You can get the schema name from the list of data format schemas in the cluster, and the cluster name from the list of clusters in the folder.
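In a script, the same command can double as an existence check. The sketch below is based on the assumption that `yc` exits with a non-zero status when the schema is not found; `schema_exists` is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds if the named schema can be fetched from the cluster.
# Assumes `yc` exits with a non-zero status when the schema is not found.
# Arguments: schema name, cluster name.
schema_exists() {
  yc managed-clickhouse format-schema get "$1" --cluster-name="$2" >/dev/null 2>&1
}

# Usage sketch (not executed here):
#   if schema_exists '<data_format_schema_name>' '<cluster_name>'; then
#     echo "schema is registered in the cluster"
#   fi
```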
- Get an IAM token for API authentication and place it in an environment variable:
  export IAM_TOKEN="<IAM_token>"
- Use the FormatSchema.Get method and send the following request, e.g., via cURL:
  curl \
    --request GET \
    --header "Authorization: Bearer $IAM_TOKEN" \
    --url 'https://mdb.api.cloud.yandex.net/managed-clickhouse/v1/clusters/<cluster_ID>/formatSchemas/<schema_name>'
  You can get the cluster ID from the list of clusters in the folder, and the schema name from the list of schemas in the cluster.
- View the server response to make sure your request was successful.
- Get an IAM token for API authentication and place it in an environment variable:
  export IAM_TOKEN="<IAM_token>"
- Clone the cloudapi repository:
  cd ~/ && git clone --depth=1 https://github.com/yandex-cloud/cloudapi
  Below, we assume the repository contents are stored in the ~/cloudapi/ directory.
- Use the FormatSchemaService.Get call and send the following request, e.g., via gRPCurl:
  grpcurl \
    -format json \
    -import-path ~/cloudapi/ \
    -import-path ~/cloudapi/third_party/googleapis/ \
    -proto ~/cloudapi/yandex/cloud/mdb/clickhouse/v1/format_schema_service.proto \
    -rpc-header "Authorization: Bearer $IAM_TOKEN" \
    -d '{
          "cluster_id": "<cluster_ID>",
          "format_schema_name": "<schema_name>"
        }' \
    mdb.api.cloud.yandex.net:443 \
    yandex.cloud.mdb.clickhouse.v1.FormatSchemaService.Get
  You can get the cluster ID from the list of clusters in the folder, and the schema name from the list of schemas in the cluster.
- Check the server response to make sure your request was successful.
ClickHouse® is a registered trademark of ClickHouse, Inc.
Creating a data format schema
Before adding a data format schema
Managed Service for ClickHouse® only works with data format schemas uploaded to Yandex Object Storage and accessible for reading. Before adding a schema to a cluster:
- Prepare a file with a data format schema (see the Cap'n Proto and Protobuf tutorials).
- To link a service account to a cluster, assign the iam.serviceAccounts.user role or higher to your Yandex Cloud account.
- Upload the data format schema file to Yandex Object Storage.
- Connect the service account to the cluster. You will use this service account to configure permissions to access the schema file.
- Assign the storage.viewer role to the service account.
- In the bucket's ACL, add the READ permission to the service account.
- Get a link to the schema file.
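The upload and link steps above can be sketched as follows. The example assumes the path-style Object Storage address `https://storage.yandexcloud.net/<bucket>/<key>` and an S3-compatible client such as the AWS CLI; `schema_object_url`, the bucket name, and the file name are hypothetical.

```shell
# Hypothetical helper: builds a link to an object in Yandex Object Storage,
# assuming the path-style address https://storage.yandexcloud.net/<bucket>/<key>.
# Arguments: bucket name, object key.
schema_object_url() {
  printf 'https://storage.yandexcloud.net/%s/%s\n' "$1" "$2"
}

# Usage sketch (not executed here): upload with an S3-compatible client, then print the link.
#   aws s3 cp schema.proto "s3://<bucket_name>/schema.proto" \
#     --endpoint-url=https://storage.yandexcloud.net
#   schema_object_url '<bucket_name>' 'schema.proto'
```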
Add the data format schema
- In the management console, navigate to the folder dashboard and select Managed Service for ClickHouse.
- Click the name of your cluster and open the Data format schemas tab.
- Click Create schema.
- In the Add schema dialog box, fill out the form by specifying the schema file link generated earlier in the URL field.
- Click Create.
If you do not have the Yandex Cloud CLI installed yet, install and initialize it.
By default, the CLI uses the folder specified when creating the profile. To change the default folder, use the yc config set folder-id <folder_ID> command. You can also set a different folder for any specific command using the --folder-name or --folder-id parameter.
To create a data format schema, run this command:
- For Cap'n Proto:
  yc managed-clickhouse format-schema create "<data_format_schema_name>" \
    --cluster-name="<cluster_name>" \
    --type="capnproto" \
    --uri="<link_to_file_in_Object_Storage>"
- For Protobuf:
  yc managed-clickhouse format-schema create "<data_format_schema_name>" \
    --cluster-name="<cluster_name>" \
    --type="protobuf" \
    --uri="<link_to_file_in_Object_Storage>"
You can get the cluster name from the list of clusters in the folder.
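Since the two commands differ only in the `--type` value, a script could derive it from the schema file name. This is a minimal sketch that assumes the conventional `.capnp` and `.proto` file extensions; the service itself does not require them, and `schema_type_for` is a hypothetical helper.

```shell
# Hypothetical helper: picks the --type value from the schema file extension.
# The .capnp and .proto extensions are conventions, not a service requirement.
schema_type_for() {
  case "$1" in
    *.capnp) printf 'capnproto\n' ;;
    *.proto) printf 'protobuf\n' ;;
    *) printf 'unsupported schema file: %s\n' "$1" >&2; return 1 ;;
  esac
}

# Usage sketch (not executed here):
#   yc managed-clickhouse format-schema create "<data_format_schema_name>" \
#     --cluster-name="<cluster_name>" \
#     --type="$(schema_type_for schema.proto)" \
#     --uri="<link_to_file_in_Object_Storage>"
```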
- Open the current Terraform configuration file describing your infrastructure.
  For more information about creating this file, see Creating clusters.
- Add the format_schema section to the Managed Service for ClickHouse® cluster description:
  resource "yandex_mdb_clickhouse_cluster" "<cluster_name>" {
    ...
    format_schema {
      name = "<schema_name>"
      type = "<schema_type>"
      uri  = "<link_to_data_format_schema_file_in_Object_Storage>"
    }
  }
  Where type is the schema type, FORMAT_SCHEMA_TYPE_CAPNPROTO or FORMAT_SCHEMA_TYPE_PROTOBUF.
- Validate your configuration.
  - In the command line, navigate to the directory that contains the current Terraform configuration files defining the infrastructure.
  - Run this command:
    terraform validate
    Terraform will show any errors found in your configuration files.
- Confirm updating the resources.
  - Run this command to view the planned changes:
    terraform plan
    If you described the configuration correctly, the terminal will display a list of the resources to update and their parameters. This is a verification step that does not apply changes to your resources.
  - If everything looks correct, apply the changes:
    - Run this command:
      terraform apply
    - Confirm updating the resources.
    - Wait for the operation to complete.
For more information, see this Terraform provider article.
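The validate, plan, and apply steps above can also be chained in a script. In the sketch below, the plan is saved to a file and exactly that plan is applied, so Terraform does not ask for interactive confirmation; review the plan output before using this approach. `apply_checked` and the `tfplan` file name are hypothetical.

```shell
# Hypothetical wrapper: validates the configuration, saves the plan to a file,
# and applies exactly that plan. Stops at the first failing step.
apply_checked() {
  terraform validate &&
  terraform plan -out=tfplan &&
  terraform apply tfplan
}

# Usage sketch: run from the directory with the Terraform configuration files.
#   apply_checked
```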
Time limits
The Terraform provider sets the following timeouts for Managed Service for ClickHouse® cluster operations:
- Creating a cluster, including by restoring one from a backup: 60 minutes.
- Editing a cluster: 90 minutes.
- Deleting a cluster: 30 minutes.
Operations exceeding the set timeout are interrupted.
How do I change these limits?
Add the timeouts block to the cluster description, for example:
resource "yandex_mdb_clickhouse_cluster" "<cluster_name>" {
...
timeouts {
create = "1h30m" # 1 hour 30 minutes
update = "2h" # 2 hours
delete = "30m" # 30 minutes
}
}
- Get an IAM token for API authentication and place it in an environment variable:
  export IAM_TOKEN="<IAM_token>"
- Use the FormatSchema.Create method and send the following request, e.g., via cURL:
  curl \
    --request POST \
    --header "Authorization: Bearer $IAM_TOKEN" \
    --header "Content-Type: application/json" \
    --url 'https://mdb.api.cloud.yandex.net/managed-clickhouse/v1/clusters/<cluster_ID>/formatSchemas' \
    --data '{
              "formatSchemaName": "<schema_name>",
              "type": "<schema_type>",
              "uri": "<file_link>"
            }'
  Where:
  - formatSchemaName: Schema name.
  - type: Schema type, FORMAT_SCHEMA_TYPE_CAPNPROTO or FORMAT_SCHEMA_TYPE_PROTOBUF.
  - uri: Link to the schema file in Object Storage.
  You can get the cluster ID from the folder’s cluster list.
- Check the server response to make sure your request was successful.
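When the request is sent from a script, the JSON body can be assembled from variables. The following is only a sketch: `create_payload` is a hypothetical helper, and it inserts the values as-is, so they must not contain double quotes or backslashes.

```shell
# Hypothetical helper: builds the JSON body for the FormatSchema.Create request.
# Arguments: schema name, schema type, link to the schema file.
# Values are inserted as-is, so they must not contain double quotes or backslashes.
create_payload() {
  printf '{"formatSchemaName": "%s", "type": "%s", "uri": "%s"}\n' "$1" "$2" "$3"
}

# Usage sketch (not executed here):
#   curl \
#     --request POST \
#     --header "Authorization: Bearer $IAM_TOKEN" \
#     --header "Content-Type: application/json" \
#     --url 'https://mdb.api.cloud.yandex.net/managed-clickhouse/v1/clusters/<cluster_ID>/formatSchemas' \
#     --data "$(create_payload '<schema_name>' 'FORMAT_SCHEMA_TYPE_PROTOBUF' '<file_link>')"
```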
- Get an IAM token for API authentication and place it in an environment variable:
  export IAM_TOKEN="<IAM_token>"
- Clone the cloudapi repository:
  cd ~/ && git clone --depth=1 https://github.com/yandex-cloud/cloudapi
  Below, we assume the repository contents are stored in the ~/cloudapi/ directory.
- Use the FormatSchemaService.Create call and send the following request, e.g., via gRPCurl:
  grpcurl \
    -format json \
    -import-path ~/cloudapi/ \
    -import-path ~/cloudapi/third_party/googleapis/ \
    -proto ~/cloudapi/yandex/cloud/mdb/clickhouse/v1/format_schema_service.proto \
    -rpc-header "Authorization: Bearer $IAM_TOKEN" \
    -d '{
          "cluster_id": "<cluster_ID>",
          "format_schema_name": "<schema_name>",
          "type": "<schema_type>",
          "uri": "<file_link>"
        }' \
    mdb.api.cloud.yandex.net:443 \
    yandex.cloud.mdb.clickhouse.v1.FormatSchemaService.Create
  Where:
  - format_schema_name: Schema name.
  - type: Schema type, FORMAT_SCHEMA_TYPE_CAPNPROTO or FORMAT_SCHEMA_TYPE_PROTOBUF.
  - uri: Link to the schema file in Object Storage.
  You can get the cluster ID from the folder’s cluster list.
- View the server response to make sure your request was successful.
Changing a data format schema
Managed Service for ClickHouse® does not track changes in a data format schema file located in a Yandex Object Storage bucket.
To update the contents of a schema that is already added to the cluster:
- Upload the file with the updated data format schema to Yandex Object Storage.
- Get a link to this file.
- Update the settings of the data format schema added to Managed Service for ClickHouse® by providing a new link to the data format schema file.
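If you want each upload to get its own link and keep earlier versions of the schema file in the bucket, one option is to add a version label to the object key. The sketch below is an illustration under that assumption, not a service requirement; `versioned_key` and the label format are hypothetical.

```shell
# Hypothetical helper: inserts a version label before the file extension,
# e.g. schema.proto + v2 -> schema-v2.proto.
# Arguments: file name, version label.
versioned_key() (
  case "$1" in
    *.*) printf '%s-%s.%s\n' "${1%.*}" "$2" "${1##*.}" ;;
    *)   printf '%s-%s\n' "$1" "$2" ;;
  esac
)

# Usage sketch (not executed here): upload under the new key, then pass the new link.
#   key=$(versioned_key schema.proto v2)
#   aws s3 cp schema.proto "s3://<bucket_name>/$key" \
#     --endpoint-url=https://storage.yandexcloud.net
#   yc managed-clickhouse format-schema update "<data_format_schema_name>" \
#     --cluster-name="<cluster_name>" \
#     --uri="https://storage.yandexcloud.net/<bucket_name>/$key"
```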
- In the management console, navigate to the folder dashboard and select Managed Service for ClickHouse.
- Click the name of your cluster and open the Data format schemas tab.
- Select the appropriate schema, click the options icon in its row, and select Edit.
If you do not have the Yandex Cloud CLI installed yet, install and initialize it.
By default, the CLI uses the folder specified when creating the profile. To change the default folder, use the yc config set folder-id <folder_ID> command. You can also set a different folder for any specific command using the --folder-name or --folder-id parameter.
To update the link to a data format schema file in Object Storage, run this command:
yc managed-clickhouse format-schema update "<data_format_schema_name>" \
--cluster-name="<cluster_name>" \
--uri="<new_link_to_file_in_Object_Storage>"
You can get the schema name from the list of data format schemas in the cluster, and the cluster name from the list of clusters in the folder.
- Open the current Terraform configuration file describing your infrastructure.
  For more information about creating this file, see Creating clusters.
- In the Managed Service for ClickHouse® cluster description, change the uri value under format_schema:
  resource "yandex_mdb_clickhouse_cluster" "<cluster_name>" {
    ...
    format_schema {
      name = "<schema_name>"
      type = "<schema_type>"
      uri  = "<new_link_to_schema_file_in_Object_Storage>"
    }
  }
- Make sure the settings are correct.
  - In the command line, navigate to the directory that contains the current Terraform configuration files defining the infrastructure.
  - Run this command:
    terraform validate
    Terraform will show any errors found in your configuration files.
- Confirm updating the resources.
  - Run this command to view the planned changes:
    terraform plan
    If you described the configuration correctly, the terminal will display a list of the resources to update and their parameters. This is a verification step that does not apply changes to your resources.
  - If everything looks correct, apply the changes:
    - Run this command:
      terraform apply
    - Confirm updating the resources.
    - Wait for the operation to complete.
For more information, see this Terraform provider article.
Time limits
The Terraform provider sets the following timeouts for Managed Service for ClickHouse® cluster operations:
- Creating a cluster, including by restoring one from a backup: 60 minutes.
- Editing a cluster: 90 minutes.
- Deleting a cluster: 30 minutes.
Operations exceeding the set timeout are interrupted.
How do I change these limits?
Add the timeouts block to the cluster description, for example:
resource "yandex_mdb_clickhouse_cluster" "<cluster_name>" {
  ...
  timeouts {
    create = "1h30m" # 1 hour 30 minutes
    update = "2h"    # 2 hours
    delete = "30m"   # 30 minutes
  }
}
- Get an IAM token for API authentication and place it in an environment variable:
  export IAM_TOKEN="<IAM_token>"
- Use the FormatSchema.Update method and send the following request, e.g., via cURL:
  Warning
  The API method will assign default values to all the parameters of the object you are modifying unless you explicitly provide them in your request. To avoid this, list the settings you want to change in the updateMask parameter as a single comma-separated string.
  curl \
    --request PATCH \
    --header "Authorization: Bearer $IAM_TOKEN" \
    --header "Content-Type: application/json" \
    --url 'https://mdb.api.cloud.yandex.net/managed-clickhouse/v1/clusters/<cluster_ID>/formatSchemas/<schema_name>' \
    --data '{
              "updateMask": "uri",
              "uri": "<file_link>"
            }'
  Where:
  - updateMask: Comma-separated list of settings you want to modify.
    Here, we only specified a single parameter, uri.
  - uri: Link to the new schema file in Object Storage.
  You can get the cluster ID from the folder’s cluster list.
- Check the server response to make sure your request was successful.
- Get an IAM token for API authentication and place it in an environment variable:
  export IAM_TOKEN="<IAM_token>"
- Clone the cloudapi repository:
  cd ~/ && git clone --depth=1 https://github.com/yandex-cloud/cloudapi
  Below, we assume the repository contents are stored in the ~/cloudapi/ directory.
- Use the FormatSchemaService.Update call and send the following request, e.g., via gRPCurl:
  Warning
  The API method will assign default values to all the parameters of the object you are modifying unless you explicitly provide them in your request. To avoid this, list the settings you want to change in the update_mask parameter as an array of paths[] strings.
  Format for listing settings
  "update_mask": {
    "paths": [
      "<setting_1>",
      "<setting_2>",
      ...
      "<setting_N>"
    ]
  }
  grpcurl \
    -format json \
    -import-path ~/cloudapi/ \
    -import-path ~/cloudapi/third_party/googleapis/ \
    -proto ~/cloudapi/yandex/cloud/mdb/clickhouse/v1/format_schema_service.proto \
    -rpc-header "Authorization: Bearer $IAM_TOKEN" \
    -d '{
          "cluster_id": "<cluster_ID>",
          "format_schema_name": "<schema_name>",
          "update_mask": {
            "paths": ["uri"]
          },
          "uri": "<file_link>"
        }' \
    mdb.api.cloud.yandex.net:443 \
    yandex.cloud.mdb.clickhouse.v1.FormatSchemaService.Update
  Where:
  - format_schema_name: Schema name.
  - update_mask: List of settings you want to modify as an array of strings (paths[]).
    Here, we only specified a single parameter, uri.
  - uri: Link to the new schema file in Object Storage.
  You can get the cluster ID from the folder’s cluster list.
- View the server response to make sure your request was successful.
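The REST API takes the mask as a comma-separated string, while the gRPC API expects an array of paths. If a script handles both, the conversion can be sketched as below. `mask_to_paths` is a hypothetical helper, and any setting names other than uri are placeholders used only to show the list form.

```shell
# Hypothetical helper: converts a comma-separated mask ("a,b") into the
# gRPC-style update_mask value ({"paths": ["a", "b"]}).
mask_to_paths() (
  set -f      # disable globbing while the argument is split
  IFS=','     # split the argument on commas
  json=""
  for field in $1; do
    json="${json:+$json, }\"$field\""
  done
  printf '{"paths": [%s]}\n' "$json"
)

# Usage sketch:
#   mask_to_paths 'uri'
#   mask_to_paths '<setting_1>,<setting_2>'
```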
Removing a data format schema
Note
After removing a data format schema, the related object remains in the Yandex Object Storage bucket. If you no longer need this schema object, you can delete it.
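Once the schema itself has been removed, deleting the leftover object could look like the sketch below. It assumes the schema link uses the path-style address `https://storage.yandexcloud.net/<bucket>/<key>` and that an S3-compatible client such as the AWS CLI is configured; `link_to_s3_uri` is a hypothetical helper.

```shell
# Hypothetical helper: converts a path-style Object Storage link
# into an s3:// URI that S3-compatible clients accept.
link_to_s3_uri() {
  case "$1" in
    https://storage.yandexcloud.net/*)
      printf 's3://%s\n' "${1#https://storage.yandexcloud.net/}" ;;
    *)
      printf 'not an Object Storage link: %s\n' "$1" >&2
      return 1 ;;
  esac
}

# Usage sketch (not executed here):
#   aws s3 rm "$(link_to_s3_uri '<link_to_file_in_Object_Storage>')" \
#     --endpoint-url=https://storage.yandexcloud.net
```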
- In the management console, navigate to the folder dashboard and select Managed Service for ClickHouse.
- Click the name of your cluster and open the Data format schemas tab.
- Select the appropriate schema, click the options icon in its row, and select Delete.
If you do not have the Yandex Cloud CLI installed yet, install and initialize it.
By default, the CLI uses the folder specified when creating the profile. To change the default folder, use the yc config set folder-id <folder_ID> command. You can also set a different folder for any specific command using the --folder-name or --folder-id parameter.
To remove a data format schema, run this command:
yc managed-clickhouse format-schema delete "<data_format_schema_name>" \
--cluster-name="<cluster_name>"
You can get the schema name from the list of data format schemas in the cluster, and the cluster name from the list of clusters in the folder.
- Open the current Terraform configuration file describing your infrastructure.
  For more information about creating this file, see Creating clusters.
- Delete the relevant format_schema section from the Managed Service for ClickHouse® cluster description.
- Validate your configuration.
  - In the command line, navigate to the directory that contains the current Terraform configuration files defining the infrastructure.
  - Run this command:
    terraform validate
    Terraform will show any errors found in your configuration files.
- Confirm updating the resources.
  - Run this command to view the planned changes:
    terraform plan
    If you described the configuration correctly, the terminal will display a list of the resources to update and their parameters. This is a verification step that does not apply changes to your resources.
  - If everything looks correct, apply the changes:
    - Run this command:
      terraform apply
    - Confirm updating the resources.
    - Wait for the operation to complete.
For more information, see this Terraform provider article.
Time limits
The Terraform provider sets the following timeouts for Managed Service for ClickHouse® cluster operations:
- Creating a cluster, including by restoring one from a backup: 60 minutes.
- Editing a cluster: 90 minutes.
- Deleting a cluster: 30 minutes.
Operations exceeding the set timeout are interrupted.
How do I change these limits?
Add the timeouts block to the cluster description, for example:
resource "yandex_mdb_clickhouse_cluster" "<cluster_name>" {
...
timeouts {
create = "1h30m" # 1 hour 30 minutes
update = "2h" # 2 hours
delete = "30m" # 30 minutes
}
}
- Get an IAM token for API authentication and place it in an environment variable:
  export IAM_TOKEN="<IAM_token>"
- Use the FormatSchema.Delete method and send the following request, e.g., via cURL:
  curl \
    --request DELETE \
    --header "Authorization: Bearer $IAM_TOKEN" \
    --url 'https://mdb.api.cloud.yandex.net/managed-clickhouse/v1/clusters/<cluster_ID>/formatSchemas/<schema_name>'
  You can get the cluster ID from the list of clusters in the folder, and the schema name from the list of schemas in the cluster.
- View the server response to make sure your request was successful.
- Get an IAM token for API authentication and place it in an environment variable:
  export IAM_TOKEN="<IAM_token>"
- Clone the cloudapi repository:
  cd ~/ && git clone --depth=1 https://github.com/yandex-cloud/cloudapi
  Below, we assume the repository contents are stored in the ~/cloudapi/ directory.
- Use the FormatSchemaService.Delete call and send the following request, e.g., via gRPCurl:
  grpcurl \
    -format json \
    -import-path ~/cloudapi/ \
    -import-path ~/cloudapi/third_party/googleapis/ \
    -proto ~/cloudapi/yandex/cloud/mdb/clickhouse/v1/format_schema_service.proto \
    -rpc-header "Authorization: Bearer $IAM_TOKEN" \
    -d '{
          "cluster_id": "<cluster_ID>",
          "format_schema_name": "<schema_name>"
        }' \
    mdb.api.cloud.yandex.net:443 \
    yandex.cloud.mdb.clickhouse.v1.FormatSchemaService.Delete
  You can get the cluster ID from the list of clusters in the folder, and the schema name from the list of schemas in the cluster.
- View the server response to make sure your request was successful.