Managing format schemas in Managed Service for ClickHouse®
In Managed Service for ClickHouse®, you can INSERT and SELECT data in different formats. Most of these formats are self-descriptive. This means that they already contain a format schema that describes valid data types, their order, and representation in this format. Thus, for example, you can insert data directly from a file.
Note
A format schema describes the format of data input or output, while a data schema describes the structure and layout of ClickHouse® databases and tables storing this data. These concepts are not interchangeable.
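Formats such as Cap'n Proto and Protobuf are not self-descriptive and need a separately supplied format schema. You can add one or multiple format schemas to your Managed Service for ClickHouse® cluster and use them to input and output data in the relevant formats.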
Warning
To use the format schemas you added, insert the data into Managed Service for ClickHouse® via the HTTP interface.
For more information about data formats, see this ClickHouse® guide.
You can find examples of using Cap'n Proto and Protobuf formats when inserting data into a cluster in this tutorial.
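As a rough illustration of the HTTP interface requirement above, the sketch below builds a request URL for an INSERT that refers to a format schema added to the cluster. It relies on the ClickHouse® HTTP interface accepting settings as URL parameters. Port 8443, the `<schema_name>:<message_type>` layout of the `format_schema` value, and the `build_insert_url` helper with all its arguments are assumptions made for this example, not values taken from this guide; check the tutorial for the exact form your cluster expects.

```python
from urllib.parse import urlencode


def build_insert_url(host: str, database: str, table: str,
                     data_format: str, schema_name: str, message_type: str) -> str:
    """Build an HTTP interface URL for inserting data in a schema-based format.

    Assumptions (hypothetical, not from this guide): HTTPS on port 8443 and a
    format_schema value shaped as '<schema_name>:<message_type>'.
    """
    params = {
        # The data itself goes in the request body; the URL carries only the query.
        "query": f"INSERT INTO {database}.{table} FORMAT {data_format}",
        # Settings can be passed to the HTTP interface as URL parameters.
        "format_schema": f"{schema_name}:{message_type}",
    }
    # urlencode percent-encodes both values so they are safe inside a URL.
    return f"https://{host}:8443/?{urlencode(params)}"
```

The serialized Cap'n Proto or Protobuf data would then be sent as the body of a POST request to the resulting URL.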
Getting a list of format schemas in a cluster
- In the management console, select the folder the cluster is in.
- Go to Managed Service for ClickHouse.
- Click the cluster name and select the Data format schemas tab.
If you do not have the Yandex Cloud CLI installed yet, install and initialize it.
By default, the CLI uses the folder specified when creating the profile. To change the default folder, use the yc config set folder-id <folder_ID> command. You can also set a different folder for any specific command using the --folder-name or --folder-id parameter.
To get a list of format schemas in a cluster, run this command:
yc managed-clickhouse format-schema list --cluster-name="<cluster_name>"
You can get the cluster name with the list of clusters in the folder.
- Get an IAM token for API authentication and place it in an environment variable:

  export IAM_TOKEN="<IAM_token>"

- Call the FormatSchema.List method, e.g., via the following cURL request:

  curl \
    --request GET \
    --header "Authorization: Bearer $IAM_TOKEN" \
    --url 'https://mdb.api.cloud.yandex.net/managed-clickhouse/v1/clusters/<cluster_ID>/formatSchemas'

  You can get the cluster ID with the list of clusters in the folder.

- View the server response to make sure your request was successful.
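The same REST call can be prepared from a script. Below is a minimal sketch that uses only the Python standard library; the endpoint and the `Authorization` header come from the cURL example above, while the helper names and the `formatSchemas` and `name` response keys are assumptions you should check against an actual server response.

```python
import json
import urllib.request

API_BASE = "https://mdb.api.cloud.yandex.net/managed-clickhouse/v1"


def build_list_request(cluster_id: str, iam_token: str) -> urllib.request.Request:
    """Prepare (but do not send) the GET request that lists format schemas."""
    return urllib.request.Request(
        url=f"{API_BASE}/clusters/{cluster_id}/formatSchemas",
        headers={"Authorization": f"Bearer {iam_token}"},
        method="GET",
    )


def schema_names(response_body: str) -> list:
    """Extract schema names from a JSON response body.

    The 'formatSchemas' and 'name' keys are assumed, not confirmed by this guide.
    """
    payload = json.loads(response_body)
    return [item["name"] for item in payload.get("formatSchemas", [])]
```

To send the request, pass the object returned by `build_list_request` to `urllib.request.urlopen` and feed the decoded body to `schema_names`.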
- Get an IAM token for API authentication and place it in an environment variable:

  export IAM_TOKEN="<IAM_token>"

- Clone the cloudapi repository:

  cd ~/ && git clone --depth=1 https://github.com/yandex-cloud/cloudapi

  Below, we assume the repository contents are stored in the ~/cloudapi/ directory.

- Call the FormatSchemaService.List method, e.g., via the following gRPCurl request:

  grpcurl \
    -format json \
    -import-path ~/cloudapi/ \
    -import-path ~/cloudapi/third_party/googleapis/ \
    -proto ~/cloudapi/yandex/cloud/mdb/clickhouse/v1/format_schema_service.proto \
    -rpc-header "Authorization: Bearer $IAM_TOKEN" \
    -d '{
          "cluster_id": "<cluster_ID>"
        }' \
    mdb.api.cloud.yandex.net:443 \
    yandex.cloud.mdb.clickhouse.v1.FormatSchemaService.List

  You can get the cluster ID with the list of clusters in the folder.

- View the server response to make sure your request was successful.
Getting detailed information about a format schema
If you do not have the Yandex Cloud CLI installed yet, install and initialize it.
By default, the CLI uses the folder specified when creating the profile. To change the default folder, use the yc config set folder-id <folder_ID> command. You can also set a different folder for any specific command using the --folder-name or --folder-id parameter.
To get detailed information about a format schema, run this command:
yc managed-clickhouse format-schema get "<format_schema_name>" \
--cluster-name="<cluster_name>"
You can get the schema name with the list of format schemas in the cluster, and the cluster name, with the list of clusters in the folder.
- Get an IAM token for API authentication and place it in an environment variable:

  export IAM_TOKEN="<IAM_token>"

- Call the FormatSchema.Get method, e.g., via the following cURL request:

  curl \
    --request GET \
    --header "Authorization: Bearer $IAM_TOKEN" \
    --url 'https://mdb.api.cloud.yandex.net/managed-clickhouse/v1/clusters/<cluster_ID>/formatSchemas/<schema_name>'

  You can get the cluster ID with the list of clusters in the folder, and the schema name, with the list of schemas in the cluster.

- View the server response to make sure your request was successful.
- Get an IAM token for API authentication and place it in an environment variable:

  export IAM_TOKEN="<IAM_token>"

- Clone the cloudapi repository:

  cd ~/ && git clone --depth=1 https://github.com/yandex-cloud/cloudapi

  Below, we assume the repository contents are stored in the ~/cloudapi/ directory.

- Call the FormatSchemaService.Get method, e.g., via the following gRPCurl request:

  grpcurl \
    -format json \
    -import-path ~/cloudapi/ \
    -import-path ~/cloudapi/third_party/googleapis/ \
    -proto ~/cloudapi/yandex/cloud/mdb/clickhouse/v1/format_schema_service.proto \
    -rpc-header "Authorization: Bearer $IAM_TOKEN" \
    -d '{
          "cluster_id": "<cluster_ID>",
          "format_schema_name": "<schema_name>"
        }' \
    mdb.api.cloud.yandex.net:443 \
    yandex.cloud.mdb.clickhouse.v1.FormatSchemaService.Get

  You can get the cluster ID with the list of clusters in the folder, and the schema name, with the list of schemas in the cluster.

- View the server response to make sure your request was successful.
ClickHouse® is a registered trademark of ClickHouse, Inc
Creating a format schema
Before adding a format schema
Managed Service for ClickHouse® only works with format schemas uploaded to Yandex Object Storage and accessible for reading. Before adding a schema to a cluster:
- Prepare a file with a format schema (see the Cap'n Proto and Protobuf tutorials).
- To attach a service account to a cluster, assign the iam.serviceAccounts.user role or higher to your Yandex Cloud account.
- Upload the format schema file to Yandex Object Storage.
- Attach the service account to the cluster. You will use this service account to configure permissions to access the schema file.
- Assign the storage.viewer role to the service account.
- In the bucket ACL, add the READ permission to the service account.
- Get a link to the schema file.
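For the last step, the link can also be assembled by hand once you know the bucket and the object key. The sketch below assumes a path-style link on the `storage.yandexcloud.net` endpoint; both the endpoint and the `schema_file_link` helper are assumptions for illustration, so compare the result with the link the service gives you.

```python
from urllib.parse import quote

# Assumed Object Storage endpoint; verify it before relying on the result.
STORAGE_ENDPOINT = "https://storage.yandexcloud.net"


def schema_file_link(bucket: str, object_key: str) -> str:
    """Return a path-style link to an object, percent-encoding unsafe characters."""
    # '/' stays unescaped so that keys with prefixes ("folders") remain readable.
    return f"{STORAGE_ENDPOINT}/{quote(bucket)}/{quote(object_key, safe='/')}"
```

For example, `schema_file_link("my-bucket", "schemas/user.proto")` yields a link whose path is `/my-bucket/schemas/user.proto`.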
Add the format schema
- In the management console, select the folder the cluster is in.
- Go to Managed Service for ClickHouse.
- Click the cluster name and select the Data format schemas tab.
- Click Create schema.
- In the Add schema dialog box, fill out the form by specifying the schema file link generated earlier in the URL field.
- Click Create.
If you do not have the Yandex Cloud CLI installed yet, install and initialize it.
By default, the CLI uses the folder specified when creating the profile. To change the default folder, use the yc config set folder-id <folder_ID> command. You can also set a different folder for any specific command using the --folder-name or --folder-id parameter.
To create a format schema, run this command:
- For Cap'n Proto:

  yc managed-clickhouse format-schema create "<format_schema_name>" \
    --cluster-name="<cluster_name>" \
    --type="capnproto" \
    --uri="<link_to_file_in_Object_Storage>"

- For Protobuf:

  yc managed-clickhouse format-schema create "<format_schema_name>" \
    --cluster-name="<cluster_name>" \
    --type="protobuf" \
    --uri="<link_to_file_in_Object_Storage>"

You can get the cluster name with the list of clusters in the folder.
- Open the current Terraform configuration file describing your infrastructure.

  For information on how to create such a file, see Creating a cluster.

- Add the format_schema section to the Managed Service for ClickHouse® cluster description:

  resource "yandex_mdb_clickhouse_cluster" "<cluster_name>" {
    ...
    format_schema {
      name = "<schema_name>"
      type = "<schema_type>"
      uri  = "<link_to_format_schema_file_in_Object_Storage>"
    }
  }

  Where type is the schema type, FORMAT_SCHEMA_TYPE_CAPNPROTO or FORMAT_SCHEMA_TYPE_PROTOBUF.

- Make sure the settings are correct.

  - In the command line, navigate to the directory that contains the current Terraform configuration files defining the infrastructure.
  - Run this command:

    terraform validate

    Terraform will show any errors found in your configuration files.

- Confirm updating the resources.

  - Run this command to view the planned changes:

    terraform plan

    If you described the configuration correctly, the terminal will display a list of the resources to update and their parameters. This is a verification step that does not apply changes to your resources.

  - If everything looks correct, apply the changes:

    - Run this command:

      terraform apply

    - Confirm updating the resources.
    - Wait for the operation to complete.

For more information, see this Terraform provider guide.
Timeouts
The Terraform provider sets the following timeouts for Managed Service for ClickHouse® cluster operations:
- Creating a cluster, including by restoring from a backup: 60 minutes.
- Updating a cluster: 90 minutes.
- Deleting a cluster: 30 minutes.
Operations exceeding the timeout are aborted.
How do I change these limits?
Add a timeouts section to the cluster description, e.g.:
resource "yandex_mdb_clickhouse_cluster" "<cluster_name>" {
...
timeouts {
create = "1h30m" # 1 hour 30 minutes
update = "2h" # 2 hours
delete = "30m" # 30 minutes
}
}
- Get an IAM token for API authentication and place it in an environment variable:

  export IAM_TOKEN="<IAM_token>"

- Call the FormatSchema.Create method, e.g., via the following cURL request:

  curl \
    --request POST \
    --header "Authorization: Bearer $IAM_TOKEN" \
    --header "Content-Type: application/json" \
    --url 'https://mdb.api.cloud.yandex.net/managed-clickhouse/v1/clusters/<cluster_ID>/formatSchemas' \
    --data '{
              "formatSchemaName": "<schema_name>",
              "type": "<schema_type>",
              "uri": "<file_link>"
            }'

  Where:

  - formatSchemaName: Schema name.
  - type: Schema type, FORMAT_SCHEMA_TYPE_CAPNPROTO or FORMAT_SCHEMA_TYPE_PROTOBUF.
  - uri: Link to the schema file in Object Storage.

  You can get the cluster ID with the list of clusters in the folder.

- View the server response to make sure your request was successful.
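If you generate the request body in a script, it helps to validate the schema type before sending anything. The sketch below builds the JSON body for the Create call from the three fields described above; the `build_create_body` helper and its checks are illustrative assumptions, not part of the service API.

```python
import json

# Schema types named in this guide.
ALLOWED_TYPES = ("FORMAT_SCHEMA_TYPE_CAPNPROTO", "FORMAT_SCHEMA_TYPE_PROTOBUF")


def build_create_body(schema_name: str, schema_type: str, uri: str) -> str:
    """Return the JSON body for FormatSchema.Create, rejecting obviously bad input."""
    if schema_type not in ALLOWED_TYPES:
        raise ValueError(f"type must be one of {ALLOWED_TYPES}, got {schema_type!r}")
    if not schema_name or not uri:
        raise ValueError("schema name and uri must be non-empty")
    return json.dumps({
        "formatSchemaName": schema_name,
        "type": schema_type,
        "uri": uri,
    })
```

The returned string can be passed to cURL through `--data`. Note that the CLI values `capnproto` and `protobuf` are rejected here, because the API expects the full enum names.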
- Get an IAM token for API authentication and place it in an environment variable:

  export IAM_TOKEN="<IAM_token>"

- Clone the cloudapi repository:

  cd ~/ && git clone --depth=1 https://github.com/yandex-cloud/cloudapi

  Below, we assume the repository contents are stored in the ~/cloudapi/ directory.

- Call the FormatSchemaService.Create method, e.g., via the following gRPCurl request:

  grpcurl \
    -format json \
    -import-path ~/cloudapi/ \
    -import-path ~/cloudapi/third_party/googleapis/ \
    -proto ~/cloudapi/yandex/cloud/mdb/clickhouse/v1/format_schema_service.proto \
    -rpc-header "Authorization: Bearer $IAM_TOKEN" \
    -d '{
          "cluster_id": "<cluster_ID>",
          "format_schema_name": "<schema_name>",
          "type": "<schema_type>",
          "uri": "<file_link>"
        }' \
    mdb.api.cloud.yandex.net:443 \
    yandex.cloud.mdb.clickhouse.v1.FormatSchemaService.Create

  Where:

  - format_schema_name: Schema name.
  - type: Schema type, FORMAT_SCHEMA_TYPE_CAPNPROTO or FORMAT_SCHEMA_TYPE_PROTOBUF.
  - uri: Link to the schema file in Object Storage.

  You can get the cluster ID with the list of clusters in the folder.

- View the server response to make sure your request was successful.
Changing a format schema
Managed Service for ClickHouse® does not track changes in a format schema file located in a Yandex Object Storage bucket.
To update the contents of a schema that is already added to the cluster:
- Upload the file with the updated format schema to Yandex Object Storage.
- Get a link to this file.
- Update the settings of the format schema added to Managed Service for ClickHouse® by providing a new link to the format schema file.
- In the management console, select the folder the cluster is in.
- Go to Managed Service for ClickHouse.
- Click the cluster name and select the Data format schemas tab.
- Open the menu for the appropriate schema and select Edit.
If you do not have the Yandex Cloud CLI installed yet, install and initialize it.
By default, the CLI uses the folder specified when creating the profile. To change the default folder, use the yc config set folder-id <folder_ID> command. You can also set a different folder for any specific command using the --folder-name or --folder-id parameter.
To update the link to a format schema file in Object Storage, run this command:
yc managed-clickhouse format-schema update "<format_schema_name>" \
--cluster-name="<cluster_name>" \
--uri="<new_link_to_file_in_Object_Storage>"
You can get the schema name with the list of format schemas in the cluster, and the cluster name, with the list of clusters in the folder.
- Open the current Terraform configuration file describing your infrastructure.

  For information on how to create such a file, see Creating a cluster.

- In the Managed Service for ClickHouse® cluster description, change the uri parameter value under format_schema:

  resource "yandex_mdb_clickhouse_cluster" "<cluster_name>" {
    ...
    format_schema {
      name = "<schema_name>"
      type = "<schema_type>"
      uri  = "<new_link_to_schema_file_in_Object_Storage>"
    }
  }

- Make sure the settings are correct.

  - In the command line, navigate to the directory that contains the current Terraform configuration files defining the infrastructure.
  - Run this command:

    terraform validate

    Terraform will show any errors found in your configuration files.

- Confirm updating the resources.

  - Run this command to view the planned changes:

    terraform plan

    If you described the configuration correctly, the terminal will display a list of the resources to update and their parameters. This is a verification step that does not apply changes to your resources.

  - If everything looks correct, apply the changes:

    - Run this command:

      terraform apply

    - Confirm updating the resources.
    - Wait for the operation to complete.

For more information, see this Terraform provider guide.
Timeouts
The Terraform provider sets the following timeouts for Managed Service for ClickHouse® cluster operations:
- Creating a cluster, including by restoring from a backup: 60 minutes.
- Updating a cluster: 90 minutes.
- Deleting a cluster: 30 minutes.
Operations exceeding the timeout are aborted.
How do I change these limits?
Add a timeouts section to the cluster description, e.g.:
resource "yandex_mdb_clickhouse_cluster" "<cluster_name>" {
...
timeouts {
create = "1h30m" # 1 hour 30 minutes
update = "2h" # 2 hours
delete = "30m" # 30 minutes
}
}
- Get an IAM token for API authentication and place it in an environment variable:

  export IAM_TOKEN="<IAM_token>"

- Call the FormatSchema.Update method, e.g., via the following cURL request:

  Warning

  The API method will assign default values to all the parameters of the object you are modifying unless you explicitly provide them in your request. To avoid this, list the settings you want to change in the updateMask parameter as a single comma-separated string.

  curl \
    --request PATCH \
    --header "Authorization: Bearer $IAM_TOKEN" \
    --header "Content-Type: application/json" \
    --url 'https://mdb.api.cloud.yandex.net/managed-clickhouse/v1/clusters/<cluster_ID>/formatSchemas/<schema_name>' \
    --data '{
              "updateMask": "uri",
              "uri": "<file_link>"
            }'

  Where:

  - updateMask: Comma-separated list of settings you want to update.

    Here, we only specified a single setting, uri.

  - uri: Link to the new schema file in Object Storage.

  You can get the cluster ID with the list of clusters in the folder.

- View the server response to make sure your request was successful.
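One way to avoid forgetting a setting in updateMask is to derive the mask from the settings you are actually sending. The sketch below does that for the REST body; `build_update_body` is a hypothetical helper written for this example, and it only covers the comma-separated string form described above.

```python
def build_update_body(changes: dict) -> dict:
    """Return a PATCH body whose updateMask lists exactly the keys being changed."""
    if not changes:
        raise ValueError("at least one setting must be provided")
    if "updateMask" in changes:
        raise ValueError("updateMask is derived automatically; do not pass it")
    # The mask is a single comma-separated string, in the order the keys were given.
    body = {"updateMask": ",".join(changes)}
    body.update(changes)
    return body
```

Calling `build_update_body({"uri": "<file_link>"})` reproduces the body from the cURL example: a mask of `uri` plus the new link.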
- Get an IAM token for API authentication and place it in an environment variable:

  export IAM_TOKEN="<IAM_token>"

- Clone the cloudapi repository:

  cd ~/ && git clone --depth=1 https://github.com/yandex-cloud/cloudapi

  Below, we assume the repository contents are stored in the ~/cloudapi/ directory.

- Call the FormatSchemaService.Update method, e.g., via the following gRPCurl request:

  Warning

  The API method will assign default values to all the parameters of the object you are modifying unless you explicitly provide them in your request. To avoid this, list the settings you want to change in the update_mask parameter as an array of paths[] strings.

  Format for listing settings

  "update_mask": {
    "paths": [
      "<setting_1>",
      "<setting_2>",
      ...
      "<setting_N>"
    ]
  }

  grpcurl \
    -format json \
    -import-path ~/cloudapi/ \
    -import-path ~/cloudapi/third_party/googleapis/ \
    -proto ~/cloudapi/yandex/cloud/mdb/clickhouse/v1/format_schema_service.proto \
    -rpc-header "Authorization: Bearer $IAM_TOKEN" \
    -d '{
          "cluster_id": "<cluster_ID>",
          "format_schema_name": "<schema_name>",
          "update_mask": {
            "paths": ["uri"]
          },
          "uri": "<file_link>"
        }' \
    mdb.api.cloud.yandex.net:443 \
    yandex.cloud.mdb.clickhouse.v1.FormatSchemaService.Update

  Where:

  - format_schema_name: Schema name.
  - update_mask: List of settings you want to update as an array of strings (paths[]).

    Here, we only specified a single setting, uri.

  - uri: Link to the new schema file in Object Storage.

  You can get the cluster ID with the list of clusters in the folder.

- View the server response to make sure your request was successful.
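The gRPC request expects the mask as an array rather than a comma-separated string. The sketch below produces the JSON text for the `-d` argument of gRPCurl with `update_mask.paths` filled in from the settings being changed; the `build_grpc_update_payload` helper is hypothetical and exists only for this example.

```python
import json


def build_grpc_update_payload(cluster_id: str, schema_name: str, **changes: str) -> str:
    """Return JSON for FormatSchemaService.Update with a matching update_mask."""
    if not changes:
        raise ValueError("at least one setting must be provided")
    payload = {
        "cluster_id": cluster_id,
        "format_schema_name": schema_name,
        # paths[] lists exactly the settings included in this request.
        "update_mask": {"paths": list(changes)},
    }
    payload.update(changes)
    return json.dumps(payload, indent=2)
```

For instance, `build_grpc_update_payload("<cluster_ID>", "<schema_name>", uri="<file_link>")` gives the same payload as the gRPCurl example above.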
Removing a format schema
Note
After removing a format schema, the related object remains in the Yandex Object Storage bucket. If you no longer need this schema object, you can delete it.
- In the management console, select the folder the cluster is in.
- Go to Managed Service for ClickHouse.
- Click the cluster name and select the Data format schemas tab.
- Open the menu for the appropriate schema and select Delete.
If you do not have the Yandex Cloud CLI installed yet, install and initialize it.
By default, the CLI uses the folder specified when creating the profile. To change the default folder, use the yc config set folder-id <folder_ID> command. You can also set a different folder for any specific command using the --folder-name or --folder-id parameter.
To remove a format schema, run this command:
yc managed-clickhouse format-schema delete "<format_schema_name>" \
--cluster-name="<cluster_name>"
You can get the schema name with the list of format schemas in the cluster, and the cluster name, with the list of clusters in the folder.
- Open the current Terraform configuration file describing your infrastructure.

  For information on how to create such a file, see Creating a cluster.

- Delete the format_schema section in question from the Managed Service for ClickHouse® cluster description.
- Make sure the settings are correct.

  - In the command line, navigate to the directory that contains the current Terraform configuration files defining the infrastructure.
  - Run this command:

    terraform validate

    Terraform will show any errors found in your configuration files.

- Confirm updating the resources.

  - Run this command to view the planned changes:

    terraform plan

    If you described the configuration correctly, the terminal will display a list of the resources to update and their parameters. This is a verification step that does not apply changes to your resources.

  - If everything looks correct, apply the changes:

    - Run this command:

      terraform apply

    - Confirm updating the resources.
    - Wait for the operation to complete.

For more information, see this Terraform provider guide.
Timeouts
The Terraform provider sets the following timeouts for Managed Service for ClickHouse® cluster operations:
- Creating a cluster, including by restoring from a backup: 60 minutes.
- Updating a cluster: 90 minutes.
- Deleting a cluster: 30 minutes.
Operations exceeding the timeout are aborted.
How do I change these limits?
Add a timeouts section to the cluster description, e.g.:
resource "yandex_mdb_clickhouse_cluster" "<cluster_name>" {
...
timeouts {
create = "1h30m" # 1 hour 30 minutes
update = "2h" # 2 hours
delete = "30m" # 30 minutes
}
}
- Get an IAM token for API authentication and place it in an environment variable:

  export IAM_TOKEN="<IAM_token>"

- Call the FormatSchema.Delete method, e.g., via the following cURL request:

  curl \
    --request DELETE \
    --header "Authorization: Bearer $IAM_TOKEN" \
    --url 'https://mdb.api.cloud.yandex.net/managed-clickhouse/v1/clusters/<cluster_ID>/formatSchemas/<schema_name>'

  You can get the cluster ID with the list of clusters in the folder, and the schema name, with the list of schemas in the cluster.

- View the server response to make sure your request was successful.
- Get an IAM token for API authentication and place it in an environment variable:

  export IAM_TOKEN="<IAM_token>"

- Clone the cloudapi repository:

  cd ~/ && git clone --depth=1 https://github.com/yandex-cloud/cloudapi

  Below, we assume the repository contents are stored in the ~/cloudapi/ directory.

- Call the FormatSchemaService.Delete method, e.g., via the following gRPCurl request:

  grpcurl \
    -format json \
    -import-path ~/cloudapi/ \
    -import-path ~/cloudapi/third_party/googleapis/ \
    -proto ~/cloudapi/yandex/cloud/mdb/clickhouse/v1/format_schema_service.proto \
    -rpc-header "Authorization: Bearer $IAM_TOKEN" \
    -d '{
          "cluster_id": "<cluster_ID>",
          "format_schema_name": "<schema_name>"
        }' \
    mdb.api.cloud.yandex.net:443 \
    yandex.cloud.mdb.clickhouse.v1.FormatSchemaService.Delete

  You can get the cluster ID with the list of clusters in the folder, and the schema name, with the list of schemas in the cluster.

- View the server response to make sure your request was successful.