Managing data format schemas in Managed Service for ClickHouse®
Managed Service for ClickHouse® lets you INSERT and SELECT data in different formats. Most of those formats are self-descriptive: they already contain a format schema that describes the acceptable data types, their order, and their representation in the format. This lets you, for example, insert data directly from a file.
Note
A format schema describes the format of data input or output, while a data schema describes the structure and layout of the ClickHouse® databases and tables that store this data. These concepts are not interchangeable.
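The Cap'n Proto and Protobuf formats are not self-descriptive: they require a format schema that is provided separately.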
You can add one or more such format schemas to a Managed Service for ClickHouse® cluster and use them to input and output data in the relevant formats.
Warning
To use the format schemas you added, insert the data into Managed Service for ClickHouse® using the HTTP interface.
For more information about data formats, see the ClickHouse® documentation.
You can find examples of working with the Cap'n Proto and Protobuf formats when inserting data into a cluster in the Adding data to a cluster section.
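As a rough illustration of an insert over the HTTP interface, the sketch below only builds the request URL and sends nothing. It assumes that the cluster accepts HTTPS connections on port 8443 and that the ClickHouse `format_schema` setting takes a value of the form `<schema_name>:<message_type>`. The host, database, table, schema, and message names are hypothetical placeholders.

```python
from urllib.parse import quote, urlencode


def build_insert_url(host, database, table, data_format,
                     schema_name, message_type, port=8443):
    """Build an HTTP-interface URL for an INSERT that uses a connected format schema."""
    params = {
        "query": f"INSERT INTO {database}.{table} FORMAT {data_format}",
        # ClickHouse setting; the "<schema>:<type>" value shape is an assumption.
        "format_schema": f"{schema_name}:{message_type}",
    }
    # quote_via=quote encodes spaces as %20 instead of "+".
    return f"https://{host}:{port}/?{urlencode(params, quote_via=quote)}"


# Hypothetical host, table, and schema names, for illustration only.
url = build_insert_url("ch.example.net", "db1", "users",
                       "Protobuf", "schema-protobuf", "User")
```

The serialized data itself would then go in the body of a POST request to this URL, together with the usual authentication headers.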
Before connecting the format schema
Managed Service for ClickHouse® only works with data format schemas that are uploaded to Yandex Object Storage and have read access configured. Before connecting the schema to a cluster:
- Prepare a file with a format schema (see the documentation for Cap'n Proto and Protobuf).
- Upload the file with the data format schema to Yandex Object Storage.
- Configure access to the schema file using a service account:
  - Connect a service account to a cluster.
  - Assign the storage.viewer role to the account.
  - In the bucket ACL, grant the READ permission to the account.
- Get a link to the schema file.
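The link from the last step can be put together as in the sketch below. It assumes the path-style layout `https://storage.yandexcloud.net/<bucket>/<object_key>`; the bucket and object key are hypothetical, so check the resulting link against the one shown for your object in Object Storage.

```python
from urllib.parse import quote


def object_storage_link(bucket, key):
    """Build a link to an object, assuming the path-style URL layout."""
    # quote() keeps "/" in the key, so nested "folders" stay readable.
    return f"https://storage.yandexcloud.net/{quote(bucket)}/{quote(key)}"


# Hypothetical bucket and object key.
link = object_storage_link("my-bucket", "schemas/user.proto")
```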
Creating a format schema
- In the management console, go to the folder page and select Managed Service for ClickHouse.
- Click the cluster name and open the Data format schemas tab.
- Click Create schema.
- In the Add schema dialog box, fill out the form, entering the previously obtained link to the format schema file in the URL field.
- Click Create.
If you do not have the Yandex Cloud command line interface yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name or --folder-id parameter.
To create a format schema, run this command:
- For Cap'n Proto:
  yc managed-clickhouse format-schema create "<format_schema_name>" \
    --cluster-name="<cluster_name>" \
    --type="capnproto" \
    --uri="<link_to_the_file_in_Object_Storage>"
- For Protobuf:
  yc managed-clickhouse format-schema create "<format_schema_name>" \
    --cluster-name="<cluster_name>" \
    --type="protobuf" \
    --uri="<link_to_the_file_in_Object_Storage>"
You can request the cluster name with a list of clusters in the folder.
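If you script schema creation, the two commands above differ only in the --type value. The sketch below builds the argument list for either one using only the flags shown above. The function name is hypothetical, and the command is built but not executed.

```python
def format_schema_create_command(schema_name, cluster_name, schema_type, uri):
    """Return the argv list for `yc managed-clickhouse format-schema create`."""
    if schema_type not in ("capnproto", "protobuf"):
        raise ValueError(f"unsupported schema type: {schema_type!r}")
    return [
        "yc", "managed-clickhouse", "format-schema", "create", schema_name,
        f"--cluster-name={cluster_name}",
        f"--type={schema_type}",
        f"--uri={uri}",
    ]


# Hypothetical names; pass the list to subprocess.run(cmd, check=True) to execute it.
cmd = format_schema_create_command(
    "schema-protobuf", "my-cluster", "protobuf",
    "https://storage.yandexcloud.net/my-bucket/schemas/user.proto",
)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when a name or link contains special characters.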
- Open the current Terraform configuration file with an infrastructure plan.
  For more information about how to create this file, see Creating clusters.
- Add the format_schema block to the Managed Service for ClickHouse® cluster description:
  resource "yandex_mdb_clickhouse_cluster" "<cluster_name>" {
    ...
    format_schema {
      name = "<schema_name>"
      type = "<schema_type>"
      uri  = "<link_to_data_format_schema_file_in_Object_Storage>"
    }
  }
  Where type is the schema type: FORMAT_SCHEMA_TYPE_CAPNPROTO or FORMAT_SCHEMA_TYPE_PROTOBUF.
- Make sure the settings are correct.
  - Using the command line, navigate to the folder that contains the up-to-date Terraform configuration files with an infrastructure plan.
  - Run the command:
    terraform validate
    If there are errors in the configuration files, Terraform will point to them.
- Confirm updating the resources.
  - Run the command to view planned changes:
    terraform plan
    If the resource configuration descriptions are correct, the terminal will display a list of the resources to modify and their parameters. This is a test step. No resources are updated.
  - If you are happy with the planned changes, apply them:
    - Run the command:
      terraform apply
    - Confirm the update of resources.
    - Wait for the operation to complete.
For more information, see the Terraform provider documentation.
Time limits
The Terraform provider sets the following timeouts for Managed Service for ClickHouse® cluster operations:
- Creating a cluster, including by restoring one from a backup: 60 minutes.
- Editing a cluster: 90 minutes.
- Deleting a cluster: 30 minutes.
Operations exceeding the set timeout are interrupted.
How do I change these limits?
Add the timeouts block to the cluster description, for example:
resource "yandex_mdb_clickhouse_cluster" "<cluster_name>" {
  ...
  timeouts {
    create = "1h30m" # 1 hour 30 minutes
    update = "2h"    # 2 hours
    delete = "30m"   # 30 minutes
  }
}
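To show how timeout strings such as "1h30m" map to durations, here is a minimal sketch. It handles only whole hours, minutes, and seconds, which is a simplified subset chosen for illustration and not the parser Terraform itself uses.

```python
import re
from datetime import timedelta

_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_timeout(value):
    """Convert a string like "1h30m" into a timedelta (h, m, s units only)."""
    parts = re.findall(r"(\d+)([hms])", value)
    # Reject strings that contain anything besides <number><unit> pairs.
    if not parts or "".join(n + u for n, u in parts) != value:
        raise ValueError(f"unsupported timeout string: {value!r}")
    return timedelta(seconds=sum(int(n) * _UNIT_SECONDS[u] for n, u in parts))


# The values from the timeouts block above.
create_timeout = parse_timeout("1h30m")
update_timeout = parse_timeout("2h")
delete_timeout = parse_timeout("30m")
```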
To create a format schema, use the create REST API method for the FormatSchema resource or the FormatSchemaService/Create gRPC API call and provide the following in the request:
- Cluster ID in the clusterId parameter. You can get the cluster ID with a list of clusters in the folder.
- Format schema name in the formatSchemaName parameter.
- Schema type, FORMAT_SCHEMA_TYPE_CAPNPROTO or FORMAT_SCHEMA_TYPE_PROTOBUF, in the type parameter.
- Link to the file in Yandex Object Storage in the uri parameter.
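The parameters above could be assembled into a REST request roughly as follows. The endpoint URL, the HTTP method, and the placement of clusterId in the path are assumptions made for illustration; check them against the API reference. The sketch only builds the request and does not send it.

```python
# Assumed base URL of the REST API; verify it against the API reference.
API_BASE = "https://mdb.api.cloud.yandex.net/managed-clickhouse/v1"

SCHEMA_TYPES = {"FORMAT_SCHEMA_TYPE_CAPNPROTO", "FORMAT_SCHEMA_TYPE_PROTOBUF"}


def build_create_request(cluster_id, schema_name, schema_type, uri):
    """Describe a create request for a format schema as a plain dict."""
    if schema_type not in SCHEMA_TYPES:
        raise ValueError(f"unknown schema type: {schema_type!r}")
    return {
        "method": "POST",  # assumed method for the create call
        "url": f"{API_BASE}/clusters/{cluster_id}/formatSchemas",
        "body": {
            "formatSchemaName": schema_name,
            "type": schema_type,
            "uri": uri,
        },
    }


# Hypothetical cluster ID, schema name, and link.
request = build_create_request(
    "cluster123", "schema-protobuf", "FORMAT_SCHEMA_TYPE_PROTOBUF",
    "https://storage.yandexcloud.net/my-bucket/schemas/user.proto",
)
```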
Changing a format schema
Managed Service for ClickHouse® does not track changes in the format schema file that is in the Yandex Object Storage bucket.
To update the contents of a schema that is already connected to the cluster:
- Upload the file with the up-to-date format schema to Yandex Object Storage.
- Get a link to this file.
- Change the parameters of the format schema that is connected to Managed Service for ClickHouse® by providing a new link to the format schema file.
- In the management console, go to the folder page and select Managed Service for ClickHouse.
- Click the cluster name and open the Data format schemas tab.
- Select the appropriate schema, click its menu icon, and select Edit.
If you do not have the Yandex Cloud command line interface yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name or --folder-id parameter.
To change the link to the format schema file in Object Storage, run the command:
yc managed-clickhouse format-schema update "<format_schema_name>" \
  --cluster-name="<cluster_name>" \
  --uri="<new_link_to_file_in_Object_Storage>"
You can request the schema name with a list of format schemas in the cluster and the cluster name with a list of clusters in the folder.
- Open the current Terraform configuration file with an infrastructure plan.
  For more information about how to create this file, see Creating clusters.
- In the Managed Service for ClickHouse® cluster description, change the value of the uri parameter in the format_schema block:
  resource "yandex_mdb_clickhouse_cluster" "<cluster_name>" {
    ...
    format_schema {
      name = "<schema_name>"
      type = "<schema_type>"
      uri  = "<new_link_to_schema_file_in_Object_Storage>"
    }
  }
- Make sure the settings are correct.
  - Using the command line, navigate to the folder that contains the up-to-date Terraform configuration files with an infrastructure plan.
  - Run the command:
    terraform validate
    If there are errors in the configuration files, Terraform will point to them.
- Confirm updating the resources.
  - Run the command to view planned changes:
    terraform plan
    If the resource configuration descriptions are correct, the terminal will display a list of the resources to modify and their parameters. This is a test step. No resources are updated.
  - If you are happy with the planned changes, apply them:
    - Run the command:
      terraform apply
    - Confirm the update of resources.
    - Wait for the operation to complete.
For more information, see the Terraform provider documentation.
To update a data format schema, use the update REST API method for the FormatSchema resource or the FormatSchemaService/Update gRPC API call and provide the following in the request:
- Cluster ID in the clusterId parameter. You can get the cluster ID with a list of clusters in the folder.
- Format schema name in the formatSchemaName parameter. You can request the schema name with a list of format schemas in the cluster.
- New link to the file in Yandex Object Storage in the uri parameter.
- List of fields to update in the updateMask parameter.
Warning
This API method resets all parameters of the object being modified that were not explicitly passed in the request to their default values. To avoid this, list the settings you want to change in the updateMask parameter as a single comma-separated string.
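One way to keep updateMask in sync with the fields you actually send is to derive it from them, as in the sketch below. The endpoint URL, the HTTP method, and the path layout are assumptions made for illustration. The request is only built, not sent.

```python
# Assumed base URL of the REST API; verify it against the API reference.
API_BASE = "https://mdb.api.cloud.yandex.net/managed-clickhouse/v1"


def build_update_request(cluster_id, schema_name, changes):
    """Describe an update request whose updateMask lists exactly the changed fields."""
    if not changes:
        raise ValueError("nothing to update")
    return {
        "method": "PATCH",  # assumed method for the update call
        "url": f"{API_BASE}/clusters/{cluster_id}/formatSchemas/{schema_name}",
        # updateMask is a single comma-separated string of field names.
        "body": {"updateMask": ",".join(sorted(changes)), **changes},
    }


# Hypothetical cluster ID, schema name, and new link.
request = build_update_request(
    "cluster123", "schema-protobuf",
    {"uri": "https://storage.yandexcloud.net/my-bucket/schemas/user_v2.proto"},
)
```

Because the mask is computed from the same dict as the body, a field cannot be sent without also being listed in the mask.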
Disabling a format schema
Note
After you disable a format schema, the corresponding object is kept in the Yandex Object Storage bucket. If you no longer need this object, you can delete it.
- In the management console, go to the folder page and select Managed Service for ClickHouse.
- Click the cluster name and open the Data format schemas tab.
- Select the appropriate schema, click its menu icon, and select Delete.
If you do not have the Yandex Cloud command line interface yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name or --folder-id parameter.
To disable a format schema, run the command:
yc managed-clickhouse format-schema delete "<format_schema_name>" \
  --cluster-name="<cluster_name>"
You can request the schema name with a list of format schemas in the cluster and the cluster name with a list of clusters in the folder.
- Open the current Terraform configuration file with an infrastructure plan.
  For more information about how to create this file, see Creating clusters.
- Delete the format_schema block describing the required format schema from the Managed Service for ClickHouse® cluster description.
- Make sure the settings are correct.
  - Using the command line, navigate to the folder that contains the up-to-date Terraform configuration files with an infrastructure plan.
  - Run the command:
    terraform validate
    If there are errors in the configuration files, Terraform will point to them.
- Confirm updating the resources.
  - Run the command to view planned changes:
    terraform plan
    If the resource configuration descriptions are correct, the terminal will display a list of the resources to modify and their parameters. This is a test step. No resources are updated.
  - If you are happy with the planned changes, apply them:
    - Run the command:
      terraform apply
    - Confirm the update of resources.
    - Wait for the operation to complete.
For more information, see the Terraform provider documentation.
To delete a data format schema, use the delete REST API method for the FormatSchema resource or the FormatSchemaService/Delete gRPC API call and provide the following in the request:
- Cluster ID in the clusterId parameter. You can get the cluster ID with a list of clusters in the folder.
- Format schema name in the formatSchemaName parameter. You can request the schema name with a list of format schemas in the cluster.
Getting a list of format schemas in a cluster
- In the management console, go to the folder page and select Managed Service for ClickHouse.
- Click the cluster name and open the Data format schemas tab.
If you do not have the Yandex Cloud command line interface yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name or --folder-id parameter.
To get a list of format schemas in a cluster, run the command:
yc managed-clickhouse format-schema list --cluster-name="<cluster_name>"
You can request the cluster name with a list of clusters in the folder.
To get a list of data format schemas, use the list REST API method for the FormatSchema resource or the FormatSchemaService/List gRPC API call and provide the cluster ID in the clusterId request parameter.
You can get the cluster ID with a list of clusters in the folder.
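A list call may return its results in pages. The sketch below collects schema names across pages. It assumes that the response is a JSON object with a formatSchemas array of objects that have a name field, and an optional nextPageToken; these field names are assumptions to verify against the API reference. The fetch_page function is a hypothetical caller-supplied callable that performs the actual request.

```python
def list_format_schema_names(fetch_page):
    """Collect schema names from every page of a list response.

    fetch_page(page_token) must perform the list request (page_token is
    None for the first page) and return the parsed JSON response as a dict.
    """
    names, token = [], None
    while True:
        page = fetch_page(token)
        names.extend(item["name"] for item in page.get("formatSchemas", []))
        token = page.get("nextPageToken")
        if not token:  # a missing or empty token means this was the last page
            return names


# Stand-in for real API responses, for illustration only.
_fake_pages = {
    None: {"formatSchemas": [{"name": "schema-capnproto"}], "nextPageToken": "p2"},
    "p2": {"formatSchemas": [{"name": "schema-protobuf"}]},
}
names = list_format_schema_names(lambda token: _fake_pages[token])
```

Injecting fetch_page keeps the paging logic separate from authentication and transport, so it can be exercised without network access.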
Getting detailed information about a format schema
If you do not have the Yandex Cloud command line interface yet, install and initialize it.
The folder specified in the CLI profile is used by default. You can specify a different folder using the --folder-name or --folder-id parameter.
To get detailed information about a format schema, run the command:
yc managed-clickhouse format-schema get "<format_schema_name>" \
  --cluster-name="<cluster_name>"
You can request the schema name with a list of format schemas in the cluster and the cluster name with a list of clusters in the folder.
To get detailed information about a data format schema, use the get REST API method for the FormatSchema resource or the FormatSchemaService/Get gRPC API call and provide the following in the request:
- Cluster ID in the clusterId parameter. You can get the cluster ID with a list of clusters in the folder.
- Format schema name in the formatSchemaName parameter. You can request the schema name with a list of format schemas in the cluster.
ClickHouse® is a registered trademark of ClickHouse, Inc.