Creating a Trino cluster
Each Managed Service for Trino cluster consists of a set of Trino components: a coordinator and workers, which may run as several instances.
Roles for creating a cluster
To create a Managed Service for Trino cluster, your Yandex Cloud account needs the following roles:
- managed-trino.admin: To create a cluster.
- vpc.user: To use the cluster network.
- iam.serviceAccounts.user: To attach a service account to the cluster.
Make sure to assign the managed-trino.integrationProvider and storage.editor roles to the cluster's service account. The cluster will thus get the permissions it needs to work with user resources. For more information, see Impersonation.
For more information about assigning roles, see the Yandex Identity and Access Management documentation.
Creating a cluster
- In the management console, select the folder where you want to create a Managed Service for Trino cluster.
- Select Managed Service for Trino.
- Click Create cluster.
- Under Basic parameters:
  - Give the cluster a name. The name must be unique within the folder.
  - Optionally, enter a description for the cluster.
  - Optionally, create labels:
    - Click Add label.
    - Enter a label in key: value format.
    - Press Enter.
  - Select an existing service account or create a new one.
    Make sure to assign the managed-trino.integrationProvider and storage.editor roles to the service account.
  - Select the Trino version.
    Note
    After you create a cluster, you can change your Trino version. You can either upgrade or downgrade the version.
- Under Network settings, select a network, subnet, and security group for the cluster.
- Under Retry policy, specify the fault-tolerant query execution parameters:
  - Select an Object type for retry:
    - Task: Retries the intermediate task within the query that caused worker failure.
    - Query: Retries all stages of the query where worker failure occurred.
  - Optionally, specify additional parameters in key: value format in the Retry parameters field. For more information about parameters, see the Trino documentation.
  - Optionally, specify additional Exchange Manager storage parameters in key: value format in the Storage parameters field. For more information about parameters, see the Trino documentation.
- Configure the coordinator and workers.
- Under Catalogs, add the required Trino catalogs. You can do this either when creating the cluster or later. For more information, see Creating a Trino catalog.
- Under Advanced settings:
  - Optionally, enable cluster deletion protection.
  - Optionally, select cluster maintenance time:
    - To enable maintenance at any time, select arbitrary (default).
    - To specify the preferred maintenance start time, select by schedule and specify the desired day of the week and UTC hour. For example, you can choose a time when the cluster is least loaded.
    Maintenance operations are carried out both on enabled and disabled clusters. They may include updating the DBMS, applying patches, and so on.
  - Optionally, configure logging:
    - Enable the Write logs setting.
    - Select the log destination:
      - Folder: Select a folder from the list. Logs will be written to the selected folder's default log group.
      - Group: Select a log group from the list or create a new one.
    - Select Min. logging level from the list.
- Click Create.
If you do not have the Yandex Cloud CLI installed yet, install and initialize it.
By default, the CLI uses the folder specified when creating the profile. To change the default folder, use the yc config set folder-id <folder_ID> command. You can also set a different folder for any specific command using the --folder-name or --folder-id parameter.
To create a Managed Service for Trino cluster:
- Check whether the folder has any subnets for the cluster hosts:

  yc vpc subnet list

  If your folder has no subnets, create them in VPC.
- View the description of the CLI command to create a cluster:

  yc managed-trino cluster create --help

- Specify cluster parameters in that command (our example does not use all available parameters):

  yc managed-trino cluster create \
    --name <cluster_name> \
    --version <version> \
    --service-account-id <service_account_ID> \
    --subnet-ids <list_of_subnet_IDs> \
    --security-group-ids <list_of_security_group_IDs> \
    --coordinator resource-preset-id=<class_of_computing_resources> \
    --worker resource-preset-id=<class_of_computing_resources>,count=<number_of_workers> \
    --deletion-protection

  Where:
  - --name: Cluster name. It must be unique within the folder.
  - --version: Trino version.
    Note
    After you create a cluster, you can change your Trino version. You can either upgrade or downgrade the version.
  - --service-account-id: Service account ID.
  - --subnet-ids: List of subnet IDs.
  - --security-group-ids: List of security group IDs.
  - --coordinator: Coordinator configuration:
    - resource-preset-id: Class of the coordinator's computing resources.
  - --worker: Worker configuration:
    - resource-preset-id: Class of the worker's computing resources.
    - count: Fixed number of workers.
    - minCount: Minimum number of workers for automatic scaling.
    - maxCount: Maximum number of workers for automatic scaling.
    Specify either a fixed number of workers (count), or minimum and maximum number of workers (minCount, maxCount) for automatic scaling.
  - --deletion-protection: Cluster protection from accidental deletion, true or false.
    Even if it is enabled, you can still connect to the cluster manually and delete it.
- To enable sending of Trino logs to Yandex Cloud Logging, specify logging parameters:

  yc managed-trino cluster create <cluster_name> \
    ...
    --log-enabled \
    --log-folder-id <folder_ID> \
    --log-min-level <logging_level>

  Where:
  - --log-enabled: Enables logging.
  - --log-folder-id: Folder ID. Logs will be written to the default log group for this folder.
  - --log-group-id: Custom log group ID. Logs will be written to this group.
    You can specify only one of the parameters: --log-folder-id or --log-group-id.
  - --log-min-level: Minimum logging level. Possible values: TRACE, DEBUG, INFO (default), WARN, ERROR, and FATAL.
- To enable a fault-tolerant query execution policy, specify these parameters:

  yc managed-trino cluster create <cluster_name> \
    ...
    --retry-policy-enabled \
    --retry-policy <object_type_for_retry> \
    --retry-policy-additional-properties <list_of_additional_retry_policy_parameters> \
    --retry-policy-exchange-manager-service-s3 \
    --retry-policy-exchange-manager-additional-properties <list_of_additional_storage_parameters>

  Where:
  - --retry-policy-enabled: Enables the retry policy.
  - --retry-policy: Query retry method. The possible values are:
    - task: Retries the intermediate task within the query that caused worker failure.
    - query: Retries all stages of the query in which the worker failed.
  - --retry-policy-additional-properties: Additional query retry parameters in <key>=<value> format. For more information about parameters, see the Trino documentation.
  - --retry-policy-exchange-manager-service-s3: Use S3 storage to write data when retrying queries.
  - --retry-policy-exchange-manager-additional-properties: Additional storage parameters in <key>=<value> format. For more information about parameters, see the Trino documentation.
- To set up a maintenance window (including for disabled clusters), provide the required value in the --maintenance-window parameter:

  yc managed-trino cluster create <cluster_name> \
    ...
    --maintenance-window type=<maintenance_type>,day=<day_of_week>,hour=<hour>

  Where type is the maintenance type:
  - anytime: At any time (default).
  - weekly: On a schedule. For this value, also specify the following:
    - day: Day of week, i.e., MON, TUE, WED, THU, FRI, SAT, or SUN.
    - hour: Hour of day (UTC), from 1 to 24.
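The optional logging and maintenance window parameters above carry a few constraints: only one log destination may be given, and a day and hour are needed only for the weekly type. As a rough sketch, a small Python helper could check these constraints and assemble the arguments before they are appended to yc managed-trino cluster create. The function names logging_flags and maintenance_window_flag are hypothetical and not part of the CLI; the sketch also assumes that a log destination is always supplied when logging is enabled.

```python
from typing import List, Optional

# Values taken from the parameter descriptions above.
LOG_LEVELS = {"TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"}
WEEK_DAYS = {"MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"}


def logging_flags(folder_id: Optional[str] = None,
                  log_group_id: Optional[str] = None,
                  min_level: str = "INFO") -> List[str]:
    """Return logging arguments (hypothetical helper).

    Assumption: exactly one of folder_id / log_group_id is supplied.
    """
    if (folder_id is None) == (log_group_id is None):
        raise ValueError("specify exactly one of folder_id or log_group_id")
    if min_level not in LOG_LEVELS:
        raise ValueError(f"unknown logging level: {min_level}")
    if folder_id is not None:
        destination = ["--log-folder-id", folder_id]
    else:
        destination = ["--log-group-id", log_group_id]
    return ["--log-enabled", *destination, "--log-min-level", min_level]


def maintenance_window_flag(window_type: str = "anytime",
                            day: Optional[str] = None,
                            hour: Optional[int] = None) -> List[str]:
    """Return the --maintenance-window argument (hypothetical helper)."""
    if window_type == "anytime":
        return ["--maintenance-window", "type=anytime"]
    if window_type != "weekly":
        raise ValueError("type must be 'anytime' or 'weekly'")
    if day not in WEEK_DAYS:
        raise ValueError("day must be one of MON..SUN")
    if hour is None or not 1 <= hour <= 24:
        raise ValueError("hour must be between 1 and 24 (UTC)")
    return ["--maintenance-window", f"type=weekly,day={day},hour={hour}"]
```

The returned lists can be appended to an argument list passed to subprocess.run, which avoids shell quoting issues.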
With Terraform
Terraform is distributed under the Business Source License.
For more information about the provider resources, see the relevant Terraform provider documentation.
If you do not have Terraform yet, install it and configure the Yandex Cloud provider.
To create a Managed Service for Trino cluster:
- In the configuration file, describe the resources you are creating:
  - Managed Service for Trino cluster: Cluster description.
  - Managed Service for Trino catalog: Catalog description.
  - Network: Description of the cloud network where the cluster will be located. If you already have a suitable network, you don't have to describe it again.
  - Subnets: Description of the subnets to connect the cluster hosts to. If you already have suitable subnets, you don't have to describe them again.

  Here is an example of the configuration file structure:

  resource "yandex_trino_cluster" "<cluster_name>" {
    name                = "<cluster_name>"
    service_account_id  = "<service_account_ID>"
    subnet_ids          = [yandex_vpc_subnet.<subnet_name>.id]
    security_group_ids  = [<list_of_security_group_IDs>]
    deletion_protection = <protect_cluster_from_deletion>
    version             = "<version>"

    coordinator = {
      resource_preset_id = "<class_of_computing_resources>"
    }

    worker = {
      fixed_scale = {
        count = 4
      }
      resource_preset_id = "<class_of_computing_resources>"
    }
  }

  resource "yandex_vpc_network" "<network_name>" {
    name = "<network_name>"
  }

  resource "yandex_vpc_subnet" "<subnet_name>" {
    name           = "<subnet_name>"
    zone           = "<availability_zone>"
    network_id     = yandex_vpc_network.<network_name>.id
    v4_cidr_blocks = ["<range>"]
  }

  Where:
  - name: Cluster name. It must be unique within the folder.
  - service_account_id: Service account ID.
  - subnet_ids: List of subnet IDs.
  - security_group_ids: List of security group IDs.
  - deletion_protection: Cluster protection from accidental deletion, true or false.
    Even if it is enabled, you can still connect to the cluster manually and delete it.
  - version: Trino version.
    Note
    After you create a cluster, you can change your Trino version. You can either upgrade or downgrade the version.
  - coordinator: Coordinator configuration:
    - resource_preset_id: Class of the coordinator's computing resources.
  - worker: Worker configuration:
    - resource_preset_id: Class of the worker's computing resources.
    - fixed_scale: Fixed worker scaling policy:
      - count: Number of workers.
    - auto_scale: Worker autoscaling policy:
      - min_count: Minimum number of workers.
      - max_count: Maximum number of workers.
    Specify either a fixed number of workers (fixed_scale.count), or minimum and maximum number of workers (auto_scale.min_count, auto_scale.max_count) for autoscaling.
- To create Trino catalogs in the cluster, add the required number of yandex_trino_catalog resources to the configuration file. You can do this either when creating the cluster or later. For more information, see Creating a Trino catalog.
- To enable sending Trino logs to Yandex Cloud Logging, add the logging section to the cluster description:

  resource "yandex_trino_cluster" "<cluster_name>" {
    ...
    logging = {
      enabled   = <enable_logging>
      folder_id = "<folder_ID>"
      min_level = "<logging_level>"
    }
    ...
  }

  Where:
  - enabled: Enables logging, true or false.
  - folder_id: Folder ID. Logs will be written to the default log group for this folder.
  - log_group_id: Custom log group ID. Logs will be written to this group.
    You can specify only one of the parameters: folder_id or log_group_id.
  - min_level: Minimum logging level. Possible values: TRACE, DEBUG, INFO (default), WARN, ERROR, and FATAL.
- To enable a fault-tolerant query execution policy, add a retry_policy section to the cluster description:

  resource "yandex_trino_cluster" "<cluster_name>" {
    ...
    retry_policy = {
      policy = "<object_type_for_retry>"
      additional_properties = {
        <list_of_additional_retry_policy_parameters>
      }
      exchange_manager = {
        additional_properties = {
          <list_of_additional_storage_parameters>
        }
        service_s3 = {}
      }
    }
    ...
  }

  Where:
  - policy: Query retry method. The possible values are:
    - TASK: Retries the intermediate task within the query that caused worker failure.
    - QUERY: Retries all stages of the query in which the worker failed.
  - additional_properties: Additional query retry parameters in "<key>" = "<value>" format. For more information about parameters, see the Trino documentation.
  - exchange_manager: Exchange Manager storage parameters:
    - service_s3: Use S3 storage to write data when retrying queries.
    - additional_properties: Additional Exchange Manager storage parameters in "<key>" = "<value>" format. For more information about parameters, see the Trino documentation.
- To set up the maintenance window (for disabled clusters as well), add the maintenance_window section to the cluster description:

  resource "yandex_trino_cluster" "<cluster_name>" {
    ...
    maintenance_window = {
      type = "<maintenance_type>"
      day  = "<day_of_week>"
      hour = <hour>
    }
    ...
  }

  Where:
  - type: Maintenance type. The possible values include:
    - ANYTIME: At any time.
    - WEEKLY: On a schedule.
  - day: Day of week for the WEEKLY type, i.e., MON, TUE, WED, THU, FRI, SAT, or SUN.
  - hour: UTC hour for the WEEKLY type, from 1 to 24.
- Validate your configuration:
  - In the command line, navigate to the directory that contains the current Terraform configuration files defining the infrastructure.
  - Run this command:

    terraform validate

    Terraform will show any errors found in your configuration files.
- Confirm updating the resources:
  - Run this command to view the planned changes:

    terraform plan

    If you described the configuration correctly, the terminal will display a list of the resources to update and their parameters. This is a verification step that does not apply changes to your resources.
  - If everything looks correct, apply the changes:
    - Run this command:

      terraform apply

    - Confirm updating the resources.
    - Wait for the operation to complete.

For more information about the resources you can create with Terraform, see the provider documentation.
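The validation and planning steps above are easy to script when the configuration is checked repeatedly. The sketch below, under stated assumptions, runs terraform validate and then terraform plan in a given directory and stops at the first failure. The names run_checks and runner are hypothetical; terraform apply is deliberately left as a manual step because it asks for confirmation.

```python
import subprocess
from typing import Callable, List, Optional

# The two non-destructive commands from the steps above, in order.
CHECK_COMMANDS: List[List[str]] = [
    ["terraform", "validate"],
    ["terraform", "plan"],
]


def run_checks(workdir: str,
               runner: Callable = subprocess.run) -> Optional[List[str]]:
    """Run each check in workdir; return the first failing command, or None.

    `runner` defaults to subprocess.run and can be replaced in tests.
    """
    for command in CHECK_COMMANDS:
        result = runner(command, cwd=workdir)
        if result.returncode != 0:
            return command
    return None
```

A return value of None means both commands exited with code 0, so the plan output can be reviewed before running terraform apply by hand.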
- Get an IAM token for API authentication and save it as an environment variable:

  export IAM_TOKEN="<IAM_token>"

- Create a file named body.json and paste the following code into it:

  Note
  This example does not use all available parameters. For a list of all parameters, see the API documentation.

  {
    "folderId": "<folder_ID>",
    "name": "<cluster_name>",
    "description": "<cluster_description>",
    "labels": { <label_list> },
    "trino": {
      "coordinatorConfig": {
        "resources": {
          "resourcePresetId": "<class_of_computing_resources>"
        }
      },
      "workerConfig": {
        "resources": {
          "resourcePresetId": "<class_of_computing_resources>"
        },
        "scalePolicy": {
          "autoScale": {
            "minCount": "<minimum_number_of_instances>",
            "maxCount": "<maximum_number_of_instances>"
          }
        }
      },
      "retryPolicy": {
        "policy": "<object_type_for_retry>",
        "exchangeManager": {
          "storage": {
            "serviceS3": {}
          },
          "additionalProperties": {<additional_storage_parameters>}
        },
        "additionalProperties": {<additional_retry_parameters>}
      },
      "version": "<version>"
    },
    "network": {
      "subnetIds": [ <list_of_subnet_IDs> ],
      "securityGroupIds": [ <list_of_security_group_IDs> ]
    },
    "deletionProtection": "<deletion_protection>",
    "serviceAccountId": "<service_account_ID>",
    "logging": {
      "enabled": "<use_of_logging>",
      "folderId": "<folder_ID>",
      "minLevel": "<logging_level>"
    }
  }

  Where:
  - folderId: Folder ID. You can get it from the list of folders in the cloud.
  - name: Cluster name.
  - description: Cluster description.
  - labels: List of labels provided in "<key>": "<value>" format.
  - trino: Configuration of Trino cluster components:
    - coordinatorConfig: Coordinator configuration:
      - resources.resourcePresetId: Class of the coordinator's computing resources.
    - workerConfig: Worker configuration:
      - resources.resourcePresetId: Class of the worker's computing resources.
      - scalePolicy: Worker scaling policy:
        - fixedScale: Fixed scaling policy:
          - count: Number of workers.
        - autoScale: Autoscaling policy:
          - minCount: Minimum number of workers.
          - maxCount: Maximum number of workers.
        Specify either fixedScale or autoScale.
    - retryPolicy: Fault-tolerant query execution parameters:
      - policy: Query retry method. The possible values are:
        - TASK: Retries the intermediate task within the query that caused worker failure.
        - QUERY: Retries all stages of the query where worker failure occurred.
      - exchangeManager.additionalProperties: Additional Exchange Manager storage parameters in key: value format. For more information about parameters, see the Trino documentation.
      - additionalProperties: Additional parameters in key: value format. For more information about parameters, see the Trino documentation.
    - version: Trino version.
      Note
      After you create a cluster, you can change your Trino version. You can either upgrade or downgrade the version.
  - network: Network settings:
    - subnetIds: List of subnet IDs.
    - securityGroupIds: List of security group IDs.
  - deletionProtection: Enables cluster protection against accidental deletion. The possible values are true or false.
    Even if it is enabled, you can still connect to the cluster manually and delete it.
  - serviceAccountId: Service account ID.
  - logging: Logging parameters:
    - enabled: Enables logging. Logs generated by Trino components will be sent to Yandex Cloud Logging. The possible values are true or false.
    - minLevel: Minimum logging level. The possible values are TRACE, DEBUG, INFO, WARN, ERROR, and FATAL.
    - folderId: Folder ID. Logs will be written to the default log group for this folder.
    - logGroupId: Custom log group ID. Logs will be written to this group.
    Specify either folderId or logGroupId.
- Use the Cluster.create method and send the following request, e.g., via cURL:

  curl \
    --request POST \
    --header "Authorization: Bearer $IAM_TOKEN" \
    --url 'https://trino.api.cloud.yandex.net/managed-trino/v1/clusters' \
    --data '@body.json'

- View the server response to make sure your request was successful.
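Writing body.json by hand makes it easy to break the "either fixedScale or autoScale" rule described above. As an illustration only, the following Python sketch builds a minimal request body from a handful of arguments and enforces that rule. The function names build_cluster_body and save_body are hypothetical. The sketch takes a single subnet ID, writes worker counts as strings to mirror the quoted placeholders in the example, and passes deletionProtection as a JSON boolean, all of which are assumptions rather than documented requirements.

```python
import json
from typing import Any, Dict, Optional


def build_cluster_body(folder_id: str, name: str, version: str,
                       service_account_id: str, subnet_id: str,
                       coordinator_preset: str, worker_preset: str,
                       worker_count: Optional[int] = None,
                       min_workers: Optional[int] = None,
                       max_workers: Optional[int] = None,
                       deletion_protection: bool = False) -> Dict[str, Any]:
    """Build a minimal Cluster.create body (hypothetical helper)."""
    fixed = worker_count is not None
    auto = min_workers is not None or max_workers is not None
    if fixed == auto:
        raise ValueError("specify either worker_count or min_workers/max_workers")
    if fixed:
        scale_policy = {"fixedScale": {"count": str(worker_count)}}
    else:
        if min_workers is None or max_workers is None:
            raise ValueError("autoscaling needs both min_workers and max_workers")
        if min_workers > max_workers:
            raise ValueError("min_workers must not exceed max_workers")
        scale_policy = {"autoScale": {"minCount": str(min_workers),
                                      "maxCount": str(max_workers)}}
    return {
        "folderId": folder_id,
        "name": name,
        "trino": {
            "coordinatorConfig": {
                "resources": {"resourcePresetId": coordinator_preset}
            },
            "workerConfig": {
                "resources": {"resourcePresetId": worker_preset},
                "scalePolicy": scale_policy,
            },
            "version": version,
        },
        "network": {"subnetIds": [subnet_id]},
        # Assumption: a real JSON boolean rather than a quoted value.
        "deletionProtection": deletion_protection,
        "serviceAccountId": service_account_id,
    }


def save_body(body: Dict[str, Any], path: str = "body.json") -> None:
    """Write the body to a file that curl can send with --data '@body.json'."""
    with open(path, "w", encoding="utf-8") as file:
        json.dump(body, file, indent=2)
```

Optional sections such as retryPolicy and logging are left out here and can be added to the returned dictionary before saving.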
- Get an IAM token for API authentication and save it as an environment variable:

  export IAM_TOKEN="<IAM_token>"

- Clone the cloudapi repository:

  cd ~/ && git clone --depth=1 https://github.com/yandex-cloud/cloudapi

  Below, we assume the repository contents are stored in the ~/cloudapi/ directory.
- Create a file named body.json and paste the following code into it:

  Note
  This example does not use all available parameters. For a list of all parameters, see the API documentation.

  {
    "folder_id": "<folder_ID>",
    "name": "<cluster_name>",
    "description": "<cluster_description>",
    "labels": { <label_list> },
    "trino": {
      "coordinator_config": {
        "resources": {
          "resource_preset_id": "<class_of_computing_resources>"
        }
      },
      "worker_config": {
        "resources": {
          "resource_preset_id": "<class_of_computing_resources>"
        },
        "scale_policy": {
          "auto_scale": {
            "min_count": "<minimum_number_of_instances>",
            "max_count": "<maximum_number_of_instances>"
          }
        }
      },
      "retry_policy": {
        "policy": "<object_type_for_retry>",
        "exchange_manager": {
          "storage": {
            "service_s3": {}
          },
          "additional_properties": {<additional_storage_parameters>}
        },
        "additional_properties": {<additional_retry_parameters>}
      },
      "version": "<version>"
    },
    "network": {
      "subnet_ids": [ <list_of_subnet_IDs> ],
      "security_group_ids": [ <list_of_security_group_IDs> ]
    },
    "deletion_protection": "<deletion_protection>",
    "service_account_id": "<service_account_ID>",
    "logging": {
      "enabled": "<use_of_logging>",
      "folder_id": "<folder_ID>",
      "min_level": "<logging_level>"
    }
  }

  Where:
  - folder_id: Folder ID. You can get it from the list of folders in the cloud.
  - name: Cluster name.
  - description: Cluster description.
  - labels: List of labels provided in "<key>": "<value>" format.
  - trino: Configuration of Trino cluster components:
    - coordinator_config: Coordinator configuration:
      - resources.resource_preset_id: Class of the coordinator's computing resources.
    - worker_config: Worker configuration:
      - resources.resource_preset_id: Class of the worker's computing resources.
      - scale_policy: Worker scaling policy:
        - fixed_scale: Fixed scaling policy:
          - count: Number of workers.
        - auto_scale: Autoscaling policy:
          - min_count: Minimum number of workers.
          - max_count: Maximum number of workers.
        Specify either fixed_scale or auto_scale.
    - retry_policy: Fault-tolerant query execution parameters:
      - policy: Query retry method. The possible values are:
        - TASK: Retries the intermediate task within the query that caused worker failure.
        - QUERY: Retries all stages of the query where worker failure occurred.
      - exchange_manager.additional_properties: Additional Exchange Manager storage parameters in key: value format. For more information about parameters, see the Trino documentation.
      - additional_properties: Additional parameters in key: value format. For more information about parameters, see the Trino documentation.
    - version: Trino version.
      Note
      After you create a cluster, you can change your Trino version. You can either upgrade or downgrade the version.
  - network: Network settings:
    - subnet_ids: List of subnet IDs.
    - security_group_ids: List of security group IDs.
  - deletion_protection: Enables cluster protection against accidental deletion. The possible values are true or false.
    Even if it is enabled, you can still connect to the cluster manually and delete it.
  - service_account_id: Service account ID.
  - logging: Logging parameters:
    - enabled: Enables logging. Logs generated by Trino components will be sent to Yandex Cloud Logging. The possible values are true or false.
    - min_level: Minimum logging level. The possible values are TRACE, DEBUG, INFO, WARN, ERROR, and FATAL.
    - folder_id: Folder ID. Logs will be written to the default log group for this folder.
    - log_group_id: Custom log group ID. Logs will be written to this group.
    Specify either folder_id or log_group_id.
- Use the ClusterService/Create call and send the following request, e.g., via gRPCurl:

  grpcurl \
    -format json \
    -import-path ~/cloudapi/ \
    -import-path ~/cloudapi/third_party/googleapis/ \
    -proto ~/cloudapi/yandex/cloud/trino/v1/cluster_service.proto \
    -rpc-header "Authorization: Bearer $IAM_TOKEN" \
    -d @ \
    trino.api.cloud.yandex.net:443 \
    yandex.cloud.trino.v1.ClusterService.Create \
    < body.json

- View the server response to make sure your request was successful.
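Before sending body.json, it may help to check it against the "specify either ... or ..." rules from the parameter list above. The following Python sketch is one possible local check and not part of the API. The function name find_body_problems is hypothetical, and the sketch assumes that scale_policy is always present and that a logging section, if present, names exactly one destination.

```python
from typing import Any, Dict, List

# Retry policy values listed in the parameter description above.
RETRY_POLICIES = {"TASK", "QUERY"}


def find_body_problems(body: Dict[str, Any]) -> List[str]:
    """Return a list of rule violations found in a snake_case request body."""
    problems: List[str] = []
    trino = body.get("trino", {})

    scale_policy = trino.get("worker_config", {}).get("scale_policy", {})
    if ("fixed_scale" in scale_policy) == ("auto_scale" in scale_policy):
        problems.append("scale_policy: specify exactly one of fixed_scale or auto_scale")

    retry_policy = trino.get("retry_policy")
    if retry_policy is not None and retry_policy.get("policy") not in RETRY_POLICIES:
        problems.append("retry_policy.policy: expected TASK or QUERY")

    logging = body.get("logging")
    if logging is not None and ("folder_id" in logging) == ("log_group_id" in logging):
        problems.append("logging: specify exactly one of folder_id or log_group_id")

    return problems
```

The function can be applied to the result of json.load on body.json; an empty list means none of these particular rules is violated, which does not guarantee that the server will accept the request.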
Examples
Create a Managed Service for Trino cluster with the following test specifications:
- Name: mytr.
- Service account: ajev56jp96ji********.
- Subnet: b0rcctk2rvtr8efcch64.
- Security group: enp6saqnq4ie244g67sb.
- Coordinator with computing resource class c4-m16.
- Four workers with computing resource class c4-m16.
- Cluster protection from accidental deletion.
Run this command:
yc managed-trino cluster create \
--name mytr \
--service-account-id ajev56jp96ji******** \
--subnet-ids b0rcctk2rvtr8efcch64 \
--security-group-ids enp6saqnq4ie244g67sb \
--coordinator resource-preset-id=c4-m16 \
--worker resource-preset-id=c4-m16,count=4 \
--deletion-protection
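When the same command has to be run from a script, building the argument list in code avoids shell quoting mistakes. The sketch below assembles the command from this example as a Python list. The function name create_cluster_command is hypothetical, only the flags shown in the example are used, and a single subnet and security group are assumed.

```python
import shlex
from typing import List


def create_cluster_command(name: str, service_account_id: str,
                           subnet_id: str, security_group_id: str,
                           coordinator_preset: str, worker_preset: str,
                           worker_count: int,
                           deletion_protection: bool = True) -> List[str]:
    """Return the argument list for the example command (hypothetical helper)."""
    if worker_count < 1:
        raise ValueError("worker_count must be at least 1")
    argv = [
        "yc", "managed-trino", "cluster", "create",
        "--name", name,
        "--service-account-id", service_account_id,
        "--subnet-ids", subnet_id,
        "--security-group-ids", security_group_id,
        "--coordinator", f"resource-preset-id={coordinator_preset}",
        "--worker", f"resource-preset-id={worker_preset},count={worker_count}",
    ]
    if deletion_protection:
        argv.append("--deletion-protection")
    return argv


def as_shell_line(argv: List[str]) -> str:
    """Render the argument list as a single, safely quoted shell line."""
    return shlex.join(argv)
```

The list can be passed directly to subprocess.run, while as_shell_line is handy for logging the exact command that was executed.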
Create a Managed Service for Trino cluster and a network for it with the following test specifications:
- Name: mytr.
- Service account: ajev56jp96ji********.
- Network: mtr-network.
- Subnet: mtr-subnet. The subnet availability zone is ru-central1-a; the range is 10.1.0.0/16.
- Coordinator with computing resource class c4-m16.
- Four workers with computing resource class c4-m16.
- Cluster protection from accidental deletion.
The configuration file for this cluster is as follows:
resource "yandex_trino_cluster" "mytr" {
  name                = "mytr"
  service_account_id  = "ajev56jp96ji********"
  deletion_protection = true
  subnet_ids          = [yandex_vpc_subnet.mtr-subnet.id]

  coordinator = {
    resource_preset_id = "c4-m16"
  }

  worker = {
    fixed_scale = {
      count = 4
    }
    resource_preset_id = "c4-m16"
  }
}

resource "yandex_vpc_network" "mtr-network" {
  name = "mtr-network"
}

resource "yandex_vpc_subnet" "mtr-subnet" {
  name           = "mtr-subnet"
  zone           = "ru-central1-a"
  network_id     = yandex_vpc_network.mtr-network.id
  v4_cidr_blocks = ["10.1.0.0/16"]
}