Creating an Apache Hive™ Metastore cluster
To learn more about Apache Hive™ Metastore clusters in Yandex MetaData Hub, see Apache Hive™ Metastore clusters.
Getting started
- To link a service account to an Apache Hive™ Metastore cluster, make sure your Yandex Cloud account has the iam.serviceAccounts.user role or higher.
- Set up a NAT gateway in the subnet the cluster will connect to. The cluster needs it to interact with Yandex Cloud services.
- Assign the managed-metastore.integrationProvider role to the service account. This role enables the cluster to work with Yandex Cloud services, e.g., Yandex Cloud Logging and Yandex Monitoring, under a service account.
  You can also add more roles. Their combination depends on your specific use case. To view the service roles, see the Apache Hive™ Metastore section, and for all available roles, see this reference.
- If you want to save cluster logs to a custom log group, create one.
  For more information, see Transferring cluster logs.
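The role assignment above can also be prepared from the command line. The sketch below only assembles and prints the command so that you can review it before running it yourself. It assumes the role is granted at the folder level with yc resource-manager folder add-access-binding; the function name and the sample IDs are hypothetical.

```shell
#!/usr/bin/env bash
# Prints (does not run) a yc command that grants the integration provider role
# to a service account. Granting the role on a folder is an assumption.
print_role_binding_cmd() {
  local folder_id="$1" sa_id="$2"
  if [ -z "$folder_id" ] || [ -z "$sa_id" ]; then
    echo "usage: print_role_binding_cmd <folder_ID> <service_account_ID>" >&2
    return 1
  fi
  printf 'yc resource-manager folder add-access-binding %s --role managed-metastore.integrationProvider --subject serviceAccount:%s\n' \
    "$folder_id" "$sa_id"
}
```

For example, print_role_binding_cmd b1gexample ajeexample prints the full command for those two placeholder IDs.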
Creating a cluster
- In the management console, select the folder where you want to create a cluster.
- Select Yandex MetaData Hub.
- In the left-hand panel, select Metastore.
- Click Create cluster.
- Enter a name for the cluster. It must be unique within the folder.
- Optionally, enter a description for the cluster.
- Optionally, add Yandex Cloud labels to break resources into logical groups.
- Specify the service account you created earlier.
- Select the Apache Hive™ Metastore version.
  Available versions: 3.1 and 4.0.
  Warning
  To integrate the Apache Hive™ Metastore cluster with Yandex Managed Service for Trino and Yandex Managed Service for Apache Spark™, you need version 3.1.
  If required, you can upgrade 3.1 to 4.0, but you cannot downgrade 4.0 to 3.1.
- Under Network settings, select the network and subnet to host the Apache Hive™ Metastore cluster. Specify the security group you configured previously.
- Under Metastore, select the cluster configuration.
- Optionally, configure logging settings:
  - Enable Write logs.
  - Select where to write cluster logs to:
    - Default log group: Select Folder in the Destination field and specify the folder. Logs will be stored in the selected folder's default log group.
    - Custom log group: Select Log group in the Destination field and specify the log group you created earlier.
  - Select the minimum logging level.
    The execution log will contain logs of this level or higher. The available levels are TRACE, DEBUG, INFO, WARN, ERROR, and FATAL. The default is INFO.
- If required, enable protection of the cluster from accidental deletion by a user.
  Even with deletion protection enabled, a user can still connect to the cluster manually and delete the data.
- Click Create.
If you do not have the Yandex Cloud CLI installed yet, install and initialize it.
By default, the CLI uses the folder specified when creating the profile. To change the default folder, use the yc config set folder-id <folder_ID> command. You can also set a different folder for any specific command using the --folder-name or --folder-id parameter.
To create an Apache Hive™ Metastore cluster:
- See the description of the CLI command for creating a cluster:
      yc managed-metastore cluster create --help
- Specify the cluster properties in the creation command:
      yc managed-metastore cluster create \
        --name <cluster_name> \
        --description <cluster_description> \
        --labels <label_list> \
        --service-account-id <service_account_ID> \
        --version <Apache_Hive™_Metastore_version> \
        --subnet-ids <subnet_IDs> \
        --security-group-ids <security_group_IDs> \
        --resource-preset-id <ID_of_computing_resources> \
        --maintenance-window type=<maintenance_type>,day=<day_of_week>,hour=<hour> \
        --deletion-protection \
        --log-enabled \
        --log-folder-id <folder_ID> \
        --log-min-level <logging_level>
  Where:
  - --name: Cluster name.
  - --description: Cluster description.
  - --labels: List of labels. Provide labels in <key>=<value> format.
  - --service-account-id: Service account ID.
  - --version: Apache Hive™ Metastore version.
    Available versions: 3.1 and 4.0.
    Warning
    To integrate the Apache Hive™ Metastore cluster with Yandex Managed Service for Trino and Yandex Managed Service for Apache Spark™, you need version 3.1.
    If required, you can upgrade 3.1 to 4.0, but you cannot downgrade 4.0 to 3.1.
  - --subnet-ids: List of subnet IDs.
  - --security-group-ids: List of security group IDs.
  - --resource-preset-id: Computing resource configuration.
  - --maintenance-window: Maintenance window settings (including for disabled clusters), where type is the maintenance type:
    - anytime: At any time (default).
    - weekly: On a schedule. For this value, also specify the following:
      - day: Day of week, i.e., MON, TUE, WED, THU, FRI, SAT, or SUN.
      - hour: Hour of day (UTC), from 1 to 24.
  - --deletion-protection: Enables cluster protection against accidental deletion.
  - Logging parameters:
    - --log-enabled: Enables logging. Logs generated by Apache Hive™ Metastore components will be sent to Yandex Cloud Logging.
    - --log-folder-id: Folder ID. Logs will be written to the default log group for this folder.
    - --log-group-id: Custom log group ID. Logs will be written to this group.
      Specify one of the two parameters: --log-folder-id or --log-group-id.
    - --log-min-level: Minimum logging level. Possible values: TRACE, DEBUG, INFO (default), WARN, ERROR, and FATAL.
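Two of the rules above are easy to get wrong when scripting the command: the shape of the --maintenance-window value and the requirement to pass exactly one log destination. The helpers below are a minimal sketch of checks for both, based only on the constraints listed above; the function names are hypothetical and are not part of the yc CLI.

```shell
#!/usr/bin/env bash
# Builds the value for --maintenance-window from the fields described above.
# Usage: maintenance_window_arg anytime
#        maintenance_window_arg weekly <day> <hour>
maintenance_window_arg() {
  local type="$1" day="$2" hour="$3"
  case "$type" in
    anytime)
      echo "type=anytime"
      ;;
    weekly)
      # The day must be one of the three-letter codes listed above.
      case "$day" in
        MON|TUE|WED|THU|FRI|SAT|SUN) ;;
        *) echo "invalid day: $day" >&2; return 1 ;;
      esac
      # The hour must be a whole number from 1 to 24.
      case "$hour" in
        ''|*[!0-9]*) echo "invalid hour: $hour" >&2; return 1 ;;
      esac
      if [ "$hour" -lt 1 ] || [ "$hour" -gt 24 ]; then
        echo "hour must be from 1 to 24" >&2
        return 1
      fi
      echo "type=weekly,day=$day,hour=$hour"
      ;;
    *)
      echo "invalid type: $type" >&2
      return 1
      ;;
  esac
}

# Succeeds only if exactly one of the two log destinations is set.
check_log_destination() {
  local folder_id="$1" group_id="$2"
  if [ -n "$folder_id" ] && [ -n "$group_id" ]; then return 1; fi
  if [ -z "$folder_id" ] && [ -z "$group_id" ]; then return 1; fi
  return 0
}
```

A script could then pass --maintenance-window "$(maintenance_window_arg weekly MON 3)" to the creation command and call check_log_destination before choosing between --log-folder-id and --log-group-id.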
With Terraform
Terraform is distributed under the Business Source License
For more information about the provider resources, see the relevant documentation on the Terraform
If you do not have Terraform yet, install it and configure the Yandex Cloud provider.
To create an Apache Hive™ Metastore cluster:
- In the configuration file, describe the resources you are creating:
  - Apache Hive™ Metastore cluster: Cluster description.
  - Network: Description of the cloud network where the cluster will be located. If you already have a suitable network, you don't have to describe it again.
  - Subnets: Description of the subnets to connect the cluster hosts to. If you already have suitable subnets, you don't have to describe them again.
  Here is an example of the configuration file structure:
      resource "yandex_metastore_cluster" "<cluster_name>" {
        name                = "<cluster_name>"
        subnet_ids          = [yandex_vpc_subnet.<subnet_name>.id]
        security_group_ids  = [<list_of_security_group_IDs>]
        service_account_id  = "<service_account_ID>"
        deletion_protection = <protect_cluster_from_deletion>
        version             = "<version>"

        cluster_config = {
          resource_preset_id = "<class_of_computing_resources>"
        }

        maintenance_window = {
          type = "<maintenance_type>"
          day  = "<day_of_week>"
          hour = <hour>
        }

        logging = {
          enabled   = <enable_logging>
          folder_id = "<folder_ID>"
          min_level = "<logging_level>"
        }
      }

      resource "yandex_vpc_network" "<network_name>" {
        name = "<network_name>"
      }

      resource "yandex_vpc_subnet" "<subnet_name>" {
        name           = "<subnet_name>"
        zone           = "<availability_zone>"
        network_id     = "<network_ID>"
        v4_cidr_blocks = ["<range>"]
      }
  Where:
  - name: Cluster name.
  - subnet_ids: List of subnet IDs.
  - security_group_ids: List of security group IDs.
  - service_account_id: Service account ID.
  - deletion_protection: Enables cluster protection against accidental deletion. The possible values are true or false.
  - version: Apache Hive™ Metastore version.
    Available versions: 3.1 and 4.0.
    Warning
    To integrate the Apache Hive™ Metastore cluster with Yandex Managed Service for Trino and Yandex Managed Service for Apache Spark™, you need version 3.1.
    If required, you can upgrade 3.1 to 4.0, but you cannot downgrade 4.0 to 3.1.
  - cluster_config.resource_preset_id: Computing resource configuration.
  - maintenance_window: Maintenance window settings, including those for disabled clusters.
    - type: Maintenance type. The possible values are:
      - ANYTIME: Any time.
      - WEEKLY: On a schedule.
    - day: Day of week for the WEEKLY type, i.e., MON, TUE, WED, THU, FRI, SAT, or SUN.
    - hour: UTC hour for the WEEKLY type, from 1 to 24.
  - logging: Logging parameters:
    - enabled: Enables logging. Logs generated by Apache Hive™ Metastore components will be sent to Yandex Cloud Logging. The possible values are true or false.
    - folder_id: Folder ID. Logs will be written to the default log group for this folder.
    - group_id: Custom log group ID. Logs will be written to this group.
      Specify one of the two parameters: folder_id or group_id.
    - min_level: Minimum logging level. Possible values: TRACE, DEBUG, INFO (default), WARN, ERROR, and FATAL.
- Validate your configuration.
  - In the command line, navigate to the directory that contains the current Terraform configuration files defining the infrastructure.
  - Run this command:
        terraform validate
    Terraform will show any errors found in your configuration files.
- Confirm resource changes.
  - Run this command to view the planned changes:
        terraform plan
    If you described the configuration correctly, the terminal will display a list of the resources to update and their parameters. This is a verification step that does not apply changes to your resources.
  - If everything looks correct, apply the changes:
    - Run this command:
          terraform apply
    - Confirm updating the resources.
    - Wait for the operation to complete.
For more information about the resources you can create with Terraform, see this provider guide.
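The validate, plan, and apply sequence above can be chained so that a failure at any step stops the run. The following is a rough sketch, not a prescribed workflow: the run wrapper, the deploy_metastore name, and the tfplan file name are hypothetical. Note one difference from the manual steps: applying a saved plan file does not ask for interactive confirmation, so review the plan output before applying.

```shell
#!/usr/bin/env bash
# Executes a command, or only prints it when DRY_RUN=1 is set.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Runs validate, plan, and apply in order; stops at the first failing step.
deploy_metastore() {
  run terraform validate &&
  run terraform plan -out=tfplan &&
  run terraform apply tfplan
}
```

Running DRY_RUN=1 deploy_metastore in the configuration directory prints the three commands without touching any resources.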
- Get an IAM token for API authentication and save it as an environment variable:
      export IAM_TOKEN="<IAM_token>"
- Create a file named body.json and paste the following code into it:
      {
        "folderId": "<folder_ID>",
        "name": "<cluster_name>",
        "description": "<cluster_description>",
        "labels": { "<label_list>" },
        "deletionProtection": <deletion_protection>,
        "version": "<Apache_Hive™_Metastore_version>",
        "configSpec": {
          "resources": {
            "resourcePresetId": "<resource_configuration_ID>"
          }
        },
        "serviceAccountId": "<service_account_ID>",
        "logging": {
          "enabled": <use_of_logging>,
          "folderId": "<folder_ID>",
          "minLevel": "<logging_level>"
        },
        "network": {
          "subnetIds": [ "<list_of_subnet_IDs>" ],
          "securityGroupIds": [ "<list_of_security_group_IDs>" ]
        },
        "maintenanceWindow": {
          "weeklyMaintenanceWindow": {
            "day": "<day_of_week>",
            "hour": "<hour>"
          }
        }
      }
  Where:
  - folderId: Folder ID. You can get it with the list of folders in the cloud.
  - name: Cluster name.
  - description: Cluster description.
  - labels: List of labels provided in "<key>": "<value>" format.
  - deletionProtection: Enables cluster protection against accidental deletion. The possible values are true or false.
  - version: Apache Hive™ Metastore version.
    Available versions: 3.1 and 4.0.
    Warning
    To integrate the Apache Hive™ Metastore cluster with Yandex Managed Service for Trino and Yandex Managed Service for Apache Spark™, you need version 3.1.
    If required, you can upgrade 3.1 to 4.0, but you cannot downgrade 4.0 to 3.1.
  - configSpec.resources.resourcePresetId: Computing resource configuration.
  - serviceAccountId: Service account ID.
  - logging: Logging parameters:
    - enabled: Enables logging. Logs generated by Apache Hive™ Metastore components will be sent to Yandex Cloud Logging. The possible values are true or false.
    - folderId: Folder ID. Logs will be written to the default log group for this folder.
    - logGroupId: Custom log group ID. Logs will be written to this group.
      Specify either folderId or logGroupId.
    - minLevel: Minimum logging level. The possible values are TRACE, DEBUG, INFO, WARN, ERROR, and FATAL.
  - network: Network settings:
    - subnetIds: List of subnet IDs.
    - securityGroupIds: List of security group IDs.
  - maintenanceWindow: Maintenance window settings (including for disabled clusters). In maintenanceWindow, provide one of the two parameters:
    - anytime: Maintenance can take place at any time.
    - weeklyMaintenanceWindow: Maintenance takes place once a week at the specified time:
      - day: Day of week in DDD format: MON, TUE, WED, THU, FRI, SAT, or SUN.
      - hour: Time of day (UTC) in HH format, from 1 to 24.
- Use the Cluster.Create method and send the following request, e.g., via cURL:
      curl \
        --request POST \
        --header "Authorization: Bearer $IAM_TOKEN" \
        --url 'https://metastore.api.cloud.yandex.net/managed-metastore/v1/clusters' \
        --data '@body.json'
- View the server response to make sure your request was successful.
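To make the final check scriptable, the request can be wrapped so that the HTTP status code decides success. The sketch below reuses the cURL call from the steps above and assumes IAM_TOKEN is exported and body.json is in the current directory; the function names and the response.json file name are hypothetical. It treats any 2xx status as success, which is a general HTTP convention rather than something this API's documentation states here.

```shell
#!/usr/bin/env bash
# Returns success for 2xx HTTP status codes and failure for anything else.
is_success_status() {
  case "$1" in
    2[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Sends body.json, saves the response to response.json, prints it,
# and fails if curl itself fails or the status code is not 2xx.
create_cluster() {
  local status
  status="$(curl --silent --show-error \
    --request POST \
    --header "Authorization: Bearer $IAM_TOKEN" \
    --url 'https://metastore.api.cloud.yandex.net/managed-metastore/v1/clusters' \
    --data '@body.json' \
    --output response.json \
    --write-out '%{http_code}')" || return 1
  cat response.json
  is_success_status "$status"
}
```

Calling create_cluster || echo "request failed" >&2 then reports a failed request while still showing the server's response body.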
- Get an IAM token for API authentication and save it as an environment variable:
      export IAM_TOKEN="<IAM_token>"
- Clone the cloudapi repository:
      cd ~/ && git clone --depth=1 https://github.com/yandex-cloud/cloudapi
  Below, we assume the repository contents are stored in the ~/cloudapi/ directory.
- Create a file named body.json and paste the following code into it:
      {
        "folder_id": "<folder_ID>",
        "name": "<cluster_name>",
        "description": "<cluster_description>",
        "labels": "{ <label_list> }",
        "deletion_protection": <deletion_protection>,
        "version": "<Apache_Hive™_Metastore_version>",
        "config_spec": {
          "resources": {
            "resource_preset_id": "<resource_configuration_ID>"
          }
        },
        "service_account_id": "<service_account_ID>",
        "logging": {
          "enabled": <use_of_logging>,
          "folder_id": "<folder_ID>",
          "min_level": "<logging_level>"
        },
        "network": {
          "subnet_ids": [ "<list_of_subnet_IDs>" ],
          "security_group_ids": [ "<list_of_security_group_IDs>" ]
        },
        "maintenance_window": {
          "weekly_maintenance_window": {
            "day": "<day_of_week>",
            "hour": "<hour>"
          }
        }
      }
  Where:
  - folder_id: Folder ID. You can get it with the list of folders in the cloud.
  - name: Cluster name.
  - description: Cluster description.
  - labels: List of labels provided in "<key>": "<value>" format.
  - deletion_protection: Enables cluster protection against accidental deletion. The possible values are true or false.
  - version: Apache Hive™ Metastore version.
    Available versions: 3.1 and 4.0.
    Warning
    To integrate the Apache Hive™ Metastore cluster with Yandex Managed Service for Trino and Yandex Managed Service for Apache Spark™, you need version 3.1.
    If required, you can upgrade 3.1 to 4.0, but you cannot downgrade 4.0 to 3.1.
  - config_spec.resources.resource_preset_id: Computing resource configuration.
  - service_account_id: Service account ID.
  - logging: Logging parameters:
    - enabled: Enables logging. Logs generated by Apache Hive™ Metastore components will be sent to Yandex Cloud Logging. The possible values are true or false.
    - folder_id: Folder ID. Logs will be written to the default log group for this folder.
    - log_group_id: Custom log group ID. Logs will be written to this group.
      Specify one of the two parameters: folder_id or log_group_id.
    - min_level: Minimum logging level. The possible values are TRACE, DEBUG, INFO, WARN, ERROR, and FATAL.
  - network: Network settings:
    - subnet_ids: List of subnet IDs.
    - security_group_ids: List of security group IDs.
  - maintenance_window: Maintenance window settings (including for disabled clusters). In maintenance_window, provide one of the two parameters:
    - anytime: Maintenance can take place at any time.
    - weekly_maintenance_window: Maintenance takes place once a week at the specified time:
      - day: Day of week in DDD format: MON, TUE, WED, THU, FRI, SAT, or SUN.
      - hour: Time of day (UTC) in HH format, from 1 to 24.
- Use the ClusterService.Create call and send the following request, e.g., via gRPCurl:
      grpcurl \
        -format json \
        -import-path ~/cloudapi/ \
        -import-path ~/cloudapi/third_party/googleapis/ \
        -proto ~/cloudapi/yandex/cloud/metastore/v1/cluster_service.proto \
        -rpc-header "Authorization: Bearer $IAM_TOKEN" \
        -d @ \
        metastore.api.cloud.yandex.net:443 \
        yandex.cloud.metastore.v1.ClusterService.Create \
        < body.json
- View the server response to make sure your request was successful.
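The gRPCurl call depends on two things prepared in the earlier steps: the exported IAM_TOKEN and the cloned cloudapi repository. A small pre-flight check, sketched below, can catch a missing piece before the request is sent. The function name is hypothetical, and the default repository location follows the ~/cloudapi/ assumption stated above.

```shell
#!/usr/bin/env bash
# Checks the prerequisites for the gRPCurl request.
# Usage: grpc_preflight [<path_to_cloudapi_repository>]
grpc_preflight() {
  local repo_dir="${1:-$HOME/cloudapi}"
  local proto="$repo_dir/yandex/cloud/metastore/v1/cluster_service.proto"
  # The token is passed in the Authorization header, so it must be non-empty.
  if [ -z "${IAM_TOKEN:-}" ]; then
    echo "IAM_TOKEN is not set" >&2
    return 1
  fi
  # The -proto option points at this file inside the cloned repository.
  if [ ! -f "$proto" ]; then
    echo "proto file not found: $proto" >&2
    return 1
  fi
  return 0
}
```

A script could run grpc_preflight && grpcurl ... so that the request is only attempted when both prerequisites are in place.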
Apache® and Apache Hive™