Exporting and importing Hive metadata in an Apache Hive™ Metastore cluster
Getting started
- Create a service account named my-account with the storage.uploader and managed-metastore.integrationProvider roles.
- Configure the network and create an Apache Hive™ Metastore cluster. When creating it, specify the my-account service account.
- Create a bucket in Yandex Object Storage. It will store the metadata file for import and export.
- Grant my-account the READ and WRITE permissions for the bucket you created earlier.

For more information about connecting to the bucket with configured bucket policies, see this guide.
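The first step can also be scripted. Below is a minimal sketch, assuming the yc CLI is already installed and initialized; the helper names `run` and `setup_service_account`, the `DRY_RUN` switch, and the folder ID argument are illustrative only. Bucket creation and the bucket permissions are left to the steps above.

```bash
#!/usr/bin/env bash
# Sketch: create the my-account service account and grant it the two folder roles
# listed above. Helper names and DRY_RUN are hypothetical; with DRY_RUN=1 the
# commands are printed instead of executed.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

setup_service_account() {
  local folder_id="$1"
  local sa_name="${2:-my-account}"

  run yc iam service-account create --name "$sa_name"

  # The real ID is only known after creation; dry runs print a placeholder.
  local sa_id="<service_account_ID>"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    # Take the first top-level "id" field from the JSON output.
    sa_id="$(yc iam service-account get --name "$sa_name" --format json \
      | grep -o '"id": *"[^"]*"' | head -n 1 | sed 's/.*"\([^"]*\)"$/\1/')"
  fi

  local role
  for role in storage.uploader managed-metastore.integrationProvider; do
    run yc resource-manager folder add-access-binding "$folder_id" \
      --role "$role" \
      --subject "serviceAccount:$sa_id"
  done
}
```

For example, `DRY_RUN=1 setup_service_account <folder_ID>` prints the three commands so you can review them before running them for real.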
Exporting data
- Navigate to the folder dashboard and select Yandex MetaData Hub.
- In the left-hand panel, select Metastore.
- Click ⋯ next to the cluster you need and select Export.
- In the window that opens, specify the following:
  - Bucket you created earlier for cluster data export.
  - The .sql file the cluster data will be written to. If a file with that name already exists, it will be overwritten.
- Click Export.
If you do not have the Yandex Cloud CLI installed yet, install and initialize it.
By default, the CLI uses the folder specified when creating the profile. To change the default folder, use the yc config set folder-id <folder_ID> command. You can also set a different folder for any specific command using the --folder-name or --folder-id parameter.
To export metadata from an Apache Hive™ Metastore cluster, run this command:
yc managed-metastore cluster export-data <cluster_name_or_ID> \
--bucket <bucket_name> \
--filepath <data_file>
Where:
- --bucket: Bucket you created earlier for cluster data export.
- --filepath: Path to the .sql file the cluster data will be written to. If a file with that name already exists, it will be overwritten.
You can get the cluster ID and name with the list of clusters in the folder.
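Because an existing file is overwritten without warning, it can help to wrap the command in a small guard. This is a hypothetical helper: the name `export_metadata`, the `DRY_RUN` switch, and the `.sql` extension check are assumptions added for illustration, not CLI behavior.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around `yc managed-metastore cluster export-data`.
# Validates the arguments first; with DRY_RUN=1 it only prints the command.
export_metadata() {
  local cluster="$1" bucket="$2" filepath="$3"
  if [ -z "$cluster" ] || [ -z "$bucket" ] || [ -z "$filepath" ]; then
    echo "usage: export_metadata <cluster_name_or_ID> <bucket_name> <data_file.sql>" >&2
    return 2
  fi
  # The target is expected to be a .sql file; an existing object is overwritten.
  case "$filepath" in
    *.sql) ;;
    *) echo "error: --filepath must point to a .sql file: $filepath" >&2; return 2 ;;
  esac
  local cmd=(yc managed-metastore cluster export-data "$cluster" --bucket "$bucket" --filepath "$filepath")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```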
- Get an IAM token for API authentication and save it as an environment variable:

      export IAM_TOKEN="<IAM_token>"

- Use the Cluster.ExportData method and send the following request, e.g., via cURL:

      curl \
        --request POST \
        --header "Authorization: Bearer $IAM_TOKEN" \
        --url 'https://metastore.api.cloud.yandex.net/managed-metastore/v1/clusters/<cluster_ID>:export' \
        --data '{
                  "bucket": "<bucket_name>",
                  "filepath": "<data_file>"
                }'

  Where:

  - bucket: Bucket you created earlier for cluster data export.
  - filepath: Path to the .sql file the cluster data will be written to. If a file with that name already exists, it will be overwritten.

  You can get the cluster ID with the list of clusters in the folder.

- View the server response to make sure your request was successful.
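The ExportData response typically describes a long-running operation rather than the finished export. A hedged sketch for following it up: `extract_operation_id` and `operation_status` are hypothetical helpers that parse JSON with grep rather than a real JSON parser, and the sketch assumes the Operation endpoint `https://operation.api.cloud.yandex.net/operations/<operation_ID>` accepts the same IAM token.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for checking the operation returned by the REST call.

# Prints the first top-level "id" value (the operation ID) from a JSON response.
extract_operation_id() {
  printf '%s\n' "$1" | grep -o '"id": *"[^"]*"' | head -n 1 | sed 's/.*"\([^"]*\)"$/\1/'
}

# Prints running, done, failed, or request-failed for the given operation ID.
operation_status() {
  local response
  response="$(curl --silent --fail \
    --header "Authorization: Bearer $IAM_TOKEN" \
    "https://operation.api.cloud.yandex.net/operations/$1")" || { echo "request-failed"; return 2; }
  if printf '%s\n' "$response" | grep -q '"error"'; then
    echo "failed"
  elif printf '%s\n' "$response" | grep -Eq '"done": *true'; then
    echo "done"
  else
    echo "running"
  fi
}
```

Usage: save the cURL output with `response="$(curl ...)"`, then run `operation_status "$(extract_operation_id "$response")"` until it prints `done` or `failed`.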
- Get an IAM token for API authentication and save it as an environment variable:

      export IAM_TOKEN="<IAM_token>"

- Clone the cloudapi repository:

      cd ~/ && git clone --depth=1 https://github.com/yandex-cloud/cloudapi

  Below, we assume the repository contents are stored in the ~/cloudapi/ directory.

- Use the ClusterService.ExportData call and send the following request, e.g., via gRPCurl:

      grpcurl \
        -format json \
        -import-path ~/cloudapi/ \
        -import-path ~/cloudapi/third_party/googleapis/ \
        -proto ~/cloudapi/yandex/cloud/metastore/v1/cluster_service.proto \
        -rpc-header "Authorization: Bearer $IAM_TOKEN" \
        -d '{
              "cluster_id": "<cluster_ID>",
              "bucket": "<bucket_name>",
              "filepath": "<data_file>"
            }' \
        metastore.api.cloud.yandex.net:443 \
        yandex.cloud.metastore.v1.ClusterService.ExportData

  Where:

  - bucket: Bucket you created earlier for cluster data export.
  - filepath: Path to the .sql file the cluster data will be written to. If a file with that name already exists, it will be overwritten.

  You can get the cluster ID with the list of clusters in the folder.

- View the server response to make sure your request was successful.
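Instead of checking the response once, you can poll the returned operation until it finishes. This is a sketch under stated assumptions: it reuses the cloned ~/cloudapi/ repository, assumes `yandex.cloud.operation.OperationService.Get` at `operation.api.cloud.yandex.net:443` accepts the same token, and treats `wait_for_operation`, `MAX_ATTEMPTS`, and `INTERVAL` as hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: poll an operation via gRPC until it is done.
# Returns 0 on success, 1 if the operation reports an error,
# 2 if the gRPC call itself fails, 3 on timeout.
wait_for_operation() {
  local operation_id="$1"
  local max_attempts="${MAX_ATTEMPTS:-60}" interval="${INTERVAL:-10}"
  local attempt response
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    response="$(grpcurl \
      -format json \
      -import-path ~/cloudapi/ \
      -import-path ~/cloudapi/third_party/googleapis/ \
      -proto ~/cloudapi/yandex/cloud/operation/operation_service.proto \
      -rpc-header "Authorization: Bearer $IAM_TOKEN" \
      -d "{\"operation_id\": \"$operation_id\"}" \
      operation.api.cloud.yandex.net:443 \
      yandex.cloud.operation.OperationService.Get)" || return 2
    # A finished operation carries either an "error" or a "response" field.
    if printf '%s\n' "$response" | grep -q '"error"'; then
      echo "operation $operation_id failed" >&2
      return 1
    fi
    if printf '%s\n' "$response" | grep -Eq '"done": *true'; then
      return 0
    fi
    sleep "$interval"
  done
  echo "operation $operation_id still running after $max_attempts attempts" >&2
  return 3
}
```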
Importing data
Before importing, upload the .sql file with metadata to the bucket you created earlier. For information on how to prepare the file and how the import process works, see Transferring metadata between Yandex Data Processing clusters using Apache Hive™ Metastore.
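One way to upload the file is through the S3-compatible Object Storage API. A minimal sketch, assuming the AWS CLI is installed and configured with a static access key for Object Storage; `upload_metadata` and `DRY_RUN` are hypothetical names. The object key you choose here is presumably what you then pass as the import file path.

```bash
#!/usr/bin/env bash
# Sketch: upload a local .sql file to the bucket via the S3-compatible endpoint.
# The object key defaults to the local file name.
upload_metadata() {
  local local_file="$1" bucket="$2" key="${3:-$(basename "$1")}"
  if [ ! -f "$local_file" ]; then
    echo "error: file not found: $local_file" >&2
    return 2
  fi
  local cmd=(aws --endpoint-url=https://storage.yandexcloud.net s3 cp "$local_file" "s3://$bucket/$key")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```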
To import data to an Apache Hive™ Metastore cluster:
- Navigate to the folder dashboard and select Yandex MetaData Hub.
- In the left-hand panel, select Metastore.
- Click ⋯ next to the cluster you need and select Import.
- In the window that opens, select the bucket you created earlier and the file to import the cluster data from.
- Click Import.
If you do not have the Yandex Cloud CLI installed yet, install and initialize it.
By default, the CLI uses the folder specified when creating the profile. To change the default folder, use the yc config set folder-id <folder_ID> command. You can also set a different folder for any specific command using the --folder-name or --folder-id parameter.
To import metadata to an Apache Hive™ Metastore cluster, run this command:
yc managed-metastore cluster import-data <cluster_name_or_ID> \
--bucket <bucket_name> \
--filepath <data_file>
Where:
- --bucket: Bucket you created earlier to import the cluster data from.
- --filepath: Path to the .sql file to import the cluster data from.
You can get the cluster ID and name with the list of clusters in the folder.
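Since export overwrites files of the same name, a cautious workflow is to export the current metadata to a separate backup file before importing. This is a hedged sketch using only the two commands from this page; `import_with_backup`, `BACKUP_FILEPATH`, and `DRY_RUN` are hypothetical names, and it relies on yc waiting for each command to finish before returning (its default unless `--async` is passed).

```bash
#!/usr/bin/env bash
# Sketch: back up the cluster's current metadata, then import a new file.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

import_with_backup() {
  local cluster="$1" bucket="$2" filepath="$3"
  local backup="${BACKUP_FILEPATH:-backup-$(date +%Y%m%d-%H%M%S).sql}"
  # Exporting to the import file's path would overwrite the file we want to import.
  if [ "$backup" = "$filepath" ]; then
    echo "error: backup path must differ from the import file" >&2
    return 2
  fi
  run yc managed-metastore cluster export-data "$cluster" --bucket "$bucket" --filepath "$backup" || return 1
  run yc managed-metastore cluster import-data "$cluster" --bucket "$bucket" --filepath "$filepath"
}
```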
- Get an IAM token for API authentication and save it as an environment variable:

      export IAM_TOKEN="<IAM_token>"

- Use the Cluster.ImportData method and send the following request, e.g., via cURL:

      curl \
        --request POST \
        --header "Authorization: Bearer $IAM_TOKEN" \
        --url 'https://metastore.api.cloud.yandex.net/managed-metastore/v1/clusters/<cluster_ID>:import' \
        --data '{
                  "bucket": "<bucket_name>",
                  "filepath": "<data_file>"
                }'

  Where:

  - bucket: Bucket you created earlier to import the cluster data from.
  - filepath: Path to the .sql file to import the cluster data from.

  You can get the cluster ID with the list of clusters in the folder.

- View the server response to make sure your request was successful.
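When the bucket name or file path comes from a variable, building the `--data` body by hand can produce invalid JSON. A minimal sketch that escapes backslashes and double quotes before substitution; `json_escape` and `build_request_body` are hypothetical helpers, and only the `bucket` and `filepath` fields from the request above are covered.

```bash
#!/usr/bin/env bash
# Sketch: build the request body for the :import (or :export) call.

# Escapes backslashes first, then double quotes, so the value is a valid JSON string.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

build_request_body() {
  printf '{ "bucket": "%s", "filepath": "%s" }' "$(json_escape "$1")" "$(json_escape "$2")"
}
```

Usage: pass `--data "$(build_request_body "$BUCKET" "$FILEPATH")"` to the cURL command instead of the literal JSON.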
- Get an IAM token for API authentication and save it as an environment variable:

      export IAM_TOKEN="<IAM_token>"

- Clone the cloudapi repository:

      cd ~/ && git clone --depth=1 https://github.com/yandex-cloud/cloudapi

  Below, we assume the repository contents are stored in the ~/cloudapi/ directory.

- Use the ClusterService.ImportData call and send the following request, e.g., via gRPCurl:

      grpcurl \
        -format json \
        -import-path ~/cloudapi/ \
        -import-path ~/cloudapi/third_party/googleapis/ \
        -proto ~/cloudapi/yandex/cloud/metastore/v1/cluster_service.proto \
        -rpc-header "Authorization: Bearer $IAM_TOKEN" \
        -d '{
              "cluster_id": "<cluster_ID>",
              "bucket": "<bucket_name>",
              "filepath": "<data_file>"
            }' \
        metastore.api.cloud.yandex.net:443 \
        yandex.cloud.metastore.v1.ClusterService.ImportData

  Where:

  - bucket: Bucket you created earlier to import the cluster data from.
  - filepath: Path to the .sql file to import the cluster data from.

  You can get the cluster ID with the list of clusters in the folder.

- View the server response to make sure your request was successful.
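Most failed gRPC calls in this flow come from a missing token, tool, or proto file. A small preflight sketch that checks the prerequisites from the steps above before calling grpcurl; `preflight_grpc` and the `CLOUDAPI_DIR` override are hypothetical names, and the default path matches the ~/cloudapi/ assumption above.

```bash
#!/usr/bin/env bash
# Sketch: report every missing prerequisite, return non-zero if any is missing.
preflight_grpc() {
  local proto="${CLOUDAPI_DIR:-$HOME/cloudapi}/yandex/cloud/metastore/v1/cluster_service.proto"
  local status=0
  if [ -z "${IAM_TOKEN:-}" ]; then
    echo "IAM_TOKEN is not set" >&2
    status=1
  fi
  if ! command -v grpcurl >/dev/null 2>&1; then
    echo "grpcurl is not installed" >&2
    status=1
  fi
  if [ ! -f "$proto" ]; then
    echo "proto file not found: $proto" >&2
    status=1
  fi
  return "$status"
}
```

Usage: `preflight_grpc && grpcurl ...` runs the request only when all checks pass.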
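Apache® and Apache Hive™ are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.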