Getting started with Hive Metastore
Note
This feature is in the Preview stage.
In Yandex MetaData Hub, you can create Hive Metastore clusters and use them to work with Yandex Data Processing clusters.
Getting started
- Go to the management console and log in to Yandex Cloud, or sign up if you have not done so yet.
- If you do not have a folder yet, create one:
  - In the management console, select the appropriate cloud in the list on the left.
  - At the top right, click Create folder.
  - Enter the folder name. The naming requirements are as follows:
    - The name must be from 3 to 63 characters long.
    - It may contain lowercase Latin letters, numbers, and hyphens.
    - The first character must be a letter, and the last character cannot be a hyphen.
  - (Optional) Enter a description of the folder.
  - Select Create a default network. This creates a network with subnets in each availability zone. A default security group that allows all network traffic is also created within this network.
  - Click Create.
- Set up a NAT gateway in the subnet that will host the Metastore and Yandex Data Processing clusters.
- Create a security group for the Metastore and Yandex Data Processing clusters.
- Add the following Metastore cluster rules to the security group:
  - For incoming client traffic:
    - Port range: 30000-32767
    - Protocol: Any
    - Source: CIDR
    - CIDR blocks: 0.0.0.0/0
  - For incoming load balancer traffic:
    - Port range: 10256
    - Protocol: Any
    - Source: Load balancer healthchecks
- Add the following Yandex Data Processing cluster rules to the security group:
  - One rule for incoming and another for outgoing service traffic:
    - Port range: 0-65535
    - Protocol: Any
    - Source/Destination name: Security group
    - Security group: Current
  - A separate rule for outgoing HTTPS traffic to all addresses. It allows you to use Yandex Object Storage buckets, UI Proxy, and autoscaling of Yandex Data Processing subclusters.
    - Port range: 443
    - Protocol: TCP
    - Destination name: CIDR
    - CIDR blocks: 0.0.0.0/0
  - A rule for outgoing traffic to NTP servers for time syncing:
    - Port range: 123
    - Protocol: UDP
    - Destination name: CIDR
    - CIDR blocks: 0.0.0.0/0
- Create a service account for the Yandex Data Processing cluster with the dataproc.agent and dataproc.provisioner roles.
- Create an Object Storage bucket for the Yandex Data Processing cluster.
- In the network you created earlier, create a Yandex Data Processing cluster with the following settings:
  - Select SPARK and YARN.
  - Set the spark:spark.sql.hive.metastore.sharedPrefixes property to com.amazonaws,ru.yandex.cloud.
  - Select
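The folder naming requirements and the security group rules above are mechanical enough to express as data. The following Python sketch is illustrative only and is not part of any Yandex Cloud SDK: `is_valid_folder_name`, `Rule`, `SECURITY_GROUP_RULES`, and `allows` are hypothetical names, and the symbolic peers `self` (the security group itself) and `lb-healthchecks` are simplifications of the Security group and Load balancer healthchecks sources. It checks whether a connection would match one of the listed rules; it does not reproduce how Yandex Cloud actually evaluates security groups.

```python
import ipaddress
import re
from dataclasses import dataclass

# Folder naming rules from the steps above: 3-63 characters; lowercase Latin
# letters, digits, and hyphens; starts with a letter; does not end with a hyphen.
FOLDER_NAME_RE = re.compile(r"[a-z][a-z0-9-]{1,61}[a-z0-9]")


def is_valid_folder_name(name: str) -> bool:
    return FOLDER_NAME_RE.fullmatch(name) is not None


@dataclass(frozen=True)
class Rule:
    direction: str  # "ingress" or "egress"
    protocol: str   # "any", "tcp", or "udp"
    from_port: int
    to_port: int
    peer: str       # a CIDR block, or a symbolic peer: "self" or "lb-healthchecks"


# Hypothetical model of the rules listed above.
SECURITY_GROUP_RULES = [
    # Metastore cluster
    Rule("ingress", "any", 30000, 32767, "0.0.0.0/0"),        # client traffic
    Rule("ingress", "any", 10256, 10256, "lb-healthchecks"),  # load balancer health checks
    # Yandex Data Processing cluster
    Rule("ingress", "any", 0, 65535, "self"),                 # incoming service traffic
    Rule("egress", "any", 0, 65535, "self"),                  # outgoing service traffic
    Rule("egress", "tcp", 443, 443, "0.0.0.0/0"),             # HTTPS to all addresses
    Rule("egress", "udp", 123, 123, "0.0.0.0/0"),             # NTP
]

SYMBOLIC_PEERS = {"self", "lb-healthchecks"}


def _peer_matches(rule_peer: str, peer: str) -> bool:
    # Symbolic peers match only themselves; this sketch does not resolve them to addresses.
    if rule_peer in SYMBOLIC_PEERS or peer in SYMBOLIC_PEERS:
        return rule_peer == peer
    return ipaddress.ip_address(peer) in ipaddress.ip_network(rule_peer)


def allows(rules, direction: str, protocol: str, port: int, peer: str) -> bool:
    """Return True if at least one rule matches the described connection."""
    return any(
        r.direction == direction
        and r.protocol in ("any", protocol)
        and r.from_port <= port <= r.to_port
        and _peer_matches(r.peer, peer)
        for r in rules
    )


print(is_valid_folder_name("hive-demo"))  # True
print(is_valid_folder_name("demo-"))      # False: ends with a hyphen
print(allows(SECURITY_GROUP_RULES, "ingress", "tcp", 30500, "203.0.113.5"))  # True
print(allows(SECURITY_GROUP_RULES, "egress", "tcp", 80, "203.0.113.5"))      # False
```

The example addresses come from the documentation ranges 203.0.113.0/24 and 198.51.100.0/24 and are placeholders only.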
Create a Metastore cluster
- In the management console, go to the folder you created earlier.
- Select Yandex MetaData Hub.
- In the left-hand panel, select Metastore.
- Click Create cluster.
- Enter a name for the cluster. It must be unique within the folder.
- Under Network settings, select the network and subnet you created earlier.
- Under Security groups, select the security group you created earlier.
- Click Create.
Connect the Metastore cluster to the Yandex Data Processing cluster
- In the Yandex Data Processing cluster you created earlier, specify the following property:
  spark:spark.hive.metastore.uris : thrift://<Metastore_cluster_IP_address>:9083
  To find the Metastore cluster IP address, select Yandex MetaData Hub in the management console, then select Metastore in the left-hand panel and open the relevant cluster. The cluster IP address is shown under General information.
- Add the following outgoing traffic rule to the security group:
  - Port range: 9083
  - Protocol: Any
  - Destination name: CIDR
  - CIDR blocks: 0.0.0.0/0
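If you script cluster creation, the two Spark properties used in this guide can be assembled from the Metastore cluster IP address. The following is a minimal Python sketch under stated assumptions: `metastore_properties` is a hypothetical helper, the Metastore cluster is assumed to have an IPv4 address, and the example address `10.128.0.12` is made up. The sketch only builds the key/value pairs in the `key : value` form used above; it does not call any Yandex Cloud API.

```python
import ipaddress

SHARED_PREFIXES_KEY = "spark:spark.sql.hive.metastore.sharedPrefixes"
METASTORE_URIS_KEY = "spark:spark.hive.metastore.uris"
METASTORE_THRIFT_PORT = 9083  # port from the thrift:// URI above


def metastore_properties(metastore_ip: str) -> dict:
    """Build the Yandex Data Processing properties for connecting to Metastore.

    Raises ValueError if metastore_ip is not a valid IPv4 address
    (an assumption of this sketch; an IPv6 host would need brackets in the URI).
    """
    ip = ipaddress.IPv4Address(metastore_ip)
    return {
        SHARED_PREFIXES_KEY: "com.amazonaws,ru.yandex.cloud",
        METASTORE_URIS_KEY: f"thrift://{ip}:{METASTORE_THRIFT_PORT}",
    }


if __name__ == "__main__":
    # Hypothetical address; use the one shown under General information.
    for key, value in metastore_properties("10.128.0.12").items():
        print(f"{key} : {value}")
```

Validating the address up front means a typo is caught before the property is applied to the cluster, rather than surfacing later as a failed Thrift connection.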