Getting started with Hive Metastore
Note
This feature is in the Preview stage.
In Yandex MetaData Hub, you can create Hive Metastore clusters and use them to work with Yandex Data Proc clusters.
Getting started
- Go to the management console and log in to Yandex Cloud, or create an account if you do not have one yet.
- If you do not have a folder yet, create one:
  - In the management console, select the appropriate cloud in the list on the left.
  - At the top right, click Create folder.
  - Enter the folder name. The naming requirements are as follows:
    - The name must be from 3 to 63 characters long.
    - It may contain lowercase Latin letters, numbers, and hyphens.
    - The first character must be a letter and the last character cannot be a hyphen.
  - (Optional) Enter a description of the folder.
  - Select Create a default network. This creates a network with subnets in each availability zone, along with a default security group that allows all network traffic.
  - Click Create.
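The folder naming requirements listed above can be checked before you type a name into the console. The following is a minimal sketch, not part of any Yandex Cloud tooling: the function name `is_valid_folder_name` and the regular expression are assumptions that simply encode the three stated rules.

```python
import re

# Encodes the three naming rules stated in this guide:
#   - 3 to 63 characters long
#   - lowercase Latin letters, digits, and hyphens only
#   - first character is a letter, last character is not a hyphen
# One leading letter + 1..61 middle characters + one trailing
# letter or digit gives a total length of 3..63.
FOLDER_NAME_RE = re.compile(r"[a-z][a-z0-9-]{1,61}[a-z0-9]")


def is_valid_folder_name(name: str) -> bool:
    """Return True if `name` satisfies the folder naming requirements.

    Hypothetical helper for illustration; the console performs its own
    validation when you click Create.
    """
    return FOLDER_NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("metastore-demo", "ab", "1folder", "folder-"):
        print(candidate, is_valid_folder_name(candidate))
```

The names passed in the loop are made-up examples; only the first one meets all three requirements.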
- Set up a NAT gateway in the subnet that will host the Metastore and Yandex Data Proc clusters.
- Create a security group for the Metastore and Yandex Data Proc clusters.
- Add rules for the Metastore cluster to the security group:
  - For incoming client traffic:
    - Port range: 30000-32767.
    - Protocol: Any.
    - Source: CIDR.
    - CIDR blocks: 0.0.0.0/0.
  - For incoming load balancer traffic:
    - Port range: 10256.
    - Protocol: Any.
    - Source: Load balancer healthchecks.
- Add rules for the Yandex Data Proc cluster to the security group:
  - One rule for inbound and another for outbound service traffic:
    - Port range: 0-65535.
    - Protocol: Any.
    - Source/Destination name: Security group.
    - Security group: Current.
  - A separate rule for outgoing HTTPS traffic. This allows you to use Yandex Object Storage buckets, UI Proxy, and autoscaling of Yandex Data Proc subclusters.
    - Port range: 443.
    - Protocol: TCP.
    - Destination name: CIDR.
    - CIDR blocks: 0.0.0.0/0.
  - A rule that allows access to NTP servers for time syncing:
    - Port range: 123.
    - Protocol: UDP.
    - Destination name: CIDR.
    - CIDR blocks: 0.0.0.0/0.
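To review the security group rules above in one place, it can help to write them down as data. The sketch below is illustrative only: the `Rule` class, the `allows` function, and the peer labels `"self"` and `"lb-healthchecks"` are assumptions made for this example, not a Yandex Cloud API. It compares peers as plain labels and does not evaluate CIDR membership.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    direction: str  # "ingress" or "egress"
    ports: range    # ports covered by the rule
    protocol: str   # "ANY", "TCP", or "UDP"
    peer: str       # "0.0.0.0/0", "self" (current group), or "lb-healthchecks"


def ports(first: int, last: Optional[int] = None) -> range:
    """Build a range that includes both the first and the last port."""
    return range(first, (first if last is None else last) + 1)


# The six rules listed in this guide (hypothetical representation).
RULES = [
    # Metastore cluster
    Rule("ingress", ports(30000, 32767), "ANY", "0.0.0.0/0"),
    Rule("ingress", ports(10256), "ANY", "lb-healthchecks"),
    # Yandex Data Proc cluster
    Rule("ingress", ports(0, 65535), "ANY", "self"),
    Rule("egress", ports(0, 65535), "ANY", "self"),
    Rule("egress", ports(443), "TCP", "0.0.0.0/0"),
    Rule("egress", ports(123), "UDP", "0.0.0.0/0"),
]


def allows(direction: str, port: int, protocol: str, peer: str) -> bool:
    """Return True if any rule matches the described traffic."""
    return any(
        rule.direction == direction
        and port in rule.ports
        and rule.protocol in ("ANY", protocol)
        and rule.peer == peer
        for rule in RULES
    )


if __name__ == "__main__":
    print(allows("egress", 443, "TCP", "0.0.0.0/0"))   # HTTPS to any address
    print(allows("egress", 9083, "TCP", "0.0.0.0/0"))  # Metastore Thrift port
```

With only these rules, outgoing traffic to port 9083 on an arbitrary address is not matched, which is why the connection step later in this guide adds a dedicated rule for that port.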
- Create a service account for the Yandex Data Proc cluster with the dataproc.agent and dataproc.provisioner roles.
- Create an Object Storage bucket to use with the Yandex Data Proc cluster.
- Create a Yandex Data Proc cluster in the previously created network. In the settings:
  - Choose the SPARK and YARN services.
  - Specify the spark:spark.sql.hive.metastore.sharedPrefixes setting with the value com.amazonaws,ru.yandex.cloud.
Create a Metastore cluster
- In the management console, select the folder you created.
- Select Yandex MetaData Hub.
- In the left-hand panel, select Metastore.
- Click Create cluster.
- Enter a name for the cluster. It must be unique within the folder.
- Under Network settings, select the network and subnet you created.
- Under Security groups, specify the security group that you previously set up.
- Click Create.
Connect Metastore to the Yandex Data Proc cluster
- In the Yandex Data Proc cluster you previously created, specify the following property:
  spark:spark.hive.metastore.uris : thrift://<Metastore_cluster_IP_address>:9083
  To find out the Metastore cluster IP address, select Yandex MetaData Hub in the management console and then select the Metastore page in the left-hand panel. The cluster IP address is shown under General information.
- Add the following rule for outgoing traffic to the security group:
  - Port range: 9083.
  - Protocol: Any.
  - Destination name: CIDR.
  - CIDR blocks: 0.0.0.0/0.
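Once the property and the outgoing rule are in place, you may want to double-check the property value and confirm that port 9083 is reachable from a host in the same network. The following is a minimal sketch under stated assumptions: the helper names `metastore_uri_property` and `can_reach` are hypothetical, the address `10.0.0.5` is a placeholder, and the sketch assumes the Metastore cluster address is IPv4. A successful TCP connection shows only that the port is open, not that the Metastore service is healthy.

```python
import ipaddress
import socket

METASTORE_PORT = 9083  # Thrift port used in the property described above


def metastore_uri_property(ip: str) -> dict[str, str]:
    """Build the cluster property for a Metastore IP address.

    Assumes an IPv4 address; raises ValueError if `ip` is malformed.
    """
    ipaddress.IPv4Address(ip)  # validation only, the result is discarded
    return {"spark:spark.hive.metastore.uris": f"thrift://{ip}:{METASTORE_PORT}"}


def can_reach(host: str, port: int = METASTORE_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


if __name__ == "__main__":
    placeholder_ip = "10.0.0.5"  # placeholder; use the address from General information
    print(metastore_uri_property(placeholder_ip))
    print(can_reach(placeholder_ip))
```

If `can_reach` returns False from a Yandex Data Proc host, the outgoing rule for port 9083 and the Metastore cluster's incoming rules are the first things to recheck.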