Connecting Yandex Data Processing to Metastore
Note
To use a Metastore cluster, your Yandex Data Processing cluster must contain the SPARK and YARN components.
- When creating or updating a Yandex Data Processing cluster, specify the following property:
  spark:spark.hive.metastore.uris : thrift://<Metastore_cluster_IP_address>:9083
  To find out the Metastore cluster IP address, select Yandex MetaData Hub in the management console and then select the Metastore page in the left-hand panel. You will see the cluster IP address under General information.
- If the Metastore cluster and Yandex Data Processing cluster are hosted in different cloud networks, set up routing between these cloud networks so that the Metastore subnet is accessible from the Yandex Data Processing subnet.
  There are multiple ways to configure routing. For example, you can create an IPsec tunnel.
- If the cloud network uses security groups, set up the security group of the Yandex Data Processing cluster to work with Metastore. To do this, add the following rule for outgoing traffic:
  - Port range: 9083
  - Protocol: Any
  - Source: CIDR
  - CIDR blocks: 0.0.0.0/0
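The property from the first step can be assembled programmatically, for example when cluster settings are generated by a script. The following is a minimal sketch, not an official tool: the function name `metastore_property` and the sample address `10.0.0.5` are hypothetical, and the helper assumes the Metastore cluster has an IPv4 address.

```python
import ipaddress

# Thrift port used in the property shown in the first step.
METASTORE_PORT = 9083


def metastore_property(ip_address: str) -> tuple:
    """Return the (key, value) pair for the cluster property.

    Hypothetical helper: it only formats the string shown above.
    Raises ValueError if ip_address is not a valid IPv4 address.
    """
    ip = ipaddress.IPv4Address(ip_address)  # assumption: IPv4 only
    key = "spark:spark.hive.metastore.uris"
    value = f"thrift://{ip}:{METASTORE_PORT}"
    return key, value


if __name__ == "__main__":
    # 10.0.0.5 is a placeholder, not a real Metastore address.
    key, value = metastore_property("10.0.0.5")
    print(f"{key} : {value}")
```

Validating the address before formatting catches typos early, before the property is passed to the cluster.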
For an example of using Yandex Data Processing with a Metastore cluster connected, see the Shared use of Yandex Data Processing tables through Metastore tutorial.
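To check that the routing and security group settings described above let traffic through, you can try opening a TCP connection to port 9083 from a host in the Yandex Data Processing subnet. The sketch below uses only the Python standard library; the function name `can_reach` and the sample address are hypothetical. A successful connection shows only that the port is reachable, not that the Metastore service itself is working correctly.

```python
import socket


def can_reach(host: str, port: int = 9083, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and connects with a timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


if __name__ == "__main__":
    # 10.0.0.5 is a placeholder; use your Metastore cluster IP address.
    print(can_reach("10.0.0.5"))
```

If the function returns False, review the routing between the cloud networks and the outgoing rule for port 9083.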