Connecting Yandex Data Proc to Metastore
Note
To use the Metastore cluster, a Yandex Data Proc cluster must include the SPARK and YARN components.
- When creating or updating a Yandex Data Proc cluster, specify the following property:
  spark:spark.hive.metastore.uris : thrift://<Metastore_cluster_IP_address>:9083
  To find out the Metastore cluster IP address, select Yandex MetaData Hub in the management console and then select the Metastore page in the left-hand panel. You will see the cluster IP address under General information.
- If the Metastore cluster and Yandex Data Proc cluster are hosted in different cloud networks, set up routing between these cloud networks so that the Metastore subnet is accessible from the Yandex Data Proc subnet.
  There are multiple ways to configure routing. For example, you can create an IPsec tunnel.
- If the cloud network uses security groups, set up the security group of the Yandex Data Proc cluster to work with Metastore. To do this, add the following rule for outgoing traffic:
  - Port range: 9083
  - Protocol: Any
  - Source: CIDR
  - CIDR blocks: 0.0.0.0/0
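As a rough illustration of the steps above, the following Python sketch builds the value for the spark:spark.hive.metastore.uris property and checks whether port 9083 on the Metastore cluster accepts TCP connections. The helper names metastore_uri and metastore_reachable are hypothetical and not part of any Yandex Cloud tooling. The sketch assumes an IPv4 address and is meant to run on a host in the Yandex Data Proc subnet.

```python
import ipaddress
import socket

# Thrift port used in the property and in the security group rule above.
METASTORE_PORT = 9083


def metastore_uri(ip_address: str) -> str:
    """Build the value for the spark:spark.hive.metastore.uris property.

    Hypothetical helper. Raises ValueError if ip_address is not a valid
    IPv4 address (IPv4 is an assumption of this sketch).
    """
    ip = ipaddress.IPv4Address(ip_address)
    return f"thrift://{ip}:{METASTORE_PORT}"


def metastore_reachable(ip_address: str,
                        port: int = METASTORE_PORT,
                        timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to ip_address:port can be opened.

    Hypothetical helper. A failed connection may point to missing routing
    between the cloud networks or to a missing outgoing security group rule.
    """
    try:
        with socket.create_connection((ip_address, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, metastore_uri("10.0.0.5") returns thrift://10.0.0.5:9083, where 10.0.0.5 is a placeholder address. A True result from metastore_reachable shows only that the port accepts connections. It does not confirm that the Metastore service itself is working correctly.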
For an example of using Yandex Data Proc with a Metastore cluster connected, see the Shared use of Yandex Data Proc tables through Metastore tutorial.