Connecting Yandex Data Processing to Apache Hive™ Metastore
Note
To use an Apache Hive™ Metastore cluster, your Yandex Data Processing cluster must contain the SPARK and YARN components.
- Create an Apache Hive™ Metastore cluster.
- When creating or updating a Yandex Data Processing cluster, specify the following property:
  spark:spark.hive.metastore.uris : thrift://<Apache Hive™ Metastore_cluster_IP_address>:9083
  To find out the Apache Hive™ Metastore cluster IP address, select Yandex MetaData Hub in the management console, then select Metastore in the left-hand panel. Copy the value from the IP address column for the cluster in question.
- If the Apache Hive™ Metastore cluster and the Yandex Data Processing cluster are hosted in different cloud networks, set up routing between these networks so that the Apache Hive™ Metastore subnet is accessible from the Yandex Data Processing subnet.
  There are multiple ways to configure routing. For example, you can create an IPsec tunnel.
- If the cloud network uses security groups, set up the security group of the Yandex Data Processing cluster to work with Apache Hive™ Metastore. To do this, add the following rule for outgoing traffic:
  - Port range: 9083
  - Protocol: Any (Any)
  - Source: CIDR
  - CIDR blocks: 0.0.0.0/0
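The property from the steps above follows a fixed pattern, so it can be generated rather than typed by hand. The following is a minimal Python sketch, not part of any Yandex Cloud tooling: `metastore_property` is a hypothetical helper, the address is assumed to be IPv4, and `10.0.0.5` is a placeholder rather than a real Metastore address.

```python
import ipaddress

# Thrift port used in the spark.hive.metastore.uris property above.
METASTORE_PORT = 9083


def metastore_property(ip_address: str, port: int = METASTORE_PORT) -> dict[str, str]:
    """Build the cluster property for a given Metastore IP (hypothetical helper)."""
    # Assumption: the Metastore address is IPv4.
    # IPv4Address raises a ValueError subclass for anything else.
    ip = ipaddress.IPv4Address(ip_address)
    return {"spark:spark.hive.metastore.uris": f"thrift://{ip}:{port}"}


if __name__ == "__main__":
    # Placeholder address for illustration only.
    for key, value in metastore_property("10.0.0.5").items():
        print(f"{key} : {value}")
```

Validating the address before building the URI catches copy-paste mistakes, such as a stray space or a host name pasted in place of the IP address, before the property reaches the cluster configuration.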
For an example of using Yandex Data Processing with an Apache Hive™ Metastore cluster connected, see the Shared use of Yandex Data Processing tables through Apache Hive™ Metastore tutorial.
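If jobs cannot reach the Metastore after the routing and security group setup, a plain TCP check from a Yandex Data Processing host can help narrow down the cause. The sketch below uses only the Python standard library. `port_is_reachable` is a hypothetical helper name, and the check only confirms that a TCP connection to port 9083 opens; it does not speak the Thrift protocol or verify that the Metastore service itself is healthy.

```python
import socket


def port_is_reachable(host: str, port: int = 9083, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

A call such as `port_is_reachable("<Metastore IP address>")` that returns `False` suggests revisiting the routing between the cloud networks or the outgoing rule for port 9083.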
Apache® and Apache Hive™