Yandex project
© 2025 Yandex.Cloud LLC

Getting started with Hive Metastore

Written by Yandex Cloud
Improved by Danila N.
Updated on May 5, 2025
  • Getting started
  • Create a Metastore cluster
  • Connect the Metastore cluster to the Yandex Data Processing cluster
  • What's next

Note

This feature is at the Preview stage.

In Yandex MetaData Hub, you can create Hive Metastore clusters and use them to work with Yandex Data Processing clusters.

Getting started

  1. Go to the management console and log in to Yandex Cloud, or sign up if you do not have an account yet.

  2. If you do not have a folder yet, create one:

    1. In the management console, select the appropriate cloud from the list on the left.

    2. At the top right, click Create folder.

    3. Give your folder a name. The naming requirements are as follows:

      • It must be from 2 to 63 characters long.
      • It may contain lowercase Latin letters, numbers, and hyphens.
      • It must start with a letter and cannot end with a hyphen.
    4. Optionally, specify the description for your folder.

    5. Select Create a default network. This creates a network with a subnet in each availability zone. The network also gets a default security group that allows all network traffic.

    6. Click Create.

  3. To link your service account to a Metastore cluster, make sure your Yandex Cloud account has the iam.serviceAccounts.user role or higher.

    Note

    If you are unable to manage roles, contact your cloud or organization administrator.

  4. Set up a NAT gateway in the subnet that will host the Metastore and Yandex Data Processing clusters.

  5. Create a security group for Metastore and Yandex Data Processing clusters.

  6. Add Metastore cluster rules to the security group:

    • For incoming client traffic:

      • Port range: 30000-32767
      • Protocol: Any
      • Source: CIDR
      • CIDR blocks: 0.0.0.0/0
    • For incoming load balancer traffic:

      • Port range: 10256
      • Protocol: Any
      • Source: Load balancer healthchecks
  7. Add Yandex Data Processing cluster rules to the security group:

    • One rule for inbound and another one for outbound service traffic:

      • Port range: 0-65535
      • Protocol: Any
      • Source/Destination name: Security group
      • Security group: Current
    • A separate rule for outgoing HTTPS traffic to all addresses. This will allow you to use Yandex Object Storage buckets, UI Proxy, and autoscaling of Yandex Data Processing subclusters.

      • Port range: 443
      • Protocol: TCP
      • Destination name: CIDR
      • CIDR blocks: 0.0.0.0/0
    • A rule that allows access to NTP servers for time synchronization:

      • Port range: 123
      • Protocol: UDP
      • Destination name: CIDR
      • CIDR blocks: 0.0.0.0/0
  8. Create a service account with the dataproc.agent, dataproc.provisioner, and managed-metastore.integrationProvider roles.

  9. Create an Object Storage bucket to interact with a Yandex Data Processing cluster.

  10. In the network you created earlier, create a Yandex Data Processing cluster. In the settings, specify:

    • SPARK and YARN services.
    • Service account you created earlier.
    • The spark:spark.sql.hive.metastore.sharedPrefixes property set to com.amazonaws,ru.yandex.cloud. This is required for PySpark jobs and for integration with Metastore.
    • Bucket you created earlier.
    • Security group you configured earlier.
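
The security group rules from steps 6 and 7 can be written down as data, which makes them easier to review before you enter them in the management console. The following is a minimal sketch, not a Yandex Cloud API call: the Rule class, the matching_rules helper, and the peer labels "self" and "loadbalancer-healthchecks" are hypothetical names introduced only for illustration, and peers are compared by exact string rather than by real CIDR matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    direction: str  # "ingress" or "egress"
    ports: tuple    # inclusive (first, last) port range
    protocol: str   # "ANY", "TCP", or "UDP"
    peer: str       # CIDR block, or a placeholder label for a non-CIDR peer
    note: str


# Rules transcribed from steps 6 and 7 of this guide.
# "self" and "loadbalancer-healthchecks" are illustrative placeholders,
# not identifiers from any Yandex Cloud API.
RULES = [
    Rule("ingress", (30000, 32767), "ANY", "0.0.0.0/0",
         "Metastore: incoming client traffic"),
    Rule("ingress", (10256, 10256), "ANY", "loadbalancer-healthchecks",
         "Metastore: incoming load balancer traffic"),
    Rule("ingress", (0, 65535), "ANY", "self",
         "Data Processing: inbound service traffic"),
    Rule("egress", (0, 65535), "ANY", "self",
         "Data Processing: outbound service traffic"),
    Rule("egress", (443, 443), "TCP", "0.0.0.0/0",
         "Data Processing: outgoing HTTPS"),
    Rule("egress", (123, 123), "UDP", "0.0.0.0/0",
         "Data Processing: NTP"),
]


def matching_rules(direction, port, protocol, peer):
    """Return the rules that would admit traffic with these attributes.

    Simplification: the peer is matched by exact string, so this is a
    review aid rather than a model of how the platform evaluates rules.
    """
    return [
        r for r in RULES
        if r.direction == direction
        and r.ports[0] <= port <= r.ports[1]
        and r.protocol in ("ANY", protocol)
        and r.peer == peer
    ]


if __name__ == "__main__":
    for rule in matching_rules("egress", 443, "TCP", "0.0.0.0/0"):
        print(rule.note)
```

Under these assumptions, matching_rules("egress", 9083, "TCP", "0.0.0.0/0") returns an empty list, which is why a separate rule for port 9083 is added when you connect the clusters.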

Create a Metastore cluster

Management console
  1. In the management console, go to the folder you created earlier.
  2. Select Yandex MetaData Hub.
  3. In the left-hand panel, select Metastore.
  4. Click Create cluster.
  5. Enter a name for the cluster. It must be unique within the folder.
  6. Select a service account under which the Metastore cluster will interact with other Yandex Cloud services, or create a new one.
  7. Under Network settings, select the network and subnet you created earlier. Specify the security group you configured previously.
  8. Optionally, under Logging, enable logging, select the minimum logging level, and specify the folder or log group.
  9. If required, enable deletion protection to keep a user from deleting the cluster by accident.
  10. Click Create.

Connect the Metastore cluster to the Yandex Data Processing cluster

Management console
  1. In the Yandex Data Processing cluster you created earlier, specify the following property:

    spark:spark.hive.metastore.uris : thrift://<Metastore_cluster_IP_address>:9083
    

    To find out the Metastore cluster IP address, select Yandex MetaData Hub in the management console and then select the Metastore page in the left-hand panel. Copy the IP address column value for the cluster.

  2. Add the following outgoing traffic rule to the security group:

    • Port range: 9083
    • Protocol: Any
    • Destination name: CIDR
    • CIDR blocks: 0.0.0.0/0
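
To avoid typos when filling in the property from step 1, the value can be generated from the cluster IP address. The sketch below uses only the Python standard library. The function names metastore_property and can_reach and the sample address 10.128.0.15 are assumptions made for illustration; the property key and port 9083 come from the steps above. The reachability check only confirms that a TCP connection to the port succeeds from the machine where the script runs, which is a rough way to test the outgoing rule from step 2.

```python
import ipaddress
import socket

METASTORE_PORT = 9083  # Thrift port used in the property value above


def metastore_property(ip: str) -> dict:
    """Build the cluster property that points Spark at the Metastore cluster."""
    # Raises ValueError if the string is not a valid IPv4 address.
    ipaddress.IPv4Address(ip)
    return {"spark:spark.hive.metastore.uris": f"thrift://{ip}:{METASTORE_PORT}"}


def can_reach(ip: str, port: int = METASTORE_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to ip:port opens within the timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 10.128.0.15 is a made-up example; use the value from the IP address column.
    for key, value in metastore_property("10.128.0.15").items():
        print(f"{key} : {value}")
```

Run can_reach from a host in the same network as the Yandex Data Processing cluster; a False result may point to a missing security group rule, but it can also mean that the cluster is still starting.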

What's next

  • Work with tables using Metastore.
  • Use Metastore to move data between Yandex Data Processing clusters.
  • Store tabular data in Metastore when using Apache Airflow™.
  • Export and import Hive metadata in a Metastore cluster.
