© 2025 Direct Cursus Technology L.L.C.

Getting started with Apache Hive™ Metastore

Written by
Yandex Cloud
Improved by
Danila N.
Updated at October 15, 2025

In Yandex MetaData Hub, you can create Apache Hive™ Metastore clusters and use them to work with Yandex Data Processing clusters.

Getting started

  1. Navigate to the management console and log in to Yandex Cloud, or sign up if you do not have an account yet.

  2. If you do not have a folder yet, create one:

    1. In the management console, select the appropriate cloud from the list on the left.

    2. At the top right, click Create folder.

    3. Give your folder a name. The naming requirements are as follows:

      • It must be from 2 to 63 characters long.
      • It can only contain lowercase Latin letters, numbers, and hyphens.
      • It must start with a letter and cannot end with a hyphen.
    4. Optionally, specify the description for your folder.

    5. Select Create a default network. This creates a network with subnets in each availability zone, along with a default security group within which all network traffic is allowed.

    6. Click Create.

  3. To link a service account to an Apache Hive™ Metastore cluster, assign the iam.serviceAccounts.user role or higher to your Yandex Cloud account.

    Note

    If you are unable to manage roles, contact your cloud or organization administrator.

  4. Set up a NAT gateway in the subnet that will host the Apache Hive™ Metastore and Yandex Data Processing clusters.

  5. Create a security group for the Apache Hive™ Metastore and Yandex Data Processing clusters.

  6. Add the following Apache Hive™ Metastore cluster rules to the security group:

    • For incoming client traffic:

      • Port range: 30000-32767
      • Protocol: Any
      • Source: CIDR
      • CIDR blocks: 0.0.0.0/0
    • For incoming load balancer traffic:

      • Port range: 10256
      • Protocol: Any
      • Source: Load balancer healthchecks
  7. Add the following Yandex Data Processing cluster rules to the security group:

    • One inbound and one outbound rule for service traffic:

      • Port range: 0-65535
      • Protocol: Any
      • Source/Destination name: Security group
      • Security group: Current
    • A separate rule for outgoing HTTPS traffic to all addresses. This will allow you to use Yandex Object Storage buckets, UI Proxy, and autoscaling of Yandex Data Processing subclusters.

      • Port range: 443
      • Protocol: TCP
      • Destination name: CIDR
      • CIDR blocks: 0.0.0.0/0
    • A rule that allows access to NTP servers for time synchronization:

      • Port range: 123
      • Protocol: UDP
      • Destination name: CIDR
      • CIDR blocks: 0.0.0.0/0
  8. Create a service account with the dataproc.agent, dataproc.provisioner, and managed-metastore.integrationProvider roles.

  9. Create an Object Storage bucket to interact with a Yandex Data Processing cluster.

  10. In the network you created earlier, create a Yandex Data Processing cluster. In the settings, specify:

    • SPARK and YARN services.
    • Service account you created earlier.
    • The spark:spark.sql.hive.metastore.sharedPrefixes property set to com.amazonaws,ru.yandex.cloud. This property is required for PySpark jobs and for integration with Apache Hive™ Metastore.
    • Bucket you created earlier.
    • Security group you configured earlier.
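The folder naming requirements from step 2 and the security group rules from steps 6 and 7 can be captured in a short Python sketch, for example to sanity-check values before entering them in the management console. This is a hypothetical local model, not the Yandex Cloud VPC API: the `Rule` class, the peer labels `"self"` and `"lb-healthchecks"`, and the helper names are illustrative assumptions, and real security group evaluation is more involved than this.

```python
import ipaddress
import re
from dataclasses import dataclass

# Folder naming rules from step 2: 2-63 characters, lowercase Latin letters,
# digits, and hyphens only, starting with a letter and not ending with a hyphen.
FOLDER_NAME_RE = re.compile(r"[a-z][a-z0-9-]{0,61}[a-z0-9]")


def is_valid_folder_name(name: str) -> bool:
    return FOLDER_NAME_RE.fullmatch(name) is not None


# Hypothetical peer labels (assumption, not Yandex Cloud identifiers):
# "self" = the same security group, "lb-healthchecks" = load balancer health checks.
SPECIAL_PEERS = ("self", "lb-healthchecks")


@dataclass(frozen=True)
class Rule:
    direction: str  # "ingress" or "egress"
    ports: range    # half-open, e.g. range(443, 444) for port 443 only
    protocol: str   # "ANY", "TCP", or "UDP"
    peer: str       # a special peer label or a CIDR block such as "0.0.0.0/0"

    def allows(self, direction: str, port: int, protocol: str, peer: str) -> bool:
        if self.direction != direction or port not in self.ports:
            return False
        if self.protocol not in ("ANY", protocol):
            return False
        if self.peer in SPECIAL_PEERS or peer in SPECIAL_PEERS:
            return self.peer == peer
        # CIDR rule: the remote IP address must fall inside the block.
        return ipaddress.ip_address(peer) in ipaddress.ip_network(self.peer)


# Rules from steps 6 and 7 of this guide.
RULES = [
    Rule("ingress", range(30000, 32768), "ANY", "0.0.0.0/0"),  # Metastore client traffic
    Rule("ingress", range(10256, 10257), "ANY", "lb-healthchecks"),
    Rule("ingress", range(0, 65536), "ANY", "self"),           # Data Processing service traffic
    Rule("egress", range(0, 65536), "ANY", "self"),
    Rule("egress", range(443, 444), "TCP", "0.0.0.0/0"),       # HTTPS: Object Storage, UI Proxy, autoscaling
    Rule("egress", range(123, 124), "UDP", "0.0.0.0/0"),       # NTP
]


def is_allowed(rules, direction: str, port: int, protocol: str, peer: str) -> bool:
    """Return True if any rule permits the described connection."""
    return any(r.allows(direction, port, protocol, peer) for r in rules)
```

For example, `is_allowed(RULES, "egress", 443, "TCP", "203.0.113.5")` returns `True`, while the same call with `"UDP"` returns `False`, because the only UDP egress rule to external addresses is the NTP rule on port 123. Port ranges use Python's half-open `range`, so the upper bound is the last port plus one.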

Create an Apache Hive™ Metastore cluster

Management console
  1. In the management console, go to the folder you created earlier.
  2. Select Yandex MetaData Hub.
  3. In the left-hand panel, select Metastore.
  4. Click Create cluster.
  5. Enter a name for the cluster. It must be unique within the folder.
  6. Select a service account under which the Apache Hive™ Metastore cluster will interact with other Yandex Cloud services, or create a new one.
  7. Under Network settings, select the network and subnet you created earlier. Specify the security group you configured previously.
  8. Optionally, under Logging, enable logging, select the minimum logging level, and specify the folder or log group.
  9. Optionally, enable deletion protection to prevent users from accidentally deleting the cluster.
  10. Click Create.
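Before clicking Create, it can help to collect the form values in one place and check them. The following Python sketch is a hypothetical checklist, not a Yandex Cloud SDK or API: `MetastoreClusterSettings`, its field names, and `validate` are illustrative assumptions. It checks only what this guide states: the service account, network, subnet, and security group must be selected, and the cluster name must be unique within the folder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetastoreClusterSettings:
    """Hypothetical record of the Create cluster form values (not an SDK type)."""
    name: str
    service_account: str
    network: str
    subnet: str
    security_group: str
    log_min_level: Optional[str] = None  # None means logging is left disabled
    deletion_protection: bool = False


REQUIRED_FIELDS = ("name", "service_account", "network", "subnet", "security_group")


def validate(settings: MetastoreClusterSettings, existing_names: set) -> list:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    for field_name in REQUIRED_FIELDS:
        if not getattr(settings, field_name):
            problems.append(f"{field_name} is required")
    # The cluster name must be unique within the folder.
    if settings.name and settings.name in existing_names:
        problems.append(f"cluster name {settings.name!r} is already used in this folder")
    return problems
```

Here `existing_names` stands for the names of clusters already in the folder, which you would read from the Metastore page yourself; the sketch does not fetch them.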

Connect the Apache Hive™ Metastore cluster to the Yandex Data Processing cluster

Management console
  1. In the settings of the Yandex Data Processing cluster you created earlier, add the following property:

    spark:spark.hive.metastore.uris : thrift://<Apache Hive™ Metastore_cluster_IP_address>:9083
    

    To find the Apache Hive™ Metastore cluster IP address, select Yandex MetaData Hub in the management console, then select Metastore in the left-hand panel. Copy the value in the IP address column for your cluster.

  2. Add the following outgoing traffic rule to the security group:

    • Port range: 9083
    • Protocol: Any
    • Destination name: CIDR
    • CIDR blocks: 0.0.0.0/0
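The two Data Processing cluster properties used in this guide can be generated from the Metastore cluster's IP address with a small Python helper. This is a sketch under assumptions: the function name `metastore_properties` is hypothetical, and it accepts only an IPv4 address (as copied from the IP address column). It does not call any Yandex Cloud API, so you still enter the resulting properties in the cluster settings yourself.

```python
import ipaddress

# Thrift port from the spark:spark.hive.metastore.uris property in this guide.
METASTORE_THRIFT_PORT = 9083


def metastore_properties(metastore_ip: str) -> dict:
    """Build the Data Processing properties that link Spark to Hive Metastore.

    Raises ValueError (ipaddress.AddressValueError) if metastore_ip is not
    a valid IPv4 address.
    """
    ip = ipaddress.IPv4Address(metastore_ip)
    return {
        # Required for PySpark jobs and Hive Metastore integration.
        "spark:spark.sql.hive.metastore.sharedPrefixes": "com.amazonaws,ru.yandex.cloud",
        # Points Spark at the Metastore cluster's Thrift endpoint.
        "spark:spark.hive.metastore.uris": f"thrift://{ip}:{METASTORE_THRIFT_PORT}",
    }


if __name__ == "__main__":
    # 10.128.0.12 is a placeholder; use the address shown for your cluster.
    for key, value in metastore_properties("10.128.0.12").items():
        print(f"{key} : {value}")
```

Running the script prints each property in the `key : value` form used above, for example `spark:spark.hive.metastore.uris : thrift://10.128.0.12:9083`. Validating the address first catches a truncated or mistyped copy before it ends up in the cluster configuration.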

What's next

  • Work with tables using Apache Hive™ Metastore.
  • Use Apache Hive™ Metastore to move data between Yandex Data Processing clusters.
  • Store tabular data in Apache Hive™ Metastore when using Apache Airflow™.
  • Export and import Hive metadata in an Apache Hive™ Metastore cluster.

Apache® and Apache Hive™ are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.
