Date: 2023-09-14
Yandex Data Proc
A service for processing multi-terabyte data arrays using open source tools like Apache Spark™, Apache Hadoop®, Apache HBase®, Apache Hive™, Apache Zeppelin™, and other Apache® ecosystem services.
Easy to use
You select the size of the cluster, node capacity, and a set of services, and Yandex Data Proc automatically creates and configures Spark and Hadoop clusters and other components. Collaborate by using Zeppelin notebooks and other web apps via UI Proxy.
Low costs
Launch Yandex Data Proc for as little as 18 RUB/hour. Save up to 70% on VMs by choosing preemptible instances.
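As a rough illustration of the savings figure above, the sketch below compares on-demand and preemptible VM spend. The function name and the 100 RUB/hour rate are hypothetical, chosen only for the example. The 70% discount is the upper bound quoted above, so real savings may be lower.

```python
def preemptible_cost(on_demand_hourly_rub: float, hours: float,
                     discount: float = 0.70) -> float:
    """Estimate VM cost in RUB after applying a preemptible discount.

    `discount` is a fraction between 0 and 1; 0.70 is the best case
    ("up to 70%"), not a guaranteed rate.
    """
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    return on_demand_hourly_rub * hours * (1.0 - discount)


# Hypothetical workload: VMs costing 100 RUB/hour on demand, run for 10 hours.
on_demand_total = 100.0 * 10
best_case_total = preemptible_cost(100.0, 10)
print(f"on-demand: {on_demand_total:.2f} RUB")
print(f"preemptible (best case): {best_case_total:.2f} RUB")
```

Keep in mind that preemptible instances can be stopped at any time, so this estimate suits workloads that tolerate interruption.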
Full control of your cluster
You get full control of your cluster with root permissions for each VM. Install your own applications and libraries on running clusters without having to restart them.
Autoscaling (Preview)
Yandex Data Proc uses Instance Groups to automatically increase or decrease computing resources of compute subclusters based on CPU usage indicators.
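To make the idea of CPU-based scaling concrete, here is a minimal sketch of a generic target-tracking rule. It is not the algorithm Instance Groups actually uses; the function name, parameters, and default limits are assumptions made for illustration.

```python
import math


def desired_hosts(current_hosts: int, avg_cpu_percent: float,
                  target_cpu_percent: float = 70.0,
                  min_hosts: int = 1, max_hosts: int = 10) -> int:
    """Return a host count that would bring average CPU close to the target.

    Generic target-tracking rule (an assumption, not the service's logic):
    scale the current size by the ratio of observed to target CPU usage,
    round up, and clamp the result to [min_hosts, max_hosts].
    """
    if current_hosts < 1:
        raise ValueError("current_hosts must be at least 1")
    if target_cpu_percent <= 0:
        raise ValueError("target_cpu_percent must be positive")
    raw = math.ceil(current_hosts * avg_cpu_percent / target_cpu_percent)
    return max(min_hosts, min(max_hosts, raw))


# Hypothetical subcluster of 4 hosts with a 60% CPU target.
print(desired_hosts(4, 90.0, target_cpu_percent=60.0))  # overloaded: scale out
print(desired_hosts(4, 30.0, target_cpu_percent=60.0))  # underused: scale in
```

Clamping to a minimum and maximum keeps the subcluster from shrinking to zero or growing without limit when the CPU metric spikes.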
Managing table metadata (Preview)
Data Proc allows you to create managed Hive Metastore clusters, which can reduce the probability of failures and losses caused by metadata unavailability.
Task automation
Save time building ETL pipelines, model training and development pipelines, and other iterative workflows. A Data Proc operator is already built into Apache Airflow.
Implement your projects with Data Proc
Primary data storage and preprocessing
Manage table metadata for objects in Object Storage buckets using Hive Metastore. Prepare and clean data, and build full-fledged data repositories and domain-oriented data marts.
Analyze user behavior
Analyze events using Hadoop clusters, and use analytics tools to categorize data and identify patterns and trends.
Process data in streaming mode
Process data streams in real time using Apache Spark clusters. Create metrics and save the necessary data slices by integrating Yandex Data Proc with Yandex Object Storage.
We'll take care of most cluster maintenance
Independent control
Control on the Yandex Cloud side
Getting started
Select the necessary computing capacity and Apache® services and create a ready-to-use Data Proc cluster.
FAQ
What Apache® services are available in Yandex Data Proc?
Can anyone access my data?
Apache, Apache Hadoop, Apache Spark, and Apache Oozie are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.