Yandex Data Processing
A service for processing multi-terabyte datasets with open-source tools such as Apache Spark™, Apache Hadoop®, Apache HBase®, Apache Hive™, Apache Zeppelin™, and other services from the Apache® ecosystem.
Easy to use
You choose the cluster size, node capacity, and a set of services, and Yandex Data Processing automatically creates and configures Spark and Hadoop clusters and the other selected components. Collaborate in Zeppelin notebooks and other web apps via UI Proxy.
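Once a cluster is up, you work with it through the standard Apache Spark™ APIs. Below is a minimal PySpark sketch of a batch job such a cluster could run; the bucket path and column names are illustrative assumptions, not part of the service documentation.

# Minimal PySpark sketch: count unique users per day from CSV files in object storage.
# The input path and column names ("event_date", "user_id") are placeholder assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-event-counts").getOrCreate()

events = (
    spark.read
    .option("header", "true")
    .csv("s3a://example-bucket/events/*.csv")  # hypothetical input location
)

daily_counts = (
    events.groupBy("event_date")
    .agg(F.countDistinct("user_id").alias("unique_users"))
    .orderBy("event_date")
)

daily_counts.show()
spark.stop()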
Low costs
Launch Yandex Data Processing for as little as 18 RUB/hour. Save up to 70% on VMs by choosing preemptible instances.
Full control of your cluster
You get full control of your cluster with root permissions on each VM. Install your own applications and libraries on running clusters without having to restart them.
Autoscaling (Preview)
Yandex Data Processing uses Instance Groups to automatically scale the computing resources of compute subclusters up or down based on CPU utilization.
Managing table metadata
Yandex Data Processing allows you to create managed Hive Metastore clusters, which reduces the risk of failures and losses caused by metadata unavailability.
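When Spark jobs share an external Hive Metastore, table metadata defined by one cluster is visible to others. The snippet below is a minimal sketch of connecting a Spark session to a Metastore; the Metastore address and table name are placeholders, not values from the service documentation.

# Sketch: attach a Spark session to an external Hive Metastore.
# The thrift URI and the table name are placeholder assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("metastore-example")
    .config("hive.metastore.uris", "thrift://<metastore-host>:9083")  # replace with your Metastore address
    .enableHiveSupport()
    .getOrCreate()
)

# Tables registered in the shared Metastore become visible to this session.
spark.sql("SHOW DATABASES").show()
spark.sql("SELECT * FROM analytics.events LIMIT 10").show()  # hypothetical table

spark.stop()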
Task automation
Save time building ETL pipelines, pipelines for training and developing models, and other recurring tasks. The Yandex Data Proc operator is already built into Apache Airflow.
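For example, the apache-airflow-providers-yandex package exposes Data Proc operators that can create a temporary cluster, submit a job, and delete the cluster from a DAG. The DAG below is a hedged sketch: connection settings and most cluster parameters are omitted, the file URI and names are placeholders, and the module path and argument list should be checked against the provider version you use.

# Hedged sketch of an Airflow DAG that runs a PySpark job on a temporary
# Yandex Data Processing cluster. Names, URIs, and omitted cluster parameters
# are placeholder assumptions; check the provider docs for the full argument list.
from datetime import datetime

from airflow import DAG
from airflow.providers.yandex.operators.yandexcloud_dataproc import (
    DataprocCreateClusterOperator,
    DataprocCreatePysparkJobOperator,
    DataprocDeleteClusterOperator,
)

with DAG(
    dag_id="dataproc_pyspark_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Create a temporary cluster for this run (most parameters omitted for brevity).
    create_cluster = DataprocCreateClusterOperator(
        task_id="create_cluster",
        cluster_name="airflow-tmp-cluster",
    )

    # Submit a PySpark job stored in object storage (the path is hypothetical).
    run_job = DataprocCreatePysparkJobOperator(
        task_id="run_pyspark_job",
        main_python_file_uri="s3a://example-bucket/jobs/daily_report.py",
    )

    # Tear the cluster down once the job finishes.
    delete_cluster = DataprocDeleteClusterOperator(task_id="delete_cluster")

    create_cluster >> run_job >> delete_cluster

Describing the cluster lifecycle in the DAG itself keeps compute costs tied to the pipeline: the cluster exists only while the job runs.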
We'll take care of most cluster maintenance
Maintenance responsibilities are split between tasks you control independently and tasks handled on the Yandex Cloud side.
Getting started
Select the necessary computing capacity and Apache® services and create a ready-to-use Yandex Data Processing cluster.
FAQ
Which Apache® services are available in a Yandex Data Processing cluster?
Spark™, HDFS, YARN, Hive, HBase®, Oozie™, Sqoop™, Flume™, Tez®, and Zeppelin™.
Get started with Yandex Data Processing
Apache, Apache Hadoop, Apache Spark, and Apache Oozie are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.