© 2025 Direct Cursus Technology L.L.C.
Yandex Managed Service for Apache Airflow™


Resource relationships in Managed Service for Apache Airflow™

Written by
Yandex Cloud
Updated at March 4, 2025
  • About Apache Airflow™
  • Managed Service for Apache Airflow™ architecture
  • Apache Airflow™ cluster
  • Apache Airflow™ main components
    • Apache Airflow™ component configurations
  • Triggerer

Managed Service for Apache Airflow™ helps you deploy and maintain clusters of Apache Airflow™ servers in the Yandex Cloud infrastructure.

About Apache Airflow™

Apache Airflow™ is an open-source platform that enables you to create, schedule, and monitor batch-oriented workflows. A workflow defines job relationships and their execution sequence. It is presented as a directed acyclic graph (DAG). DAGs in Apache Airflow™ can be used for automation and scheduled runs of any processes, e.g., data processing in Apache Spark™.

Apache Airflow™ follows the Workflows as Code approach: each workflow is implemented as a Python 3 script. The file containing this script is called a DAG file; it describes the jobs, their run schedule, and the dependencies between them. This approach lets you store workflows in a version control system, run tests on them, and apply your usual development tooling to them.
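As an illustration, here is a minimal sketch of a DAG file using the Airflow 2 TaskFlow API. The DAG ID, schedule, and task logic are hypothetical; a real DAG file would contain your own jobs:

```python
# Hypothetical DAG file: dag_id, schedule, and task bodies are illustrative.
from datetime import datetime

from airflow.decorators import dag, task


@dag(
    dag_id="example_etl",        # hypothetical identifier
    schedule="@daily",           # run once a day
    start_date=datetime(2025, 1, 1),
    catchup=False,
)
def example_etl():
    @task
    def extract() -> list[int]:
        # In a real workflow, this could read from a database or an API.
        return [1, 2, 3]

    @task
    def transform(values: list[int]) -> int:
        return sum(values)

    @task
    def load(total: int) -> None:
        print(f"loaded total={total}")

    # Dependencies: extract -> transform -> load
    load(transform(extract()))


example_etl()
```

Placed in the DAG file storage, a script like this is picked up by the scheduler and run on the declared schedule.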

Apache Airflow™ is not used for streaming and continuous data processing. If such processing is required, you can develop a solution based on Yandex Managed Service for Apache Kafka®.

For more information, see the Apache Airflow™ documentation.

Managed Service for Apache Airflow™ architecture

The diagram below shows the Managed Service for Apache Airflow™ architecture:

Each Apache Airflow™ cluster runs in a separate Kubernetes node group with the required network infrastructure. This infrastructure includes a virtual network, a security group, and a service account. Node groups are isolated from each other, both through virtual networks and through Kubernetes itself. Node groups are managed by a common Kubernetes master, and Apache Airflow™ clusters use a common PostgreSQL cluster for data storage.

To ensure isolated data storage, the service limits the use of the PostgreSQL cluster:

  • A separate database is created for each Apache Airflow™ cluster in the PostgreSQL cluster. Clusters can connect only to their own database.

  • Apache Airflow™ clusters can work only with tables created by Apache Airflow™. You cannot create or modify schemas, tables, functions, procedures, or triggers yourself.

  • Read and write speed, as well as the available database storage space, are limited.

    Warning

    Any malicious attempt to bypass these restrictions will result in your cluster being locked under Clause 7 of the Acceptable Use Policy.

Apache Airflow™ cluster

The main entity Managed Service for Apache Airflow™ operates on is a cluster; the Apache Airflow™ components are deployed inside it. Cluster resources may reside in different availability zones. To learn more about Yandex Cloud availability zones, see the Platform overview.

A workflow running in a cluster may access any Yandex Cloud resource within the cloud network where the cluster is located. For example, a workflow can send requests to Yandex Cloud VMs or managed DB clusters. You can build a workflow using multiple resources, e.g., a workflow that collects data from one DB and sends it to another DB or Yandex Data Processing.

Apache Airflow™ main components

The diagram below shows the main Apache Airflow™ components:

  • Web server: Server in Yandex Cloud hosting an Apache Airflow™ instance. The web server receives user commands sent through the Apache Airflow™ web interface and checks, runs, and debugs Python scripts in DAG files.

    To learn more about working with the web interface, see the Apache Airflow™ documentation.

  • Scheduler: Server in Yandex Cloud that controls the job run schedule. The scheduler gets schedule information from DAG files. It uses this schedule to notify workers that it is time to run a DAG file.

  • Workers: Executors of jobs specified in DAG files. The workers run jobs on the schedule received from the scheduler.

  • Triggerer: Optional service that releases a worker when it would otherwise sit idle while a job waits for a long-running external event.

  • DAG file storage: Yandex Object Storage bucket that stores DAG files. This storage can be accessed by web servers, schedulers, workers, and Triggerer.

To ensure fault tolerance and enhance performance, web servers, schedulers, and Triggerer can each run in multiple instances. You set their number when creating a cluster.

For workers, you can also set the minimum and maximum number of instances when creating a cluster; their number then scales dynamically. This feature is provided by the KEDA controller.
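The idea behind queue-based worker autoscaling can be sketched in plain Python. This is a simplified model of the scaling decision, not KEDA's actual implementation; the function name and the one-worker-per-N-tasks rule are illustrative:

```python
import math


def desired_workers(queued_tasks: int, tasks_per_worker: int,
                    min_workers: int, max_workers: int) -> int:
    """Simplified queue-based scaling rule in the spirit of KEDA:
    one worker per `tasks_per_worker` queued tasks, clamped to the
    [min_workers, max_workers] range set when creating the cluster."""
    needed = math.ceil(queued_tasks / tasks_per_worker)
    return max(min_workers, min(needed, max_workers))


# With limits 1..8 and 16 tasks per worker:
print(desired_workers(0, 16, 1, 8))      # empty queue -> scale down to the minimum
print(desired_workers(40, 16, 1, 8))     # 40 queued tasks -> 3 workers
print(desired_workers(10_000, 16, 1, 8)) # burst -> capped at the maximum
```

The clamp is what the cluster's minimum and maximum worker settings control: the queue length drives scaling, but the instance count never leaves the configured range.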

Apache Airflow™ component configurations

A configuration determines the computing power allocated to the web server, scheduler, workers, and the Triggerer service. Two configuration types are available:

  • standard: 4 GB of RAM per vCPU (4:1 ratio).

    • 1 vCPU, 4 GB RAM
    • 2 vCPUs, 8 GB RAM
    • 4 vCPUs, 16 GB RAM
    • 8 vCPUs, 32 GB RAM
  • cpu-optimized: 2 GB of RAM per vCPU (2:1 ratio). These configurations may be useful for clusters with higher CPU performance requirements.

    • 1 vCPU, 2 GB RAM
    • 2 vCPUs, 4 GB RAM
    • 4 vCPUs, 8 GB RAM
    • 8 vCPUs, 16 GB RAM

You can select configurations when creating a cluster or change them while editing it.

Triggerer

The Triggerer service reduces worker idle time.

DAGs may contain jobs that send a request to an external system (such as an Apache Spark™ cluster) and wait for its response for a certain period of time. With standard operators, such a job occupies a worker while it awaits the response, keeping the worker idle. If this happens to a large number of workers, job queues form and overall execution slows down.

Deferrable operators help avoid this situation. They allow a job to be paused and its worker released, isolating the external system request into a separate process called a trigger. All triggers are independent of one another and are processed asynchronously by Triggerer, which has separate resources allocated for it in the cluster. Once the external system responds, the trigger fires, and the scheduler returns the job to a worker.

See how to work with Triggerer in the figure below:

For more information about deferrable operators, triggers, and the Triggerer service, see the Apache Airflow™ documentation.
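The deferral mechanism can be modeled in plain Python with asyncio. This is an illustration of the idea, not Airflow's actual API: the "worker" hands the slow wait off to a trigger, stays free for other jobs, and the job resumes only when the trigger fires:

```python
import asyncio


async def external_system(delay: float) -> str:
    # Stand-in for a slow external service (e.g., an Apache Spark™ job).
    await asyncio.sleep(delay)
    return "done"


async def trigger(delay: float, fired: asyncio.Event, result: dict) -> None:
    # Triggerer awaits the external system asynchronously,
    # so no worker slot is occupied during the wait.
    result["status"] = await external_system(delay)
    fired.set()


async def main() -> list[str]:
    fired = asyncio.Event()
    result: dict = {}
    log: list[str] = []

    # The "worker" defers the slow wait to the trigger and is released.
    asyncio.create_task(trigger(0.05, fired, result))
    log.append("worker released")

    # While the trigger waits, the worker is free to run other jobs.
    log.append("worker runs another job")

    # When the trigger fires, the job returns to a worker and completes.
    await fired.wait()
    log.append("job resumed: " + result["status"])
    return log


print(asyncio.run(main()))
```

Many such triggers can run concurrently on a single event loop, which is why one Triggerer instance can service waits that would otherwise occupy many workers.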
