Creating an Apache Airflow™ cluster
Every Managed Service for Apache Airflow™ cluster consists of a set of Apache Airflow™ components, each of which can run as multiple instances. The instances may reside in different availability zones.
Creating a cluster
To create a Managed Service for Apache Airflow™ cluster, you need the vpc.user role and the managed-airflow.editor role or higher. For more information on assigning roles, see the Identity and Access Management documentation.
- In the management console, select the folder where you want to create a cluster.
- Select Managed Service for Apache Airflow™.
- Click Create a cluster.
- Under Basic parameters:
  - Enter a name for the cluster. The name must be unique within the folder.
  - (Optional) Enter a cluster description.
  - (Optional) Create labels:
    - Click Add label.
    - Enter a label in key: value format.
    - Press Enter.
- Under Access settings:
  - Set a password for the admin user. The password must be at least 8 characters long and contain at least:
    - One uppercase letter
    - One lowercase letter
    - One digit
    - One special character

    For an illustrative check of these rules, see the sketch after this list.

    Note
    Save the password locally or memorize it. The service does not show passwords after the cluster is created.
  - Select an existing service account or create a new one.
    Make sure to assign the managed-airflow.integrationProvider role to the service account.
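As a reference, here is a minimal sketch of the admin password policy above in Python; the assumption that any non-alphanumeric character counts as special is ours, not the service's:

```python
# A minimal sketch of the admin password policy listed above.
# Assumption: any non-alphanumeric character counts as "special".
import re

def is_valid_admin_password(password: str) -> bool:
    checks = [
        len(password) >= 8,                                # at least 8 characters
        re.search(r"[A-Z]", password) is not None,         # one uppercase letter
        re.search(r"[a-z]", password) is not None,         # one lowercase letter
        re.search(r"\d", password) is not None,            # one digit
        re.search(r"[^A-Za-z0-9]", password) is not None,  # one special character
    ]
    return all(checks)

print(is_valid_admin_password("Airflow2024!"))  # True
```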
- Under Network settings, select:
  - Availability zones for the cluster.
  - A cloud network.
  - A subnet in each of the selected availability zones.
    Yandex Cloud manages Managed Service for Apache Airflow™ cluster components in the auxiliary subnet. Make sure the IP address ranges of the subnets you selected do not overlap with the 10.248.0.0/13 auxiliary subnet address range; otherwise, you will get an error when creating the cluster. A quick overlap check is sketched after this list.
  - A security group for the cluster's network traffic.
    Security group settings do not affect access to the Apache Airflow™ web interface.
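To check your subnets against the auxiliary range before creating the cluster, here is a small sketch using the Python standard library; the subnet CIDR below is a placeholder for your own:

```python
# Sketch: verify that a subnet's CIDR does not overlap with the
# 10.248.0.0/13 auxiliary range used by the service.
import ipaddress

AUXILIARY = ipaddress.ip_network("10.248.0.0/13")

subnet = ipaddress.ip_network("10.128.0.0/24")  # placeholder: your subnet's CIDR
if subnet.overlaps(AUXILIARY):
    print("Overlap: pick a different address range for this subnet")
else:
    print("OK: no overlap with the auxiliary subnet")
```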
- Set the number of instances and resources for the Managed Service for Apache Airflow™ components:
  - Web server
  - Scheduler
  - Workers

    Note
    If the task queue is empty, the number of workers stays at the minimum value. As tasks appear, the number of workers increases up to the maximum value.

  - (Optional) Triggerer services
- (Optional) Under Dependencies, specify pip and deb package names to install additional libraries and applications in the cluster for running DAG files.
  To specify multiple packages, click Add.
  If required, you can set version restrictions for the installed packages, for example:
  pandas==2.0.2 scikit-learn>=1.0.0 clickhouse-driver~=0.2.0
  The package name and version format is defined by the install command: pip install for pip packages and apt install for deb packages. For a sketch of a DAG that uses one of these packages, see below.

  Warning
  To install pip and deb packages from public repositories, specify a network with configured egress NAT under Network settings.
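To illustrate how an installed pip package becomes available to DAG code, here is a minimal DAG sketch assuming Apache Airflow™ 2.4 or higher and the pandas package from the example above; the DAG ID and task logic are arbitrary:

```python
# Minimal DAG sketch: uses pandas, installed via the cluster's
# Dependencies settings. The DAG ID and task logic are arbitrary.
from datetime import datetime

import pandas as pd
from airflow.decorators import dag, task


@dag(schedule=None, start_date=datetime(2024, 1, 1), catchup=False)
def pandas_example():
    @task
    def summarize():
        df = pd.DataFrame({"value": [1, 2, 3]})
        print(df.describe())

    summarize()


pandas_example()
```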
- Under DAG file storage, select a bucket or create a new one. This bucket will store DAG files.
  Make sure to grant the cluster service account the READ permission for this bucket.
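As an illustration, a DAG file can be uploaded to this bucket over the S3-compatible API of Yandex Object Storage; the sketch below assumes boto3 and a static access key for a service account, and the bucket and object names are placeholders:

```python
# Sketch: upload a DAG file to the DAG storage bucket over the
# S3-compatible API of Yandex Object Storage. Assumes boto3 and a
# static access key; bucket and object names are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://storage.yandexcloud.net",
    aws_access_key_id="<static key ID>",
    aws_secret_access_key="<secret key>",
)

s3.upload_file("my_dag.py", "<bucket name>", "my_dag.py")
```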
- (Optional) Under Advanced settings, enable cluster deletion protection.
- (Optional) Under Airflow configuration:
  - Specify additional Apache Airflow™ properties, e.g., api.maximum_page_limit as a key and 150 as its value.
    Fill in the fields manually or import the settings from a configuration file (see the configuration file example). A sketch of reading such a property back at runtime follows this list.
  - Enable the Use Lockbox Secret Backend option to use secrets in Yandex Lockbox for storing Apache Airflow™ configuration data, variables, and connection parameters.
    To extract the required information from a secret, the cluster service account must have the lockbox.payloadViewer role. You can assign this role either for the whole folder or for an individual secret. A sketch of reading such values from DAG code also follows below.
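For reference, an additional property such as api.maximum_page_limit maps to the maximum_page_limit option in the api section of the Airflow configuration; a sketch of reading it back from task code:

```python
# Sketch: an additional property set as api.maximum_page_limit maps to
# the "maximum_page_limit" option in the "api" section and can be read
# back from task code.
from airflow.configuration import conf

limit = conf.getint("api", "maximum_page_limit")  # 150 with the example above
print(f"maximum_page_limit = {limit}")
```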
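A minimal sketch of how DAG code reads values that the secret backend resolves from Yandex Lockbox; the variable and connection names are hypothetical, while the lookup calls are standard Airflow APIs:

```python
# Sketch: with Use Lockbox Secret Backend enabled, standard Airflow
# lookups for variables and connections resolve from Yandex Lockbox.
# The variable and connection names here are hypothetical.
from airflow.hooks.base import BaseHook
from airflow.models import Variable

api_token = Variable.get("my_api_token")       # value stored in a Lockbox secret
conn = BaseHook.get_connection("my_postgres")  # connection parameters from Lockbox
print(conn.host, conn.port)
```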
- Click Create.