Creating an Apache Airflow™ cluster
Every Managed Service for Apache Airflow™ cluster consists of a set of Apache Airflow™ components, each of which can be represented in multiple instances. The instances may reside in different availability zones.
Before creating a cluster
- In the folder where you want to create a cluster, create a service account with the managed-airflow.integrationProvider role.
- Create a Yandex Object Storage bucket to store DAG files.
- Make sure your account has the vpc.user role as well as the managed-airflow.editor role or higher, which is required to create a cluster.
Create a cluster
- In the management console, select the folder where you want to create a cluster.
- Select Managed Service for Apache Airflow™.
- Click Create a cluster.
- Under Basic parameters:
  - Enter a name for the cluster. The name must be unique within the folder.
  - (Optional) Enter a cluster description.
  - (Optional) Create labels:
    - Click Add label.
    - Enter a label in key: value format.
    - Click Enter.
- Under Access settings:
  - Set a password for the admin user. The password must be at least 8 characters long and contain at least:
    - One uppercase letter
    - One lowercase letter
    - One digit
    - One special character

    Note: Save the password locally or memorize it. The service does not show passwords after the cluster is created.
  - Select the previously created service account with the managed-airflow.integrationProvider role.
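The password policy above can be checked locally with a small sketch. This is purely illustrative, not part of the service; in particular, the exact set of special characters the service accepts is an assumption here.

```python
import re

def is_valid_admin_password(password: str) -> bool:
    """Check the admin password policy: at least 8 characters, with at
    least one uppercase letter, one lowercase letter, one digit, and
    one special character (assumed: any non-alphanumeric)."""
    if len(password) < 8:
        return False
    required_patterns = [
        r"[A-Z]",         # one uppercase letter
        r"[a-z]",         # one lowercase letter
        r"\d",            # one digit
        r"[^A-Za-z0-9]",  # one special character
    ]
    return all(re.search(p, password) for p in required_patterns)

print(is_valid_admin_password("Sup3r!secret"))  # True
print(is_valid_admin_password("password1"))     # False: no uppercase or special character
```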
- Under Network settings, select:
  - Availability zones for the cluster
  - Cloud network
  - A subnet in each of the selected availability zones
  - A security group for the cluster's network traffic

  Security group settings do not affect access to the Apache Airflow™ web interface.
- Set the number of instances and resources for the Managed Service for Apache Airflow™ components:
  - Web server
  - Scheduler
  - Workers

    Note: If the task queue is empty, the number of workers stays at the minimum value. As tasks appear, the number of workers increases up to the maximum value.
  - (Optional) Triggerer services
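The note on worker counts describes scaling between a configured minimum and maximum. The service's actual scaling algorithm is not documented here; this toy model (one worker per queued task, clamped to the configured bounds) only illustrates the min/max semantics:

```python
def workers_for_queue(queued_tasks: int, min_workers: int, max_workers: int) -> int:
    """Hypothetical illustration: scale workers with the task queue,
    never below min_workers and never above max_workers."""
    return max(min_workers, min(queued_tasks, max_workers))

print(workers_for_queue(0, 1, 8))   # 1 -> empty queue: stays at the minimum
print(workers_for_queue(5, 1, 8))   # 5
print(workers_for_queue(20, 1, 8))  # 8 -> capped at the maximum
```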
- (Optional) Under Dependencies, specify pip and deb package names to install additional libraries and applications in the cluster for running DAG files.

  To specify multiple packages, click Add.

  If required, you can set version restrictions for the installed packages, for example:

  pandas==2.0.2
  scikit-learn>=1.0.0
  clickhouse-driver~=0.2.0

  The package name and version format is defined by the install command: pip install for pip packages and apt install for deb packages.

  Warning: To install pip and deb packages from public repositories, specify a network with egress NAT configured under Network settings.
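The three specifier operators in the example mean: `==` pins an exact version, `>=` sets a minimum, and `~=` is a "compatible release" (per PEP 440, `~=0.2.0` is equivalent to `>=0.2.0, <0.3.0`). Real pip resolves these with the `packaging` library; the simplified sketch below only handles plain X.Y.Z versions, just to show the semantics:

```python
def parse(version: str) -> tuple:
    """Turn 'X.Y.Z' into a comparable tuple of integers."""
    return tuple(int(part) for part in version.split("."))

def satisfies(version: str, spec: str) -> bool:
    """Simplified check for the ==, >=, and ~= specifiers shown above."""
    if spec.startswith("=="):
        return parse(version) == parse(spec[2:])
    if spec.startswith(">="):
        return parse(version) >= parse(spec[2:])
    if spec.startswith("~="):
        low = parse(spec[2:])
        # Compatible release: >= X.Y.Z and < X.(Y+1)
        high = low[:-2] + (low[-2] + 1,)
        return low <= parse(version) < high
    raise ValueError(f"unsupported specifier: {spec}")

print(satisfies("2.0.2", "==2.0.2"))   # True
print(satisfies("1.2.0", ">=1.0.0"))   # True
print(satisfies("0.2.9", "~=0.2.0"))   # True  (still within 0.2.x)
print(satisfies("0.3.0", "~=0.2.0"))   # False (compatible range ends before 0.3)
```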
- Under DAG file storage, specify the name of the previously created bucket that will store DAG files.
- (Optional) Under Advanced settings, enable cluster deletion protection.
- (Optional) Under Airflow configuration:
  - Specify additional Apache Airflow™ properties, e.g., the api.maximum_page_limit key with 150 as its value.

    Populate the fields manually or import a configuration from a file (see the sample configuration file).
  - Enable the Use Lockbox Secret Backend option to store Apache Airflow™ configuration data, variables, and connection parameters as secrets in Yandex Lockbox.

    To extract the required information from a secret, the cluster service account must have the lockbox.payloadViewer role. You can assign this role at either the whole-folder level or the individual-secret level.
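Additional properties use Apache Airflow™'s section.option key format: a key like api.maximum_page_limit corresponds to the maximum_page_limit option in the [api] section of airflow.cfg, or equivalently the AIRFLOW__API__MAXIMUM_PAGE_LIMIT environment-variable override. A quick sketch of that mapping (illustrative only; the service applies these settings for you):

```python
import configparser
import io

def to_airflow_cfg(properties: dict) -> str:
    """Render 'section.option' keys, as entered in the console form,
    into airflow.cfg INI format."""
    config = configparser.ConfigParser()
    for key, value in properties.items():
        section, option = key.split(".", 1)
        if not config.has_section(section):
            config.add_section(section)
        config.set(section, option, str(value))
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()

def to_env_var(key: str) -> str:
    """The same key as an Airflow environment-variable override."""
    section, option = key.split(".", 1)
    return f"AIRFLOW__{section.upper()}__{option.upper()}"

print(to_airflow_cfg({"api.maximum_page_limit": 150}))
# [api]
# maximum_page_limit = 150
print(to_env_var("api.maximum_page_limit"))  # AIRFLOW__API__MAXIMUM_PAGE_LIMIT
```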
- Click Create.