Updating an Apache Airflow™ cluster
After creating a cluster, you can change its basic and advanced settings.
- Go to the folder page and select Managed Service for Apache Airflow™.
- Select the cluster and click Edit in the top panel.
- Under Basic parameters, edit the cluster name and description, delete labels, or add new ones.
- Under Access settings, select a service account or create a new one with the `managed-airflow.integrationProvider` role. This way, the cluster will get the permissions required to work with user resources. For more information, see Impersonation.
- Under Network settings, select a security group for cluster network traffic or create a new one. Security group settings do not affect access to the Apache Airflow™ web interface.
- Under the settings of Managed Service for Apache Airflow™ components, such as Web server configuration, Scheduler configuration, and Worker configuration, specify the number of instances and resources.
- Under Triggerer configuration, enable or disable the Triggerer service. If it is enabled, specify the number of instances and resources.
- Under Dependencies, delete or add names of pip and deb packages.
- Under DAG file storage, select an existing bucket to store DAG files or create a new one. Make sure to grant the `READ` permission for this bucket.
- Under Advanced settings, enable or disable deletion protection.
- Under Airflow configuration:
  - Add, edit, or delete additional Apache Airflow™ properties, e.g., the `api.maximum_page_limit` key with `150` as its value. Populate the fields manually or import a configuration from a file (see the sample configuration file).
  - Enable or disable the Use Lockbox Secret Backend option, which allows you to use Yandex Lockbox secrets to store Apache Airflow™ configuration data, variables, and connection parameters. To extract the required information from a secret, the cluster service account must have the `lockbox.payloadViewer` role. You can assign this role either at the level of the whole folder or for an individual secret (see the sketch after these steps).
- Under Logging, enable or disable log writing. If logging is enabled, specify the log group to write logs to and the minimum logging level. Logs generated by Apache Airflow™ will be sent to Yandex Cloud Logging.
- Click Save changes.
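If you choose per-secret access, a minimal sketch of the role assignment via the CLI might look as follows; it assumes your CLI version provides the `yc lockbox secret add-access-binding` command, and the secret name and service account ID are hypothetical placeholders:

```bash
# Grant the cluster service account read access to a single secret.
# "airflow-secret" and the service account ID are hypothetical placeholders.
yc lockbox secret add-access-binding airflow-secret \
  --role lockbox.payloadViewer \
  --service-account-id <service_account_ID>
```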
If you do not have the Yandex Cloud command line interface yet, install and initialize it.

The folder specified in the CLI profile is used by default. You can specify a different folder using the `--folder-name` or `--folder-id` parameter.
To change the cluster settings:

- View the description of the update cluster CLI command:

  ```bash
  yc managed-airflow cluster update --help
  ```
- Provide a list of settings to update in the update cluster command:

  ```bash
  yc managed-airflow cluster update <cluster_name_or_ID> \
     --new-name <new_cluster_name> \
     --description <cluster_description> \
     --labels <label_list> \
     --service-account-id <service_account_ID> \
     --security-group-ids <security_group_IDs> \
     --webserver count=<number_of_instances>,resource-preset-id=<resource_ID> \
     --scheduler count=<number_of_instances>,resource-preset-id=<resource_ID> \
     --worker min-count=<minimum_number_of_instances>,max-count=<maximum_number_of_instances>,resource-preset-id=<resource_ID> \
     --triggerer count=<number_of_instances>,resource-preset-id=<resource_ID> \
     --deb-packages <list_of_deb_packages> \
     --pip-packages <list_of_pip_packages> \
     --dags-bucket <bucket_name> \
     --deletion-protection \
     --lockbox-secrets-backend \
     --log-enabled \
     --log-folder-id <folder_ID> \
     --log-min-level <logging_level>
  ```
Where:

- `--new-name`: New cluster name.
- `--description`: Cluster description.
- `--labels`: List of labels. Provide labels in `<key>=<value>` format.
- `--admin-password`: Admin user password. The password must be at least 8 characters long and contain at least:
  - One uppercase letter
  - One lowercase letter
  - One digit
  - One special character
- `--service-account-id`: Service account ID.
- `--subnet-ids`: List of subnet IDs.
- `--security-group-ids`: List of security group IDs.
- `--webserver`, `--scheduler`, `--worker`, `--triggerer`: Managed Service for Apache Airflow™ component configuration:
  - `count`: Number of instances in the cluster for the web server, scheduler, and triggerer.
  - `min-count`, `max-count`: Minimum and maximum number of instances in the cluster for the worker.
  - `resource-preset-id`: ID of the computing resources for the web server, scheduler, worker, and triggerer. The possible values are:
    - `c1-m4`: 1 vCPU, 4 GB RAM
    - `c2-m8`: 2 vCPUs, 8 GB RAM
    - `c4-m16`: 4 vCPUs, 16 GB RAM
    - `c8-m32`: 8 vCPUs, 32 GB RAM
- `--deb-packages`, `--pip-packages`: Lists of deb and pip packages that enable you to install additional libraries and applications in the cluster for running DAG files. If required, you can set version restrictions for the installed packages, for example:

  ```bash
  --pip-packages "pandas==2.0.2,scikit-learn>=1.0.0,clickhouse-driver~=0.2.0"
  ```

  The package name and version format are defined by the install command: `pip install` for pip packages and `apt install` for deb packages.
- `--dags-bucket`: Name of the bucket to store DAG files in.
- `--deletion-protection`: Enables cluster protection against accidental deletion. With deletion protection enabled, you will still be able to connect to the cluster manually and delete it.
- `--lockbox-secrets-backend`: Enables using Yandex Lockbox secrets to store Apache Airflow™ configuration data, variables, and connection parameters.
- `--airflow-config`: Additional Apache Airflow™ properties. Provide them in `<configuration_section>.<key>=<value>` format, for example:

  ```bash
  --airflow-config core.load_examples=False
  ```

- Logging parameters:
  - `--log-enabled`: Enables logging. Logs generated by Apache Airflow™ will be sent to Yandex Cloud Logging.
  - `--log-folder-id`: Folder ID. Logs will be written to the default log group for this folder.
  - `--log-group-id`: Custom log group ID.
  - `--log-min-level`: Minimum logging level. Possible values: `TRACE`, `DEBUG`, `INFO` (default), `WARN`, `ERROR`, and `FATAL`.

  You can specify only one of the parameters: `--log-folder-id` or `--log-group-id`.
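For example, a hedged invocation that renames a cluster and resizes its web server might look like this (the cluster name, new name, and preset are illustrative placeholders, not recommendations):

```bash
# Rename the cluster and scale the web server to two c2-m8 instances.
# All names and values below are hypothetical.
yc managed-airflow cluster update airflow-cluster \
   --new-name airflow-prod \
   --webserver count=2,resource-preset-id=c2-m8 \
   --deletion-protection
```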
You can request the cluster ID and name with a list of clusters in the folder.
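For example:

```bash
# List clusters in the current folder to look up IDs and names.
yc managed-airflow cluster list
```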
- Open the current Terraform configuration file with an infrastructure plan.

  For more information about creating this file, see Creating clusters.
- To change cluster settings, update the values of the required fields in the configuration file.

  Warning

  Do not change the cluster name and password using Terraform. This would delete the existing cluster and create a new one.

  Here is an example of the configuration file structure:

  ```hcl
  resource "yandex_airflow_cluster" "<cluster_name>" {
    name               = "<cluster_name>"
    description        = "<cluster_description>"
    labels             = { <label_list> }
    admin_password     = "<administrator_password>"
    service_account_id = "<service_account_ID>"
    subnet_ids         = ["<list_of_subnet_IDs>"]
    security_group_ids = ["<list_of_security_group_IDs>"]

    webserver = {
      count              = <number_of_instances>
      resource_preset_id = "<resource_ID>"
    }

    scheduler = {
      count              = <number_of_instances>
      resource_preset_id = "<resource_ID>"
    }

    worker = {
      min_count          = <minimum_number_of_instances>
      max_count          = <maximum_number_of_instances>
      resource_preset_id = "<resource_ID>"
    }

    triggerer = {
      count              = <number_of_instances>
      resource_preset_id = "<resource_ID>"
    }

    pip_packages = ["<list_of_pip_packages>"]
    deb_packages = ["<list_of_deb_packages>"]

    code_sync = {
      s3 = {
        bucket = "<bucket_name>"
      }
    }

    deletion_protection = <deletion_protection>

    lockbox_secrets_backend = {
      enabled = <usage_of_secrets>
    }

    airflow_config = {
      <configuration_section> = {
        <key> = "<value>"
      }
    }

    logging = {
      enabled   = <use_of_logging>
      folder_id = "<folder_ID>"
      min_level = "<logging_level>"
    }
  }

  resource "yandex_vpc_network" "<network_name>" {
    name = "<network_name>"
  }

  resource "yandex_vpc_subnet" "<subnet_name>" {
    name           = "<subnet_name>"
    zone           = "<availability_zone>"
    network_id     = "<network_ID>"
    v4_cidr_blocks = ["<range>"]
  }
  ```
Where:

- `name`: Cluster name.
- `description`: Cluster description.
- `labels`: List of labels. Provide labels in `<key> = "<value>"` format.
- `admin_password`: Admin user password. The password must be at least 8 characters long and contain at least:
  - One uppercase letter
  - One lowercase letter
  - One digit
  - One special character
- `service_account_id`: Service account ID.
- `subnet_ids`: List of subnet IDs.
- `security_group_ids`: List of security group IDs.
- `webserver`, `scheduler`, `worker`, `triggerer`: Managed Service for Apache Airflow™ component configuration:
  - `count`: Number of instances in the cluster for the web server, scheduler, and triggerer.
  - `min_count`, `max_count`: Minimum and maximum number of instances in the cluster for the worker.
  - `resource_preset_id`: ID of the computing resources for the web server, scheduler, worker, and triggerer. The possible values are:
    - `c1-m4`: 1 vCPU, 4 GB RAM
    - `c2-m8`: 2 vCPUs, 8 GB RAM
    - `c4-m16`: 4 vCPUs, 16 GB RAM
    - `c8-m32`: 8 vCPUs, 32 GB RAM
- `deb_packages`, `pip_packages`: Lists of deb and pip packages that enable you to install additional libraries and applications in the cluster for running DAG files. If required, you can set version restrictions for the installed packages, for example:

  ```hcl
  pip_packages = ["pandas==2.0.2", "scikit-learn>=1.0.0", "clickhouse-driver~=0.2.0"]
  ```

  The package name and version format are defined by the install command: `pip install` for pip packages and `apt install` for deb packages.
- `code_sync.s3.bucket`: Name of the bucket to store DAG files in.
- `deletion_protection`: Enables cluster protection against accidental deletion. Possible values: `true` or `false`. With deletion protection enabled, you will still be able to connect to the cluster manually and delete it.
- `lockbox_secrets_backend.enabled`: Enables using Yandex Lockbox secrets to store Apache Airflow™ configuration data, variables, and connection parameters. Possible values: `true` or `false`.
- `airflow_config`: Additional Apache Airflow™ properties, e.g., `core` for the configuration section, `load_examples` for the key, and `False` for the value.
- `logging`: Logging parameters:
  - `enabled`: Enables logging. Logs generated by Apache Airflow™ components will be sent to Yandex Cloud Logging. Possible values: `true` or `false`.
  - `folder_id`: Folder ID. Logs will be written to the default log group for this folder.
  - `log_group_id`: Custom log group ID.
  - `min_level`: Minimum logging level. Possible values: `TRACE`, `DEBUG`, `INFO` (default), `WARN`, `ERROR`, and `FATAL`.

  You can specify only one of the parameters: `folder_id` or `log_group_id`.
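As an illustration only, a fragment that scales the workers and pins two pip packages might look like this (all values are hypothetical placeholders, not recommendations):

```hcl
# Illustrative fragment only: scale workers and pin two pip packages.
worker = {
  min_count          = 1
  max_count          = 4
  resource_preset_id = "c2-m8"
}

pip_packages = ["pandas==2.0.2", "requests>=2.31.0"]
```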
- Make sure the settings are correct:

  - Using the command line, navigate to the folder that contains the up-to-date Terraform configuration files with an infrastructure plan.
  - Run the command:

    ```bash
    terraform validate
    ```

    If there are errors in the configuration files, Terraform will point them out.
- Confirm updating the resources:

  - Run the command to view the planned changes:

    ```bash
    terraform plan
    ```

    If the resource configuration descriptions are correct, the terminal will display a list of the resources to modify and their parameters. This is a test step; no resources are updated.
  - If you are happy with the planned changes, apply them:

    - Run the command:

      ```bash
      terraform apply
      ```

    - Confirm the update of resources.
    - Wait for the operation to complete.
For more information, see the Terraform provider documentation.
- Get an IAM token for API authentication and put it into an environment variable:

  ```bash
  export IAM_TOKEN="<IAM_token>"
  ```
- Create a file named `body.json` and add the following contents to it:

  ```json
  {
    "updateMask": "<list_of_parameters_to_change>",
    "name": "<cluster_name>",
    "description": "<cluster_description>",
    "labels": { <label_list> },
    "configSpec": {
      "airflow": {
        "config": { <list_of_properties> }
      },
      "webserver": {
        "count": "<number_of_instances>",
        "resources": {
          "resourcePresetId": "<resource_ID>"
        }
      },
      "scheduler": {
        "count": "<number_of_instances>",
        "resources": {
          "resourcePresetId": "<resource_ID>"
        }
      },
      "triggerer": {
        "count": "<number_of_instances>",
        "resources": {
          "resourcePresetId": "<resource_ID>"
        }
      },
      "worker": {
        "minCount": "<minimum_number_of_instances>",
        "maxCount": "<maximum_number_of_instances>",
        "resources": {
          "resourcePresetId": "<resource_ID>"
        }
      },
      "dependencies": {
        "pipPackages": [ <list_of_pip_packages> ],
        "debPackages": [ <list_of_deb_packages> ]
      },
      "lockbox": {
        "enabled": <usage_of_secrets>
      }
    },
    "codeSync": {
      "s3": {
        "bucket": "<bucket_name>"
      }
    },
    "networkSpec": {
      "securityGroupIds": [ <list_of_security_group_IDs> ]
    },
    "deletionProtection": <deletion_protection>,
    "serviceAccountId": "<service_account_ID>",
    "logging": {
      "enabled": <logging_usage>,
      "minLevel": "<logging_level>",
      // Specify either `folderId` or `logGroupId`.
      "folderId": "<folder_ID>",
      "logGroupId": "<log_group_ID>"
    }
  }
  ```
Where:

- `updateMask`: List of parameters to update as a single string, separated by commas.

  Warning

  When updating a cluster, all parameters of the object being changed that were not explicitly set in the request will be reset to their default values. To avoid this, list the settings you want to change in the `updateMask` parameter.
- `name`: Cluster name.
- `description`: Cluster description.
- `labels`: List of labels. Provide labels in `"<key>": "<value>"` format.
- `configSpec`: Cluster configuration:
  - `airflow.config`: Additional Apache Airflow™ properties. Provide them in `"<configuration_section>.<key>": "<value>"` format, for example:

    ```json
    "airflow": {
      "config": {
        "core.load_examples": "False"
      }
    }
    ```

  - `webserver`, `scheduler`, `triggerer`, and `worker`: Configuration of Managed Service for Apache Airflow™ components:
    - `count`: Number of instances in the cluster for the web server, scheduler, and triggerer.
    - `minCount` and `maxCount`: Minimum and maximum number of instances in the cluster for the worker.
    - `resources.resourcePresetId`: ID of the computing resources for the web server, scheduler, worker, and triggerer. The possible values are:
      - `c1-m4`: 1 vCPU, 4 GB RAM
      - `c2-m8`: 2 vCPUs, 8 GB RAM
      - `c4-m16`: 4 vCPUs, 16 GB RAM
      - `c8-m32`: 8 vCPUs, 32 GB RAM
  - `dependencies`: Lists of packages that enable you to install additional libraries and applications for running DAG files in the cluster:
    - `pipPackages`: List of pip packages.
    - `debPackages`: List of deb packages.

    If required, you can set version restrictions for the installed packages, for example:

    ```json
    "dependencies": {
      "pipPackages": [
        "pandas==2.0.2",
        "scikit-learn>=1.0.0",
        "clickhouse-driver~=0.2.0"
      ]
    }
    ```

    The package name and version format are defined by the install command: `pip install` for pip packages and `apt install` for deb packages.
  - `lockbox.enabled`: Enables using Yandex Lockbox secrets to store Apache Airflow™ configuration data, variables, and connection parameters. The possible values are `true` or `false`.
- `networkSpec.securityGroupIds`: List of security group IDs.
- `codeSync.s3.bucket`: Name of the bucket to store DAG files in.
- `deletionProtection`: Enables cluster protection against accidental deletion. The possible values are `true` or `false`. With deletion protection enabled, you will still be able to connect to the cluster manually and delete it.
- `serviceAccountId`: ID of the previously created service account.
- `logging`: Logging parameters:
  - `enabled`: Enables logging. Logs generated by Apache Airflow™ components will be sent to Yandex Cloud Logging. The possible values are `true` or `false`.
  - `minLevel`: Minimum logging level. Possible values: `TRACE`, `DEBUG`, `INFO`, `WARN`, `ERROR`, and `FATAL`.
  - `folderId`: Folder ID. Logs will be written to the default log group for this folder.
  - `logGroupId`: Custom log group ID.

  You can specify only one of the parameters: `folderId` or `logGroupId`.
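To make the masking behavior concrete, here is a sketch of a minimal `body.json` that renames the cluster and disables example DAGs only; the name, mask paths, and property values are illustrative assumptions:

```json
{
  "updateMask": "name,configSpec.airflow.config",
  "name": "airflow-prod",
  "configSpec": {
    "airflow": {
      "config": {
        "core.load_examples": "False"
      }
    }
  }
}
```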
- Use the Cluster.update method and make a request, e.g., via cURL:

  ```bash
  curl \
    --request PATCH \
    --header "Authorization: Bearer $IAM_TOKEN" \
    --url 'https://airflow.api.cloud.yandex.net/managed-airflow/v1/clusters/<cluster_ID>' \
    --data '@body.json'
  ```
You can get the cluster ID with a list of clusters in the folder.
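For instance, a sketch using the list endpoint (this assumes the standard `folderId` query parameter of the Cluster.list method):

```bash
# List clusters in the folder to find the ID of the cluster to update.
# <folder_ID> is a placeholder.
curl \
  --header "Authorization: Bearer $IAM_TOKEN" \
  --url 'https://airflow.api.cloud.yandex.net/managed-airflow/v1/clusters?folderId=<folder_ID>'
```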
- View the server response to make sure the request was successful.
- Get an IAM token for API authentication and put it into an environment variable:

  ```bash
  export IAM_TOKEN="<IAM_token>"
  ```
- Clone the cloudapi repository:

  ```bash
  cd ~/ && git clone --depth=1 https://github.com/yandex-cloud/cloudapi
  ```

  Below, we assume the repository contents are stored in the `~/cloudapi/` directory.
- Create a file named `body.json` and add the following contents to it:

  ```json
  {
    "cluster_id": "<cluster_ID>",
    "update_mask": "<list_of_parameters_to_change>",
    "name": "<cluster_name>",
    "description": "<cluster_description>",
    "labels": { <label_list> },
    "config_spec": {
      "airflow": {
        "config": { <list_of_properties> }
      },
      "webserver": {
        "count": "<number_of_instances>",
        "resources": {
          "resource_preset_id": "<resource_ID>"
        }
      },
      "scheduler": {
        "count": "<number_of_instances>",
        "resources": {
          "resource_preset_id": "<resource_ID>"
        }
      },
      "triggerer": {
        "count": "<number_of_instances>",
        "resources": {
          "resource_preset_id": "<resource_ID>"
        }
      },
      "worker": {
        "min_count": "<minimum_number_of_instances>",
        "max_count": "<maximum_number_of_instances>",
        "resources": {
          "resource_preset_id": "<resource_ID>"
        }
      },
      "dependencies": {
        "pip_packages": [ <list_of_pip_packages> ],
        "deb_packages": [ <list_of_deb_packages> ]
      },
      "lockbox": {
        "enabled": <usage_of_secrets>
      }
    },
    "code_sync": {
      "s3": {
        "bucket": "<bucket_name>"
      }
    },
    "network_spec": {
      "security_group_ids": [ <list_of_security_group_IDs> ]
    },
    "deletion_protection": <deletion_protection>,
    "service_account_id": "<service_account_ID>",
    "logging": {
      "enabled": <logging_usage>,
      "min_level": "<logging_level>",
      // Specify either `folder_id` or `log_group_id`.
      "folder_id": "<folder_ID>",
      "log_group_id": "<log_group_ID>"
    }
  }
  ```
Where:

- `cluster_id`: Cluster ID. You can retrieve it with a list of clusters in the folder.
- `update_mask`: List of parameters to update as an array of `paths[]` strings.

  Format for listing settings:

  ```json
  "update_mask": {
    "paths": [
      "<setting_1>",
      "<setting_2>",
      ...
      "<setting_N>"
    ]
  }
  ```

  Warning

  When updating a cluster, all parameters of the object being changed that were not explicitly set in the request will be reset to their default values. To avoid this, list the settings you want to change in the `update_mask` parameter.
- `name`: Cluster name.
- `description`: Cluster description.
- `labels`: List of labels. Provide labels in `"<key>": "<value>"` format.
- `config_spec`: Cluster configuration:
  - `airflow.config`: Additional Apache Airflow™ properties. Provide them in `"<configuration_section>.<key>": "<value>"` format, for example:

    ```json
    "airflow": {
      "config": {
        "core.load_examples": "False"
      }
    }
    ```

  - `webserver`, `scheduler`, `triggerer`, and `worker`: Configuration of Managed Service for Apache Airflow™ components:
    - `count`: Number of instances in the cluster for the web server, scheduler, and triggerer.
    - `min_count` and `max_count`: Minimum and maximum number of instances in the cluster for the worker.
    - `resources.resource_preset_id`: ID of the computing resources for the web server, scheduler, worker, and triggerer. The possible values are:
      - `c1-m4`: 1 vCPU, 4 GB RAM
      - `c2-m8`: 2 vCPUs, 8 GB RAM
      - `c4-m16`: 4 vCPUs, 16 GB RAM
      - `c8-m32`: 8 vCPUs, 32 GB RAM
  - `dependencies`: Lists of packages that enable you to install additional libraries and applications for running DAG files in the cluster:
    - `pip_packages`: List of pip packages.
    - `deb_packages`: List of deb packages.

    If required, you can set version restrictions for the installed packages, for example:

    ```json
    "dependencies": {
      "pip_packages": [
        "pandas==2.0.2",
        "scikit-learn>=1.0.0",
        "clickhouse-driver~=0.2.0"
      ]
    }
    ```

    The package name and version format are defined by the install command: `pip install` for pip packages and `apt install` for deb packages.
  - `lockbox.enabled`: Enables using Yandex Lockbox secrets to store Apache Airflow™ configuration data, variables, and connection parameters. The possible values are `true` or `false`.
- `network_spec.security_group_ids`: List of security group IDs.
- `code_sync.s3.bucket`: Name of the bucket to store DAG files in.
- `deletion_protection`: Enables cluster protection against accidental deletion. The possible values are `true` or `false`. With deletion protection enabled, you will still be able to connect to the cluster manually and delete it.
- `service_account_id`: ID of the previously created service account.
- `logging`: Logging parameters:
  - `enabled`: Enables logging. Logs generated by Apache Airflow™ components will be sent to Yandex Cloud Logging. The possible values are `true` or `false`.
  - `min_level`: Minimum logging level. Possible values: `TRACE`, `DEBUG`, `INFO`, `WARN`, `ERROR`, and `FATAL`.
  - `folder_id`: Folder ID. Logs will be written to the default log group for this folder.
  - `log_group_id`: Custom log group ID.

  You can specify only one of the parameters: `folder_id` or `log_group_id`.
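As an illustration, a minimal `body.json` that updates only the description could look like this (the description text is a placeholder):

```json
{
  "cluster_id": "<cluster_ID>",
  "update_mask": {
    "paths": [
      "description"
    ]
  },
  "description": "Production Airflow cluster"
}
```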
- Use the ClusterService/Update call and make a request, e.g., via gRPCurl:

  ```bash
  grpcurl \
    -format json \
    -import-path ~/cloudapi/ \
    -import-path ~/cloudapi/third_party/googleapis/ \
    -proto ~/cloudapi/yandex/cloud/airflow/v1/cluster_service.proto \
    -rpc-header "Authorization: Bearer $IAM_TOKEN" \
    -d @ \
    airflow.api.cloud.yandex.net:443 \
    yandex.cloud.airflow.v1.ClusterService.Update \
    < body.json
  ```
- View the server response to make sure the request was successful.