Updating an Apache Airflow™ cluster
After creating a cluster, you can change its basic and advanced settings.
Management console

To change the cluster settings:
- Go to the folder page and select Managed Service for Apache Airflow™.
- Select the cluster and click Edit in the top panel.
- Under Basic parameters, edit the cluster name and description, delete labels, or add new ones.
- Under Access settings, select a service account or create a new one with the `managed-airflow.integrationProvider` role. With this role, the cluster gets the permissions it needs to work with user resources. For more information, see Impersonation.

  To change the service account in a Managed Service for Apache Airflow™ cluster, make sure your Yandex Cloud account has the `iam.serviceAccounts.user` role or higher.

  Warning

  If the cluster already uses a service account to access objects in Object Storage, switching to a different service account may make these objects unavailable and interrupt the cluster operation. Before changing the service account settings, make sure the cluster does not use the objects in question.
- Under Network settings, select a security group for cluster network traffic or create a new one.

  Security group settings do not affect access to the Apache Airflow™ web interface.
- Under the settings of Managed Service for Apache Airflow™ components (Web server configuration, Scheduler configuration, and Worker configuration), specify the number of instances and the resources.
- Under Triggerer configuration, enable or disable the Triggerer service. If it is enabled, specify the number of instances and the resources.
- Under Dependencies, delete or add names of pip and deb packages.
- Under DAG file storage, select an existing bucket to store DAG files or create a new one. Make sure to grant the cluster service account the `READ` permission for this bucket.
- Under Advanced settings, enable or disable deletion protection.
- Under Airflow configuration:

  - Add, edit, or delete additional Apache Airflow™ properties, e.g., the `api.maximum_page_limit` key with `150` as its value. Populate the fields manually or import a configuration from a file (see the sample configuration file).
  - Enable or disable the Use Lockbox Secret Backend option, which allows you to use Yandex Lockbox secrets to store Apache Airflow™ configuration data, variables, and connection parameters.

    To extract the required information from a secret, the cluster service account must have the `lockbox.payloadViewer` role. You can assign this role either at the level of the whole folder or for an individual secret; see the sketch after these steps.
- Under Logging, enable or disable logging. If logging is enabled, specify the log group to write logs to and the minimum logging level. Logs generated by Apache Airflow™ will be sent to Yandex Cloud Logging.
- Click Save changes.
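If you enabled Use Lockbox Secret Backend, the cluster service account needs the `lockbox.payloadViewer` role. Here is a minimal sketch of assigning it at the folder level with the yc CLI; the folder and service account IDs are placeholders:

```bash
# Grant lockbox.payloadViewer on the whole folder so the cluster
# service account can read the payload of any secret in it.
yc resource-manager folder add-access-binding <folder_ID> \
  --role lockbox.payloadViewer \
  --service-account-id <service_account_ID>
```

To scope access to a single secret instead, assign the same role on that secret (e.g., with `yc lockbox secret add-access-binding`, assuming your CLI version supports access bindings on secrets).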
CLI

If you do not have the Yandex Cloud command line interface yet, install and initialize it.

The folder specified in the CLI profile is used by default. You can specify a different folder using the `--folder-name` or `--folder-id` parameter.
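For example, here is a quick, hypothetical way to check which clusters exist in a different folder (the folder name is a placeholder):

```bash
# List Apache Airflow™ clusters in a folder other than the CLI profile default.
yc managed-airflow cluster list --folder-name my-airflow-folder
```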
To change the cluster settings:
- View the description of the update cluster CLI command:

  ```bash
  yc managed-airflow cluster update --help
  ```
- Provide a list of settings to update in the update cluster command:

  ```bash
  yc managed-airflow cluster update <cluster_name_or_ID> \
    --new-name <new_cluster_name> \
    --description <cluster_description> \
    --labels <label_list> \
    --service-account-id <service_account_ID> \
    --security-group-ids <security_group_IDs> \
    --webserver count=<number_of_instances>,resource-preset-id=<resource_ID> \
    --scheduler count=<number_of_instances>,resource-preset-id=<resource_ID> \
    --worker min-count=<minimum_number_of_instances>,max-count=<maximum_number_of_instances>,resource-preset-id=<resource_ID> \
    --triggerer count=<number_of_instances>,resource-preset-id=<resource_ID> \
    --deb-packages <list_of_deb_packages> \
    --pip-packages <list_of_pip_packages> \
    --dags-bucket <bucket_name> \
    --deletion-protection \
    --lockbox-secrets-backend \
    --log-enabled \
    --log-folder-id <folder_ID> \
    --log-min-level <logging_level>
  ```
Where:
- `--new-name`: New cluster name.
- `--description`: Cluster description.
- `--labels`: List of labels. Provide labels in `<key>=<value>` format.
format. -
--admin-password
: Admin user password. The password must be not less than 8 characters long and contain at least:- One uppercase letter
- One lowercase letter
- One digit
- One special character
- `--service-account-id`: Service account ID.
- `--subnet-ids`: List of subnet IDs.
- `--security-group-ids`: List of security group IDs.
- `--webserver`, `--scheduler`, `--worker`, `--triggerer`: Managed Service for Apache Airflow™ component configuration:
  - `count`: Number of instances in the cluster for the web server, scheduler, and triggerer.
  - `min-count`, `max-count`: Minimum and maximum number of instances in the cluster for the worker.
  - `resource-preset-id`: ID of the computing resources for the web server, scheduler, worker, and triggerer. The possible values are:
    - `c1-m4`: 1 vCPU, 4 GB RAM
    - `c2-m8`: 2 vCPUs, 8 GB RAM
    - `c4-m16`: 4 vCPUs, 16 GB RAM
    - `c8-m32`: 8 vCPUs, 32 GB RAM
- `--deb-packages`, `--pip-packages`: Lists of deb and pip packages enabling you to install additional libraries and applications in the cluster for running DAG files.

  If required, you can set version restrictions for the installed packages, for example:

  ```bash
  --pip-packages "pandas==2.0.2,scikit-learn>=1.0.0,clickhouse-driver~=0.2.0"
  ```

  The package name format and version are defined by the install command: `pip install` for pip packages and `apt install` for deb packages.
- `--dags-bucket`: Name of the bucket to store DAG files in.
- `--deletion-protection`: Enables cluster protection against accidental deletion.

  Even with deletion protection enabled, you will still be able to connect to the cluster manually and delete its data.

- `--lockbox-secrets-backend`: Enables using Yandex Lockbox secrets to store Apache Airflow™ configuration data, variables, and connection parameters.
- `--airflow-config`: Additional Apache Airflow™ properties. Provide them in `<configuration_section>.<key>=<value>` format, for example:

  ```bash
  --airflow-config core.load_examples=False
  ```

- Logging parameters:
  - `--log-enabled`: Enables logging. Logs generated by Apache Airflow™ will be sent to Yandex Cloud Logging.
  - `--log-folder-id`: Folder ID. Logs will be written to the default log group for this folder.
  - `--log-group-id`: Custom log group ID. Logs will be written to this group.

    You can specify only one of these two parameters: `--log-folder-id` or `--log-group-id`.

  - `--log-min-level`: Minimum logging level. Possible values: `TRACE`, `DEBUG`, `INFO` (default), `WARN`, `ERROR`, and `FATAL`.
You can request the cluster ID and name with a list of clusters in the folder.
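As an illustration, here is a hypothetical invocation that renames a cluster and adjusts worker scaling; the cluster name and all values below are assumptions, not defaults:

```bash
# Rename the cluster and let workers scale between 1 and 4 c2-m8 instances.
yc managed-airflow cluster update my-airflow-cluster \
  --new-name airflow-prod \
  --worker min-count=1,max-count=4,resource-preset-id=c2-m8 \
  --log-enabled \
  --log-min-level INFO
```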
Terraform

To change the cluster settings:

- Open the current Terraform configuration file with an infrastructure plan.

  For more information about creating this file, see Creating clusters.
- To change cluster settings, change the values of the required fields in the configuration file.

  Alert

  Do not change the cluster name and password using Terraform. This will delete the existing cluster and create a new one.

  Here is an example of the configuration file structure:

  ```hcl
  resource "yandex_airflow_cluster" "<cluster_name>" {
    name               = "<cluster_name>"
    description        = "<cluster_description>"
    labels             = { <label_list> }
    admin_password     = "<administrator_password>"
    service_account_id = "<service_account_ID>"
    subnet_ids         = ["<list_of_subnet_IDs>"]
    security_group_ids = ["<list_of_security_group_IDs>"]

    webserver = {
      count              = <number_of_instances>
      resource_preset_id = "<resource_ID>"
    }

    scheduler = {
      count              = <number_of_instances>
      resource_preset_id = "<resource_ID>"
    }

    worker = {
      min_count          = <minimum_number_of_instances>
      max_count          = <maximum_number_of_instances>
      resource_preset_id = "<resource_ID>"
    }

    triggerer = {
      count              = <number_of_instances>
      resource_preset_id = "<resource_ID>"
    }

    pip_packages = ["<list_of_pip_packages>"]
    deb_packages = ["<list_of_deb_packages>"]

    code_sync = {
      s3 = {
        bucket = "<bucket_name>"
      }
    }

    deletion_protection = <deletion_protection>

    lockbox_secrets_backend = {
      enabled = <usage_of_secrets>
    }

    airflow_config = {
      <configuration_section> = {
        <key> = "<value>"
      }
    }

    logging = {
      enabled   = <use_of_logging>
      folder_id = "<folder_ID>"
      min_level = "<logging_level>"
    }
  }

  resource "yandex_vpc_network" "<network_name>" {
    name = "<network_name>"
  }

  resource "yandex_vpc_subnet" "<subnet_name>" {
    name           = "<subnet_name>"
    zone           = "<availability_zone>"
    network_id     = "<network_ID>"
    v4_cidr_blocks = ["<range>"]
  }
  ```
Where:
- `name`: Cluster name.
- `description`: Cluster description.
- `labels`: List of labels. Provide labels in `<key> = "<value>"` format.
- `admin_password`: Admin user password. The password must be at least 8 characters long and contain at least:
  - One uppercase letter
  - One lowercase letter
  - One digit
  - One special character
- `service_account_id`: Service account ID.
- `subnet_ids`: List of subnet IDs.
- `security_group_ids`: List of security group IDs.
- `webserver`, `scheduler`, `worker`, `triggerer`: Managed Service for Apache Airflow™ component configuration:
  - `count`: Number of instances in the cluster for the web server, scheduler, and triggerer.
  - `min_count`, `max_count`: Minimum and maximum number of instances in the cluster for the worker.
  - `resource_preset_id`: ID of the computing resources for the web server, scheduler, worker, and triggerer. The possible values are:
    - `c1-m4`: 1 vCPU, 4 GB RAM
    - `c2-m8`: 2 vCPUs, 8 GB RAM
    - `c4-m16`: 4 vCPUs, 16 GB RAM
    - `c8-m32`: 8 vCPUs, 32 GB RAM
- `deb_packages`, `pip_packages`: Lists of deb and pip packages enabling you to install additional libraries and applications in the cluster for running DAG files.

  If required, you can set version restrictions for the installed packages, for example:

  ```hcl
  pip_packages = ["pandas==2.0.2", "scikit-learn>=1.0.0", "clickhouse-driver~=0.2.0"]
  ```

  The package name format and version are defined by the install command: `pip install` for pip packages and `apt install` for deb packages.

- `code_sync.s3.bucket`: Name of the bucket to store DAG files in.
- `deletion_protection`: Enables cluster protection against accidental deletion. The possible values are `true` or `false`.

  Even with deletion protection enabled, you will still be able to connect to the cluster manually and delete its data.

- `lockbox_secrets_backend.enabled`: Enables using Yandex Lockbox secrets to store Apache Airflow™ configuration data, variables, and connection parameters. The possible values are `true` or `false`.
- `airflow_config`: Additional Apache Airflow™ properties, e.g., `core` as the configuration section, `load_examples` as the key, and `False` as the value.
- `logging`: Logging parameters:
  - `enabled`: Enables logging. Logs generated by Apache Airflow™ components will be sent to Yandex Cloud Logging. The possible values are `true` or `false`.
  - `folder_id`: Folder ID. Logs will be written to the default log group for this folder.
  - `log_group_id`: Custom log group ID. Logs will be written to this group.

    You can specify only one of these two parameters: `folder_id` or `log_group_id`.

  - `min_level`: Minimum logging level. Possible values: `TRACE`, `DEBUG`, `INFO` (default), `WARN`, `ERROR`, and `FATAL`.
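If you want to preview the effect of your edit on the cluster alone before the full validation steps below, here is a minimal sketch; it assumes the resource is named `yandex_airflow_cluster.airflow_cluster` in your configuration:

```bash
# Preview changes for the Airflow cluster resource only; -target narrows the plan scope.
terraform plan -target=yandex_airflow_cluster.airflow_cluster
```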
- Make sure the settings are correct.

  - Using the command line, navigate to the folder that contains the up-to-date Terraform configuration files with an infrastructure plan.
  - Run this command:

    ```bash
    terraform validate
    ```

    If there are errors in the configuration files, Terraform will point them out.

- Confirm updating the resources.

  - Run this command to view the planned changes:

    ```bash
    terraform plan
    ```

    If the resource configuration descriptions are correct, the terminal will display a list of the resources to modify and their parameters. This is a test step; no resources are updated.

  - If you are happy with the planned changes, apply them:

    - Run this command:

      ```bash
      terraform apply
      ```

    - Confirm the update of the resources.
    - Wait for the operation to complete.
For more information, see the Terraform provider documentation.
REST API

To change the cluster settings:
- Get an IAM token for API authentication and put it into an environment variable:

  ```bash
  export IAM_TOKEN="<IAM_token>"
  ```
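  If you have the yc CLI configured, one way to issue the token:

  ```bash
  # Issue a fresh IAM token for the current yc CLI profile and export it.
  export IAM_TOKEN=$(yc iam create-token)
  ```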
- Create a file named `body.json` and add the following contents to it:

  ```json
  {
    "updateMask": "<list_of_parameters_to_change>",
    "name": "<cluster_name>",
    "description": "<cluster_description>",
    "labels": { <label_list> },
    "configSpec": {
      "airflow": {
        "config": { <list_of_properties> }
      },
      "webserver": {
        "count": "<number_of_instances>",
        "resources": {
          "resourcePresetId": "<resource_ID>"
        }
      },
      "scheduler": {
        "count": "<number_of_instances>",
        "resources": {
          "resourcePresetId": "<resource_ID>"
        }
      },
      "triggerer": {
        "count": "<number_of_instances>",
        "resources": {
          "resourcePresetId": "<resource_ID>"
        }
      },
      "worker": {
        "minCount": "<minimum_number_of_instances>",
        "maxCount": "<maximum_number_of_instances>",
        "resources": {
          "resourcePresetId": "<resource_ID>"
        }
      },
      "dependencies": {
        "pipPackages": [ <list_of_pip_packages> ],
        "debPackages": [ <list_of_deb_packages> ]
      },
      "lockbox": {
        "enabled": <usage_of_secrets>
      }
    },
    "codeSync": {
      "s3": {
        "bucket": "<bucket_name>"
      }
    },
    "networkSpec": {
      "securityGroupIds": [ <list_of_security_group_IDs> ]
    },
    "deletionProtection": <deletion_protection>,
    "serviceAccountId": "<service_account_ID>",
    "logging": {
      "enabled": <use_of_logging>,
      "minLevel": "<logging_level>",
      "folderId": "<folder_ID>"
    }
  }
  ```
Where:
- `updateMask`: List of parameters to update, as a single comma-separated string.

  Warning

  When you update a cluster, all parameters of the object you are changing that are not explicitly provided in the request will be overridden by their defaults. To avoid this, list the settings you want to change in the `updateMask` parameter; see the sketch after this list.
name
: Cluster name. -
description
: Cluster description. -
labels
: List of labels. Provide labels in"<key>": "<value>"
format. -
config
: Cluster configuration:-
airflow.config
: Apache Airflow™ additional properties . Provide them in"<configuration_section>.<key>": "<value>"
format, for example:"airflow": { "config": { "core.load_examples": "False" } }
-
webserver
,scheduler
,triggerer
, andworker
: Configuration of Managed Service for Apache Airflow™ components:-
count
: Number of instances in the cluster for the web server, scheduler, and trigger. -
minCount
andmaxCount
: Minimum and maximum number of instances in the cluster for the worker. -
resources.resourcePresetId
: ID of the web server, scheduler, worker, and trigger computing resources. The possible values are:c1-m4
: 1 vCPU, 4 GB RAMc2-m8
: 2 vCPUs, 8 GB RAMc4-m16
: 4 vCPUs, 16 GB RAMc8-m32
: 8 vCPUs, 32 GB RAM
  - `dependencies`: Lists of packages enabling you to install additional libraries and applications for running DAG files in the cluster:
    - `pipPackages`: List of pip packages.
    - `debPackages`: List of deb packages.

    If required, you can set version restrictions for the installed packages, for example:

    ```json
    "dependencies": {
      "pipPackages": [ "pandas==2.0.2", "scikit-learn>=1.0.0", "clickhouse-driver~=0.2.0" ]
    }
    ```

    The package name format and version are defined by the install command: `pip install` for pip packages and `apt install` for deb packages.

  - `lockbox.enabled`: Enables using Yandex Lockbox secrets to store Apache Airflow™ configuration data, variables, and connection parameters. The possible values are `true` or `false`.
- `networkSpec.securityGroupIds`: List of security group IDs.
- `codeSync.s3.bucket`: Name of the bucket to store DAG files in.
- `deletionProtection`: Enables cluster protection against accidental deletion. The possible values are `true` or `false`.

  Even with deletion protection enabled, you will still be able to connect to the cluster manually and delete its data.

- `serviceAccountId`: ID of the service account with the `managed-airflow.integrationProvider` role. With this role, the cluster gets the permissions it needs to work with user resources. For more information, see Impersonation.

  To change the service account in a Managed Service for Apache Airflow™ cluster, make sure your Yandex Cloud account has the `iam.serviceAccounts.user` role or higher.

  Warning

  If the cluster already uses a service account to access objects in Object Storage, switching to a different service account may make these objects unavailable and interrupt the cluster operation. Before changing the service account settings, make sure the cluster does not use the objects in question.
- `logging`: Logging parameters:
  - `enabled`: Enables logging. Logs generated by Apache Airflow™ components will be sent to Yandex Cloud Logging. The possible values are `true` or `false`.
  - `minLevel`: Minimum logging level. Possible values: `TRACE`, `DEBUG`, `INFO`, `WARN`, `ERROR`, and `FATAL`.
  - `folderId`: Folder ID. Logs will be written to the default log group for this folder.
  - `logGroupId`: Custom log group ID. Logs will be written to this group.

    Specify either `folderId` or `logGroupId`, but not both.
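For example, here is a minimal, hypothetical `body.json` that changes only the cluster description and the web server instance count; the `updateMask` names exactly those fields so everything else keeps its current value (this assumes the mask paths mirror the request body fields shown above):

```bash
# Write a minimal request body; only the fields named in updateMask are changed.
cat > body.json <<'EOF'
{
  "updateMask": "description,configSpec.webserver.count",
  "description": "Airflow cluster for production DAGs",
  "configSpec": {
    "webserver": {
      "count": "2"
    }
  }
}
EOF
```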
- Use the Cluster.update method and make a request, e.g., via cURL:

  ```bash
  curl \
    --request PATCH \
    --header "Authorization: Bearer $IAM_TOKEN" \
    --url 'https://airflow.api.cloud.yandex.net/managed-airflow/v1/clusters/<cluster_ID>' \
    --data '@body.json'
  ```

  You can get the cluster ID with a list of clusters in the folder.
- View the server response to make sure the request was successful.
gRPC API

To change the cluster settings:
- Get an IAM token for API authentication and put it into an environment variable:

  ```bash
  export IAM_TOKEN="<IAM_token>"
  ```
- Clone the cloudapi repository:

  ```bash
  cd ~/ && git clone --depth=1 https://github.com/yandex-cloud/cloudapi
  ```

  Below, we assume the repository contents are stored in the `~/cloudapi/` directory.
Create a file named
body.json
and add the following contents to it:{ "cluster_id": "<cluster_ID>", "update_mask": "<list_of_parameters_to_change>", "name": "<cluster_name>", "description": "<cluster_description>", "labels": { <label_list> }, "config_spec": { "airflow": { "config": { <list_of_properties> } }, "webserver": { "count": "<number_of_instances>", "resources": { "resource_preset_id": "<resource_ID>" } }, "scheduler": { "count": "<number_of_instances>", "resources": { "resource_preset_id": "<resource_ID>" } }, "triggerer": { "count": "<number_of_instances>", "resources": { "resource_preset_id": "<resource_ID>" } }, "worker": { "min_count": "<minimum_number_of_instances>", "max_count": "<maximum_number_of_instances>", "resources": { "resource_preset_id": "<resource_ID>" } }, "dependencies": { "pip_packages": [ <list_of_pip_packages> ], "deb_packages": [ <list_of_deb_packages> ] }, "lockbox": { "enabled": <use_of_logging> } }, "code_sync": { "s3": { "bucket": "<bucket_name>" } }, "network_spec": { "security_group_ids": [ <list_of_security_group_IDs> ] }, "deletion_protection": <deletion_protection>, "service_account_id": "<service_account_ID>", "logging": { "enabled": <use_of_logging>, "min_level": "<logging_level>", "folder_id": "<folder_ID>" } }
Where:
- `cluster_id`: Cluster ID. You can get it with a list of clusters in the folder.
- `update_mask`: List of parameters to update, as an array of `paths[]` strings.

  Format for listing settings:

  ```json
  "update_mask": {
    "paths": [
      "<setting_1>",
      "<setting_2>",
      ...
      "<setting_N>"
    ]
  }
  ```

  Warning

  When you update a cluster, all parameters of the object you are changing that are not explicitly provided in the request will be overridden by their defaults. To avoid this, list the settings you want to change in the `update_mask` parameter; see the sketch after this list.
name
: Cluster name. -
description
: Cluster description. -
labels
: List of labels. Provide labels in"<key>": "<value>"
format. -
config_spec
: Cluster configuration:-
airflow.config
: Apache Airflow™ additional properties . Provide them in"<configuration_section>.<key>": "<value>"
format, for example:"airflow": { "config": { "core.load_examples": "False" } }
-
webserver
,scheduler
,triggerer
, andworker
: Configuration of Managed Service for Apache Airflow™ components:-
count
: Number of instances in the cluster for the web server, scheduler, and trigger. -
min_count
andmax_count
: Minimum and maximum number of instances in the cluster for the worker. -
resources.resource_preset_id
: ID of the web server, scheduler, worker, and trigger computing resources. The possible values are:c1-m4
: 1 vCPU, 4 GB RAMc2-m8
: 2 vCPUs, 8 GB RAMc4-m16
: 4 vCPUs, 16 GB RAMc8-m32
: 8 vCPUs, 32 GB RAM
  - `dependencies`: Lists of packages enabling you to install additional libraries and applications for running DAG files in the cluster:
    - `pip_packages`: List of pip packages.
    - `deb_packages`: List of deb packages.

    If required, you can set version restrictions for the installed packages, for example:

    ```json
    "dependencies": {
      "pip_packages": [ "pandas==2.0.2", "scikit-learn>=1.0.0", "clickhouse-driver~=0.2.0" ]
    }
    ```

    The package name format and version are defined by the install command: `pip install` for pip packages and `apt install` for deb packages.

  - `lockbox.enabled`: Enables using Yandex Lockbox secrets to store Apache Airflow™ configuration data, variables, and connection parameters. The possible values are `true` or `false`.
- `network_spec.security_group_ids`: List of security group IDs.
- `code_sync.s3.bucket`: Name of the bucket to store DAG files in.
- `deletion_protection`: Enables cluster protection against accidental deletion. The possible values are `true` or `false`.

  Even with deletion protection enabled, you will still be able to connect to the cluster manually and delete its data.

- `service_account_id`: ID of the service account with the `managed-airflow.integrationProvider` role. With this role, the cluster gets the permissions it needs to work with user resources. For more information, see Impersonation.

  To change the service account in a Managed Service for Apache Airflow™ cluster, make sure your Yandex Cloud account has the `iam.serviceAccounts.user` role or higher.

  Warning

  If the cluster already uses a service account to access objects in Object Storage, switching to a different service account may make these objects unavailable and interrupt the cluster operation. Before changing the service account settings, make sure the cluster does not use the objects in question.
- `logging`: Logging parameters:
  - `enabled`: Enables logging. Logs generated by Apache Airflow™ components will be sent to Yandex Cloud Logging. The possible values are `true` or `false`.
  - `min_level`: Minimum logging level. Possible values: `TRACE`, `DEBUG`, `INFO`, `WARN`, `ERROR`, and `FATAL`.
  - `folder_id`: Folder ID. Logs will be written to the default log group for this folder.
  - `log_group_id`: Custom log group ID. Logs will be written to this group.

    Specify either `folder_id` or `log_group_id`, but not both.
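For example, here is a minimal, hypothetical `body.json` for the gRPC call that changes only the cluster description and the web server instance count (assuming the mask paths mirror the request body fields shown above):

```bash
# Minimal gRPC request body; update_mask.paths names exactly the fields to change.
cat > body.json <<'EOF'
{
  "cluster_id": "<cluster_ID>",
  "update_mask": {
    "paths": [
      "description",
      "config_spec.webserver.count"
    ]
  },
  "description": "Airflow cluster for production DAGs",
  "config_spec": {
    "webserver": {
      "count": "2"
    }
  }
}
EOF
```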
Use the ClusterService/Update call and make a request, e.g., via gRPCurl
:grpcurl \ -format json \ -import-path ~/cloudapi/ \ -import-path ~/cloudapi/third_party/googleapis/ \ -proto ~/cloudapi/yandex/cloud/airflow/v1/cluster_service.proto \ -rpc-header "Authorization: Bearer $IAM_TOKEN" \ -d @ \ airflow.api.cloud.yandex.net:443 \ yandex.cloud.airflow.v1.ClusterService.Update \ < body.json
- View the server response to make sure the request was successful.