Impersonation in Managed Service for Apache Airflow™
Impersonation in Managed Service for Apache Airflow™ is a mechanism that lets an Apache Airflow™ cluster access user resources on behalf of a service account.
By default, an Apache Airflow™ cluster has no permissions to access user resources. To grant access to such resources, create a service account with the required roles and link it to the Apache Airflow™ cluster when you create or update the cluster. After that, code in the cluster's DAG files can authenticate on behalf of this service account.
Impersonation enables an Apache Airflow™ cluster to support integration with other Yandex Cloud services. These include Yandex Cloud Logging, Yandex Lockbox, and Yandex Monitoring.
To let a cluster work with these services, assign the managed-airflow.integrationProvider role to its service account. With this role, the cluster can write logs to Cloud Logging, access Yandex Lockbox secrets, and send metrics to Monitoring.
Managed Service for Apache Airflow™ supports integration with other services via the Yandex Cloud Python SDK and the Airflow Yandex Provider.
Services available for integration
Cloud Logging
Cloud Logging is a service that stores and reads logs for Yandex Cloud services. If a Managed Service for Apache Airflow™ cluster has logging enabled, its logs are saved to a selected Cloud Logging log group.
For more information about logging configuration, see Transferring cluster logs.
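As an illustration of how DAG code produces log records, the sketch below writes to the airflow.task logger from Python's standard logging module; Apache Airflow™ routes that logger to the task log. It assumes that task logs are among the cluster logs delivered to Cloud Logging under your logging settings. The function process_batch and its input format are hypothetical.

```python
import logging

# Records sent to the "airflow.task" logger end up in the task log. With
# cluster logging enabled, they are assumed to reach the selected log group.
log = logging.getLogger("airflow.task")


def process_batch(rows: list[dict]) -> int:
    """Hypothetical task callable: count rows that have an "id" field."""
    log.info("Processing %d rows", len(rows))
    valid = [row for row in rows if "id" in row]
    skipped = len(rows) - len(valid)
    if skipped:
        # The minimum level configured for cluster logs may filter out
        # lower-level records, so choose levels deliberately.
        log.warning("Skipping %d rows without an 'id' field", skipped)
    return len(valid)
```

You could pass process_batch to a PythonOperator like any other callable; no Cloud Logging-specific code is needed in the DAG itself.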
Monitoring
Monitoring is a service that collects and stores metrics for Yandex Cloud services. Metrics are displayed as charts on the cluster page under Monitoring. They show the cluster's current state and health and are available by default in all Managed Service for Apache Airflow™ clusters. For a list of available metrics, see the relevant reference.
Yandex Lockbox
Yandex Lockbox is a service for centralized storage of secrets. In Yandex Lockbox, you can store data to use in DAG files, such as configuration data, variables, and Apache Airflow™ connection parameters.
By default, Apache Airflow™ stores sensitive data in its metadata database. In this case, you have to manage secrets for each Apache Airflow™ cluster manually via the UI or API. To automate secret management, store secrets in Yandex Lockbox. For more information, see the Apache Airflow™ documentation.
To use Yandex Lockbox in an Apache Airflow™ cluster, create a secret with the required data and grant the service account attached to the cluster permission to access it. After that, you can use data from the secret in the cluster's DAG files.
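A minimal sketch of how secret names map to Airflow keys, assuming the defaults of the Airflow Yandex provider's Lockbox secrets backend (LockboxSecretBackend): a prefix such as airflow/connections or airflow/variables, the / separator, and the Airflow key. A cluster may be configured with other prefixes, so check the backend settings before relying on these names. The helpers lockbox_secret_name and read_secrets_in_task are hypothetical; BaseHook.get_connection and Variable.get are standard Apache Airflow™ calls that consult the configured secrets backend before the metadata database.

```python
# Default prefixes assumed from the Airflow Yandex provider's LockboxSecretBackend.
# A cluster can be configured with different prefixes or a different separator.
DEFAULT_PREFIXES = {
    "connections": "airflow/connections",
    "variables": "airflow/variables",
    "config": "airflow/config",
}


def lockbox_secret_name(kind: str, key: str, sep: str = "/") -> str:
    """Hypothetical helper: name of the Lockbox secret looked up for `key`."""
    if kind not in DEFAULT_PREFIXES:
        raise ValueError(f"unknown kind: {kind!r}")
    return f"{DEFAULT_PREFIXES[kind]}{sep}{key}"


def read_secrets_in_task(conn_id: str, var_key: str) -> tuple:
    """Hypothetical task callable: read a connection and a variable.

    Airflow searches the configured secrets backend (here, Yandex Lockbox)
    first, so the DAG code is the same as with the metadata database.
    """
    # Imported inside the callable so this file loads without Airflow installed.
    from airflow.hooks.base import BaseHook
    from airflow.models import Variable

    conn = BaseHook.get_connection(conn_id)
    value = Variable.get(var_key)
    return conn.host, value
```

Under these assumed defaults, a connection with the ID pg would be stored in a Lockbox secret named airflow/connections/pg.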
For an integration example, see Storing Apache Airflow™ connections and variables in Yandex Lockbox.
Tools available for integration with Yandex Cloud
Python SDK
To work with Yandex Cloud resources through the Yandex Cloud Python SDK, create a yandexcloud.SDK() object in a DAG file without specifying authentication parameters. The SDK will then authenticate using the IAM token of the service account attached to the cluster.
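A minimal sketch of a task callable that lists the virtual machines in a folder, assuming the compute API stubs that ship with the yandexcloud package (InstanceServiceStub, ListInstancesRequest). The function name and folder ID are placeholders, only the first page of results is read, and the attached service account is assumed to have a role that allows listing VMs in that folder, such as compute.viewer. The imports sit inside the callable, which keeps DAG parsing light.

```python
def list_instance_names(folder_id: str) -> list[str]:
    """Hypothetical task callable: list VM names in a folder."""
    import yandexcloud
    from yandex.cloud.compute.v1.instance_service_pb2 import ListInstancesRequest
    from yandex.cloud.compute.v1.instance_service_pb2_grpc import InstanceServiceStub

    # No token, key, or endpoint is passed: with impersonation enabled, the SDK
    # authenticates with the IAM token of the cluster's service account.
    sdk = yandexcloud.SDK()
    compute = sdk.client(InstanceServiceStub)

    # Reads only the first page; follow next_page_token for large folders.
    response = compute.List(ListInstancesRequest(folder_id=folder_id))
    return [instance.name for instance in response.instances]
```

In a DAG, you could wire it up as PythonOperator(task_id="list_vms", python_callable=list_instance_names, op_kwargs={"folder_id": "<folder_ID>"}).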
For an integration example, see Sending requests to the Yandex Cloud API via the Yandex Cloud Python SDK.
Airflow Yandex Provider
Managed Service for Apache Airflow™ clusters with impersonation enabled have an automatically configured yandexcloud_default connection, which all Airflow Yandex Provider operators use by default. To use it, create a DAG file without specifying the yandex_conn_id parameter for the operator. The operator will then authenticate using the IAM token of the service account attached to the cluster.
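A minimal sketch, assuming an Airflow Yandex Provider version that includes YQExecuteQueryOperator (module airflow.providers.yandex.operators.yq). The factory name make_yq_task and the query text are hypothetical. Because yandex_conn_id is not passed, the operator falls back to the yandexcloud_default connection.

```python
def make_yq_task(task_id: str, sql: str):
    """Hypothetical factory: a Yandex Query task that relies on impersonation."""
    # Requires the apache-airflow-providers-yandex package on the cluster.
    from airflow.providers.yandex.operators.yq import YQExecuteQueryOperator

    # No yandex_conn_id: the operator uses the yandexcloud_default connection,
    # which authenticates with the IAM token of the cluster's service account.
    return YQExecuteQueryOperator(task_id=task_id, sql=sql)
```

When called inside a with DAG(...) block, for example make_yq_task("yq_hello", "SELECT 'Hello, world!'"), the returned operator is attached to that DAG automatically.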
For an integration example, see Automating Yandex Query tasks using Yandex Managed Service for Apache Airflow™.