Storing Apache Airflow™ connections and variables in Yandex Lockbox
When working with Yandex Managed Service for Apache Airflow™, you can use Yandex Lockbox to store connections, variables, and configuration data used in DAG files. Yandex Lockbox integrates with Managed Service for Apache Airflow™ via the Yandex Lockbox Secret Backend.
Below, we consider a directed acyclic graph (DAG) that runs the SELECT 1; SQL query against a database in a Yandex Managed Service for PostgreSQL cluster. The data for connecting to the DB is stored in Yandex Lockbox and automatically substituted into the graph.
To use configuration data from a Yandex Lockbox secret in the graph:
- Prepare your infrastructure.
- Create a Yandex Lockbox secret.
- Prepare the DAG file and run the graph.
- Check the result.
If you no longer need the resources you created, delete them.
Prepare the infrastructure
- Create a service account named airflow-sa with the following roles:
  - managed-airflow.integrationProvider
  - lockbox.payloadViewer

  You do not have to assign the lockbox.payloadViewer role for the whole folder: it is enough to assign it for a specific Yandex Lockbox secret once you create it.

- Create a static access key for the service account. Save its ID and the secret key.
- Create an Object Storage bucket in any configuration.
- Create a Managed Service for Apache Airflow™ cluster with the following parameters:
  - Service account: airflow-sa
  - Bucket name: name of the bucket you created
  - Key ID and secret key: belong to the static access key you created
- Create a Managed Service for PostgreSQL cluster with the following parameters:
  - DB name: db1
  - Username: user1
  - Password: user1-password
Create a Yandex Lockbox secret
For the Apache Airflow™ cluster to work correctly, the name of a secret in Yandex Lockbox must follow the airflow/<secret_type>/<secret_ID> format, where:

- <secret_type>: Type of the stored secret. The following types are available:
  - connections: Connections.
  - variables: Variables.
  - config: Configuration data.
- <secret_ID>: ID that will be used to access the Yandex Lockbox secret in Apache Airflow™.
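The naming rule above can be sketched as a small helper. This function is purely illustrative and is not part of any Yandex Cloud or Airflow SDK; it only makes the format explicit:

```python
# Illustrative helper for the airflow/<secret_type>/<secret_ID> naming rule.
# Not part of any SDK; shown only to make the format explicit.
VALID_SECRET_TYPES = {"connections", "variables", "config"}

def lockbox_secret_name(secret_type: str, secret_id: str) -> str:
    """Build a secret name the Yandex Lockbox Secret Backend will recognize."""
    if secret_type not in VALID_SECRET_TYPES:
        raise ValueError(f"Unknown secret type: {secret_type!r}")
    return f"airflow/{secret_type}/{secret_id}"

# The secret for the PostgreSQL connection used in this tutorial:
print(lockbox_secret_name("connections", "pg"))  # airflow/connections/pg
```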
Create a Yandex Lockbox secret with the following parameters:

- Name: airflow/connections/pg
- Value: Specify the following contents:

      {
        "conn_type": "postgres",
        "host": "<PostgreSQL_cluster_host_FQDN>",
        "port": 6432,
        "schema": "db1",
        "login": "user1",
        "password": "user1-password"
      }
The secret will store the data to connect to the database in the Managed Service for PostgreSQL cluster.
For more information on how to get the FQDN of a PostgreSQL cluster host, see the documentation.
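The secret value is simply the JSON serialization of Airflow connection fields. A minimal sketch of building that value, with the host FQDN kept as a placeholder just as in the secret above:

```python
import json

# Connection fields matching the secret value above; the host is a placeholder.
connection = {
    "conn_type": "postgres",
    "host": "<PostgreSQL_cluster_host_FQDN>",
    "port": 6432,
    "schema": "db1",
    "login": "user1",
    "password": "user1-password",
}

# This string is what goes into the Value field of the secret.
secret_value = json.dumps(connection, indent=2)
print(secret_value)
```

When a DAG asks for the pg connection, the secret backend fetches this payload and Airflow builds the connection object from these fields.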
Prepare the DAG file and run the graph
- Create a local file named test_lockbox_connection.py and copy the following script into it:

      from airflow import DAG
      from airflow.providers.postgres.operators.postgres import PostgresOperator
      from datetime import datetime

      with DAG(
          dag_id='test_lockbox_connection',
          start_date=datetime(2024, 4, 19),
          schedule="@once",
          catchup=False,
      ) as dag:
          check_conn = PostgresOperator(
              task_id="check_conn",
              postgres_conn_id='pg',
              sql="SELECT 1;",
          )
- Upload the test_lockbox_connection.py DAG file to the bucket you created previously. This will automatically create a graph with the same name in the Apache Airflow™ web interface.
Make sure you can see the new graph named test_lockbox_connection in the DAGs section.

  It may take a few minutes to upload a DAG file from the bucket.
- To run the graph, click the run button in the line with its name.
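Besides the web interface button, a run can be triggered through Airflow's stable REST API. The sketch below only builds the request; the webserver URL and admin password are placeholders you would substitute with your cluster's values:

```python
import base64
import json
import urllib.request

def build_trigger_request(base_url: str, dag_id: str,
                          user: str, password: str) -> urllib.request.Request:
    """Build a POST request for Airflow's stable REST API to trigger a DAG run."""
    url = f"{base_url}/api/v1/dags/{dag_id}/dagRuns"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps({"conf": {}}).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )

req = build_trigger_request(
    "https://<webserver_host>", "test_lockbox_connection", "admin", "<password>"
)
print(req.full_url)
# Actually sending it requires network access to the cluster:
# urllib.request.urlopen(req)
```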
Check the result
To check the result in the Apache Airflow™ web interface:
- In the DAGs section, open the test_lockbox_connection graph.
- Go to the Graph section.
- Select the check_conn job.
- Go to Logs.
- Make sure the logs contain the Rows affected: 1 string. This indicates the query was successful.
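If you fetch the task log programmatically rather than through the web interface, the same check reduces to a substring scan. A trivial, purely illustrative helper:

```python
def query_succeeded(log_text: str) -> bool:
    """Return True if the task log reports the expected row count."""
    return "Rows affected: 1" in log_text

# Hypothetical log excerpt for the check_conn task.
sample_log = (
    "INFO - Executing: SELECT 1;\n"
    "INFO - Rows affected: 1\n"
)
print(query_succeeded(sample_log))  # True
```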
Delete the resources you created
Some resources are not free of charge. Delete the resources you no longer need to avoid paying for them: