Storing Apache Airflow™ connections and variables in Yandex Lockbox
When working with Yandex Managed Service for Apache Airflow™, you can use Yandex Lockbox to store artifacts used in DAG files: connections, variables, and configuration data. Yandex Lockbox integrates with Managed Service for Apache Airflow™ via the Yandex Lockbox Secret Backend.
Below, we consider a directed acyclic graph (DAG) running the `SELECT 1;` SQL query against a database in a Yandex Managed Service for PostgreSQL cluster. The data for connecting to the DB is stored in Yandex Lockbox and automatically substituted into the graph.
To use configuration data from a Yandex Lockbox secret in the graph:
- Prepare your infrastructure.
- Create a Yandex Lockbox secret.
- Prepare the DAG file and run the graph.
- Check the result.
If you no longer need the resources you created, delete them.
Prepare the infrastructure
- Create a service account named `airflow-sa` with the following roles:
  - managed-airflow.integrationProvider
  - lockbox.payloadViewer

  You do not have to assign the lockbox.payloadViewer role for the whole folder. It is enough to assign it for the specific Yandex Lockbox secret once you create it.

- Create an Object Storage bucket in any configuration.

- Edit the ACL of the new bucket to grant the READ permission to the `airflow-sa` service account.

- Create a Managed Service for Apache Airflow™ cluster with the following parameters:
  - Service account: `airflow-sa`
  - Bucket name: the name of the new bucket
  - Use Lockbox Secret Backend: make sure this option is enabled

- Create a Managed Service for PostgreSQL cluster with the following parameters:
  - DB name: `db1`
  - Username: `user1`
  - Password: `user1-password`
Create a Yandex Lockbox secret
For the Apache Airflow™ cluster to work correctly, your Yandex Lockbox secret's name must follow this format: `airflow/<artifact_type>/<artifact_ID>`, where:
- `<artifact_type>`: Type of the artifact that will be stored in the secret. The following types are available:
  - `connections`: Connections
  - `variables`: Variables
  - `config`: Configuration data
- `<artifact_ID>`: ID that will be used to access the artifact in Apache Airflow™.
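The naming convention above can be sketched as a small helper. This is a hypothetical Python snippet for illustration only: the `lockbox_secret_name` function and `ALLOWED_TYPES` set are not part of any Yandex Cloud or Airflow tooling.

```python
# Hypothetical helper: builds and validates Yandex Lockbox secret names
# in the airflow/<artifact_type>/<artifact_ID> format described above.

ALLOWED_TYPES = {"connections", "variables", "config"}

def lockbox_secret_name(artifact_type: str, artifact_id: str) -> str:
    """Return the secret name airflow/<artifact_type>/<artifact_ID>."""
    if artifact_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported artifact type: {artifact_type!r}")
    if not artifact_id or "/" in artifact_id:
        raise ValueError(f"invalid artifact ID: {artifact_id!r}")
    return f"airflow/{artifact_type}/{artifact_id}"

print(lockbox_secret_name("connections", "pg"))  # airflow/connections/pg
```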
Create a Yandex Lockbox secret with the following parameters:
- Name: `airflow/connections/pg`

- Secret type: Custom

- Key: `airflow/connections/pg`

- Value: select Text and specify the following contents:

  ```json
  {
    "conn_type": "postgres",
    "host": "<PostgreSQL_cluster_host_FQDN>",
    "port": 6432,
    "schema": "db1",
    "login": "user1",
    "password": "user1-password"
  }
  ```
The secret will store the data to connect to the database in the Managed Service for PostgreSQL cluster.
For more information on how to get the FQDN of a PostgreSQL cluster host, see the documentation.
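The JSON fields above map one-to-one onto an Airflow connection; note that `schema` here is the database name, not a PostgreSQL schema. A minimal sketch of that mapping in plain Python (no Airflow installation needed; the host value is the placeholder from the secret):

```python
import json

# Secret value from the step above; the host FQDN is a placeholder.
secret_value = """
{ "conn_type": "postgres", "host": "<PostgreSQL_cluster_host_FQDN>",
  "port": 6432, "schema": "db1", "login": "user1",
  "password": "user1-password" }
"""

conn = json.loads(secret_value)

# Fields this tutorial's PostgreSQL connection relies on.
for key in ("conn_type", "host", "port", "schema", "login", "password"):
    assert key in conn, f"missing field: {key}"

# Equivalent Airflow connection URI ("schema" becomes the database path).
uri = (f"{conn['conn_type']}://{conn['login']}:{conn['password']}"
       f"@{conn['host']}:{conn['port']}/{conn['schema']}")
print(uri)
```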
Prepare the DAG file and run the graph
- Create a local file named `test_lockbox_connection.py` and copy the following script to it:

  ```python
  from airflow import DAG
  from airflow.providers.postgres.operators.postgres import PostgresOperator
  from datetime import datetime

  with DAG(
      dag_id='test_lockbox_connection',
      start_date=datetime(2024, 4, 19),
      schedule="@once",
      catchup=False,
  ) as dag:
      check_conn = PostgresOperator(
          task_id="check_conn",
          postgres_conn_id='pg',
          sql="SELECT 1;",
      )
  ```

- Upload the `test_lockbox_connection.py` DAG file to the bucket you created earlier. This will automatically create a graph with the same name in the Apache Airflow™ web interface.

- Make sure a new graph named `test_lockbox_connection` has appeared in the DAGs section. It may take a few minutes to upload a DAG file from the bucket.

- To run the graph, click the run button in the line with its name.
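Before uploading, you can sanity-check the DAG file offline with Python's standard `ast` module, without installing Airflow. This is an optional sketch, not part of the tutorial's required steps; it confirms that `postgres_conn_id` matches the `<artifact_ID>` part of the `airflow/connections/pg` secret name:

```python
import ast

# DAG script from the previous step, embedded as a string so the check
# works without the file on disk. Parsing does not execute the imports,
# so Airflow is not required locally.
dag_source = """
from airflow import DAG
from airflow.providers.postgres.operators.postgres import PostgresOperator
from datetime import datetime

with DAG(
    dag_id='test_lockbox_connection',
    start_date=datetime(2024, 4, 19),
    schedule="@once",
    catchup=False,
) as dag:
    check_conn = PostgresOperator(
        task_id="check_conn",
        postgres_conn_id='pg',
        sql="SELECT 1;",
    )
"""

# Collect literal keyword arguments from every call in the script.
kwargs = {
    kw.arg: kw.value.value
    for node in ast.walk(ast.parse(dag_source))
    if isinstance(node, ast.Call)
    for kw in node.keywords
    if isinstance(kw.value, ast.Constant)
}

# The connection ID must match the <artifact_ID> in airflow/connections/pg.
assert kwargs["postgres_conn_id"] == "pg"
assert kwargs["dag_id"] == "test_lockbox_connection"
print("DAG file OK")
```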
Check the result
To check the result in the Apache Airflow™ web interface:
- In the DAGs section, open the `test_lockbox_connection` graph.
- Go to the Graph section.
- Select the `check_conn` task.
- Go to Logs.
- Make sure the logs contain the `Rows affected: 1` string. This means the query was successful.
Delete the resources you created
Some resources are not free of charge. Delete the resources you no longer need to avoid paying for them: