Storing Apache Airflow™ connections in Yandex Lockbox
When working with Yandex Managed Service for Apache Airflow™, you can use Yandex Lockbox to store artifacts used in DAG files: connections, variables, and configuration data. Yandex Lockbox integrates with Managed Service for Apache Airflow™ via the Yandex Lockbox Secret Backend.
Using a directed acyclic graph (DAG), you can load a connection from Yandex Lockbox and run a SELECT 1; SQL query against a database in a Yandex Managed Service for PostgreSQL cluster. The data for connecting to the DB is stored in Yandex Lockbox and automatically inserted into the graph.
Tip
Clusters running Apache Airflow™ versions below 3.0 use apache-airflow-providers-postgres 5.13.1 as the default provider. If you are using a newer provider version, use SQLExecuteQueryOperator instead of PostgresOperator. For more information, see the official documentation.
Getting started
- Create a Managed Service for PostgreSQL cluster with the following parameters:
  - DB name: db1
  - Username: user1
  - Password: user1-password
- Create a Yandex Object Storage bucket to store the DAG file in.
- Configure the Managed Service for Apache Airflow™ cluster:
  - Enable Use Lockbox Secret Backend to use Yandex Lockbox secrets to store Apache Airflow™ configuration data, variables, and connection parameters.
  - Under Dependencies, add the apache-airflow-providers-postgres pip package.

    Warning

    You only need to install this pip package on clusters running Apache Airflow™ version 3.0 or higher. On clusters with Apache Airflow™ versions below 3.0, the package is installed by default.
  - Under DAG file storage, select the Object Storage bucket you created earlier. Your DAG file will be fetched from it.
- Assign the lockbox.payloadViewer role to your service account.

  There is no need to assign the lockbox.payloadViewer role for the whole folder. It is enough to assign it for a specific Yandex Lockbox secret once you create it.
Create a Yandex Lockbox secret
For the Apache Airflow™ cluster to work correctly, your Yandex Lockbox secret's name must have this format: airflow/<artifact_type>/<artifact_ID>, where:
- <artifact_type>: Determines what data will be stored in the secret. The allowed values are:
  - connections: Connections.
  - variables: Variables.
  - config: Configuration data.
- <artifact_ID>: ID used to access the artifact in Apache Airflow™.
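As an illustration of this naming scheme, here is a minimal sketch. The helper function and its validation are hypothetical, not part of Airflow or Lockbox; they only encode the format described above:

```python
# Hypothetical helper illustrating the secret naming scheme;
# it is not part of Airflow or Lockbox.
ALLOWED_ARTIFACT_TYPES = {"connections", "variables", "config"}

def lockbox_secret_name(artifact_type: str, artifact_id: str) -> str:
    """Build the secret name Managed Airflow expects: airflow/<artifact_type>/<artifact_ID>."""
    if artifact_type not in ALLOWED_ARTIFACT_TYPES:
        raise ValueError(f"unsupported artifact type: {artifact_type}")
    return f"airflow/{artifact_type}/{artifact_id}"

print(lockbox_secret_name("connections", "pg1"))  # airflow/connections/pg1
```

For example, a connection with the ID pg1 (used later in this tutorial) must live in a secret named airflow/connections/pg1.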
Create a Yandex Lockbox secret with the following parameters:
- Name: airflow/connections/pg1
- Secret type: Custom
- Key: conn
- Value: Select Text and specify the following contents:

  ```json
  {
    "conn_type": "postgres",
    "host": "<PostgreSQL_cluster_host_FQDN>",
    "port": 6432,
    "schema": "db1",
    "login": "user1",
    "password": "user1-password"
  }
  ```
The secret will store the data to connect to the database in the Managed Service for PostgreSQL cluster.
For more information on how to get the FQDN of a PostgreSQL cluster host, see PostgreSQL host FQDN.
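Before creating the secret, you can sanity-check the payload locally by parsing the JSON and assembling a PostgreSQL connection URI from its fields. This is a stdlib-only sketch: the host is the same placeholder as above, and the URI construction is a simplified illustration of how the fields combine, not Airflow's own implementation:

```python
import json

# Same payload as the Lockbox secret value (host is a placeholder).
payload = json.loads("""
{
  "conn_type": "postgres",
  "host": "<PostgreSQL_cluster_host_FQDN>",
  "port": 6432,
  "schema": "db1",
  "login": "user1",
  "password": "user1-password"
}
""")

# Simplified illustration of how these fields combine into a connection URI.
uri = (
    f"{payload['conn_type']}://{payload['login']}:{payload['password']}"
    f"@{payload['host']}:{payload['port']}/{payload['schema']}"
)
print(uri)  # postgres://user1:user1-password@<PostgreSQL_cluster_host_FQDN>:6432/db1
```

A parse error here means the secret value is not valid JSON and Airflow will not be able to build the connection from it.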
Prepare the DAG file and run the graph
- Create a local file named test_lockbox_connection.py and paste the following script into it:

  Apache Airflow™ version below 3.0:

  ```python
  from airflow import DAG
  from airflow.providers.postgres.operators.postgres import PostgresOperator
  from datetime import datetime

  with DAG(
      dag_id='test_lockbox_connection',
      start_date=datetime(2024, 4, 19),
      schedule="@once",
      catchup=False,
  ) as dag:
      check_conn = PostgresOperator(
          task_id="check_conn",
          postgres_conn_id='pg1',
          sql="SELECT 1;",
      )
  ```

  Apache Airflow™ version 3.0 or higher:

  ```python
  from airflow import DAG
  from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator
  from datetime import datetime

  with DAG(
      dag_id='test_lockbox_connection',
      start_date=datetime(2024, 4, 19),
      schedule="@once",
      catchup=False,
  ) as dag:
      check_conn = SQLExecuteQueryOperator(
          task_id="check_conn",
          conn_id='pg1',
          sql="SELECT 1;",
      )
  ```

- Upload the test_lockbox_connection.py DAG file to the bucket you created earlier. This will automatically create a graph with the same name in the Apache Airflow™ web interface.
- Make sure a new graph named test_lockbox_connection has appeared in the DAGs section.

  It may take a few minutes to load a DAG file from the bucket.
- To run the graph, click the run button in the line with its name.
Check the result
To check the result in the Apache Airflow™ web interface:
Apache Airflow™ version below 3.0:
- In the DAGs section, click test_lockbox_connection.
- Go to the Graph section.
- Select check_conn.
- Go to Logs.
- Make sure the logs contain the Rows affected: 1 line. This means the query was successful.
Apache Airflow™ version 3.0 or higher:
- In the DAGs section, click test_lockbox_connection.
- Go to Tasks.
- Select check_conn.
- Go to Task Instances.
- Select the task instance. The Logs section will open.
- Make sure the logs contain the Rows affected: 1 line. This means the query was successful.