Connecting to Yandex Managed Service for PostgreSQL
You can use a directed acyclic graph (DAG) to configure a connection to a database in a Yandex Managed Service for PostgreSQL cluster. Data for connecting to the DB is stored in Yandex Lockbox and automatically inserted into the graph.
Tip
Clusters running Apache Airflow™ versions below 3.0 use apache-airflow-providers-postgres 5.13.1 as the default provider. If you are using a newer provider version, use SQLExecuteQueryOperator instead of PostgresOperator. For more information, see the official documentation.
Getting started
- Create a Managed Service for PostgreSQL cluster with the following parameters:
  - DB name: db1
  - Username: user1
  - Password: user1-password
- Create a Yandex Object Storage bucket to store the DAG file in.
- Configure the Managed Service for Apache Airflow™ cluster:
  - Enable Use Lockbox Secret Backend to use Yandex Lockbox secrets to store Apache Airflow™ configuration data, variables, and connection parameters.
  - Under Dependencies, add the apache-airflow-providers-postgres pip package.

    Warning

    You only need to install this pip package on clusters running Apache Airflow™ version 3.0 or higher. On clusters with Apache Airflow™ versions below 3.0, it comes preinstalled. For a quick way to check that the package is available, see the sketch after this list.
  - Under DAG file storage, select the Object Storage bucket you created earlier. Your DAG file will be fetched from it.
- Issue the lockbox.payloadViewer role to your service account.

  There is no need to assign the lockbox.payloadViewer role for the whole folder. It is enough to assign it for a specific Yandex Lockbox secret once you create it.
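As referenced above, you can verify that the provider package is available in your environment with a short check based on Python's standard importlib.metadata module. This is a minimal sketch, not part of the original setup steps; run it wherever your DAG code executes:

```python
# Minimal sketch: check that the PostgreSQL provider package is installed.
# Uses only the Python standard library; the package name is the one
# added under Dependencies above.
from importlib.metadata import version, PackageNotFoundError

try:
    print(version("apache-airflow-providers-postgres"))
except PackageNotFoundError:
    print("apache-airflow-providers-postgres is not installed")
```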
Create a Yandex Lockbox secret
For the Apache Airflow™ cluster to work correctly, your Yandex Lockbox secret's name must have this format: airflow/<artifact_type>/<artifact_ID>, where:
- <artifact_type>: Decides what data will be stored in the secret. The allowed values are:
  - connections: Connections.
  - variables: Variables.
  - config: Configuration data.
- <artifact_ID>: ID to use to access the artifact in Apache Airflow™.
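For example, with this naming scheme a secret named airflow/connections/pg1 is visible to Apache Airflow™ as a connection with the ID pg1. The following is a minimal sketch, not from the original guide, of how DAG code would resolve it, assuming the Lockbox secret backend is enabled as configured above (import paths may differ slightly across Apache Airflow™ versions):

```python
# Minimal sketch: resolve the connection stored in the Lockbox secret.
# The connection ID "pg1" is the <artifact_ID> part of the secret name.
from airflow.hooks.base import BaseHook

conn = BaseHook.get_connection("pg1")  # looks up airflow/connections/pg1
print(conn.host, conn.port, conn.schema, conn.login)
```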
Create a Yandex Lockbox secret with the following parameters:
- Name: airflow/connections/pg1
- Secret type: Custom
- Key: conn
- Value: Select Text and specify the following contents:

  ```json
  {
    "conn_type": "postgres",
    "host": "<PostgreSQL_cluster_host_FQDN>",
    "port": 6432,
    "schema": "db1",
    "login": "user1",
    "password": "user1-password"
  }
  ```
For more information on how to get the FQDN of a PostgreSQL cluster host, see PostgreSQL host FQDN.
The secret will store the data to connect to the database in the Managed Service for PostgreSQL cluster.
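Before wiring the secret into a DAG, you may want to confirm that the connection parameters themselves are valid. Below is a minimal sketch, not part of the original guide, that checks the same values directly with the psycopg2 driver. It assumes the psycopg2-binary package is installed, the host is reachable from where you run it, and the FQDN placeholder is replaced with a real host; Managed Service for PostgreSQL typically requires an encrypted connection, hence sslmode:

```python
# Hypothetical sanity check of the connection parameters, outside Airflow.
import psycopg2

conn = psycopg2.connect(
    host="<PostgreSQL_cluster_host_FQDN>",  # replace with a real host FQDN
    port=6432,
    dbname="db1",
    user="user1",
    password="user1-password",
    sslmode="require",  # encrypted connection, typically required
)
with conn.cursor() as cur:
    cur.execute("SELECT 1;")
    print(cur.fetchone())  # expected output: (1,)
conn.close()
```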
Prepare the DAG file and run the graph
- Create a local file named postgres_operator.py and paste the following script into it (a sketch of a slightly extended DAG appears after this procedure):

  Apache Airflow™ version below 3.0:

  ```python
  from airflow import DAG
  from airflow.providers.postgres.operators.postgres import PostgresOperator

  with DAG(
      dag_id='postgres_operator',
      schedule=None
  ):
      PostgresOperator(
          task_id="check_conn",
          postgres_conn_id='pg1',
          sql="SELECT 1;"
      )
  ```

  Apache Airflow™ version 3.0 or higher:

  ```python
  from airflow import DAG
  from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator

  with DAG(
      dag_id='postgres_operator',
      schedule=None
  ):
      SQLExecuteQueryOperator(
          task_id="check_conn",
          conn_id='pg1',
          sql="SELECT 1;"
      )
  ```
- Upload the postgres_operator.py DAG file to the bucket you created earlier.
- Make sure a new graph named postgres_operator has appeared in the DAGs section.

  It may take a few minutes to load a DAG file from the bucket.
- To run the graph, click the run button in the line with its name.
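As referenced above, here is a minimal sketch of how the same pg1 connection could serve a DAG with more than one query task. It is an illustration, not part of the original guide: the postgres_two_steps DAG ID, the demo table, and the queries are hypothetical, and the example assumes Apache Airflow™ 3.0 or higher with the provider package installed.

```python
# Hypothetical sketch: two sequential tasks over the same Lockbox-backed
# connection. The DAG ID and the "demo" table name are illustrative only.
from airflow import DAG
from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator

with DAG(
    dag_id='postgres_two_steps',
    schedule=None
):
    create = SQLExecuteQueryOperator(
        task_id="create_table",
        conn_id='pg1',
        sql="CREATE TABLE IF NOT EXISTS demo (id int);"
    )
    insert = SQLExecuteQueryOperator(
        task_id="insert_row",
        conn_id='pg1',
        sql="INSERT INTO demo VALUES (1);"
    )
    create >> insert  # run the tasks in order
```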
Check the result
To check the result in the Apache Airflow™ web interface:
Apache Airflow™ version below 3.0:

- In the DAGs section, click postgres_operator.
- Go to the Graph section.
- Select check_conn.
- Go to Logs.
- Make sure the logs contain the Rows affected: 1 line. This means the query was successful.

Apache Airflow™ version 3.0 or higher:

- In the DAGs section, click postgres_operator.
- Go to Tasks.
- Select check_conn.
- Go to Task Instances.
- Select the task instance. The Logs section will open.
- Make sure the logs contain the Rows affected: 1 line. This means the query was successful.