Yandex Cloud
Search
Contact UsGet started
  • Blog
  • Pricing
  • Documentation
  • All Services
  • System Status
    • Featured
    • Infrastructure & Network
    • Data Platform
    • Containers
    • Developer tools
    • Serverless
    • Security
    • Monitoring & Resources
    • ML & AI
    • Business tools
  • All Solutions
    • By industry
    • By use case
    • Economics and Pricing
    • Security
    • Technical Support
    • Customer Stories
    • Gateway to Russia
    • Cloud for Startups
    • Education and Science
  • Blog
  • Pricing
  • Documentation
Yandex project
© 2025 Yandex.Cloud LLC
Yandex Managed Service for Apache Airflow™
  • Getting started
    • All tutorials
      • Yandex Data Processing: Automating operations
      • Yandex Query: Automating jobs
  • Access management
  • Pricing policy
  • Terraform reference
  • Yandex Monitoring metrics
  • Release notes
  • FAQ

In this article:

  • Get your cloud ready
  • Required paid resources
  • Create a service account
  • Create a cloud network and subnets
  • Prepare a bucket in Object Storage
  • Create a bucket
  • Set up the bucket ACL
  • Configure an egress NAT
  • Create a NAT gateway
  • Create a route table
  • Associate the route table with a subnet
  • Create a Managed Service for Apache Airflow™ cluster
  • Prepare the DAG file and run the graph
  • Check the result
  • Delete the resources you created
  1. Tutorials
  2. Operations with clusters
  3. Yandex Query: Automating jobs

Automating Yandex Query tasks using Yandex Managed Service for Apache Airflow™

Written by
Yandex Cloud
Updated at May 7, 2025
  • Get your cloud ready
    • Required paid resources
  • Create a service account
  • Create a cloud network and subnets
  • Prepare a bucket in Object Storage
    • Create a bucket
    • Set up the bucket ACL
  • Configure an egress NAT
    • Create a NAT gateway
    • Create a route table
    • Associate the route table with a subnet
  • Create a Managed Service for Apache Airflow™ cluster
  • Prepare the DAG file and run the graph
  • Check the result
  • Delete the resources you created

Managed Service for Apache Airflow™ is a popular tool for automating data operations. Yandex Query supports integration with Managed Service for Apache Airflow™ using the apache-airflow-providers-yandex pip package.

To create an infrastructure for automation of Yandex Query tasks using Managed Service for Apache Airflow™, follow these steps:

  1. Get your cloud ready.
  2. Create a service account.
  3. Create a cloud network and subnets.
  4. Prepare a bucket in Object Storage.
  5. Configure an egress NAT.
  6. Create a Managed Service for Apache Airflow™ cluster.
  7. Prepare the DAG file and run the graph.
  8. Check the result.

If you no longer need the resources you created, delete them.

Get your cloud readyGet your cloud ready

Sign up in Yandex Cloud and create a billing account:

  1. Navigate to the management console and log in to Yandex Cloud or register a new account.
  2. On the Yandex Cloud Billing page, make sure you have a billing account linked and it has the ACTIVE or TRIAL_ACTIVE status. If you do not have a billing account, create one and link a cloud to it.

If you have an active billing account, you can navigate to the cloud page to create or select a folder for your infrastructure to operate in.

Learn more about clouds and folders.

Required paid resourcesRequired paid resources

The cost of support for the new infrastructure includes:

  • Fee for bucket usage (see Yandex Object Storage pricing).
  • Fee for using the NAT gateway (see Yandex Virtual Private Cloud pricing).
  • Fee for the amount of data read (see Yandex Query pricing).
  • Fee for using the Yandex Managed Service for Apache Airflow™ cluster (see Managed Service for Apache Airflow™ pricing).

Create a service accountCreate a service account

Create a service account named airflow-sa with the editor role for the folder where the Managed Service for Apache Airflow™ cluster will be created:

Management console
Yandex Cloud CLI
API
  1. In the management console, select the folder where you want to create a service account.
  2. In the list of services, select Identity and Access Management.
  3. Click Create service account.
  4. Enter a name for the service account: airflow-sa.
  5. Click Add role and select the editor role.
  6. Click Create.

If you do not have the Yandex Cloud CLI yet, install and initialize it.

The folder specified when creating the CLI profile is used by default. To change the default folder, use the yc config set folder-id <folder_ID> command. You can specify a different folder using the --folder-name or --folder-id parameter.

  1. Create a service account named airflow-sa:

    yc iam service-account create airflow-sa
    

    Result:

    id: nfersamh4sjq********
    folder_id: b1gc1t4cb638********
    created_at: "2023-09-21T10:36:29.726397755Z"
    name: airflow-sa
    

    Save the id of the airflow-sa service account and the folder where you created it (folder_id).

  2. Assign the editor role for the folder to the service account by specifying the folder and service account IDs you previously saved:

    yc resource-manager folder add-access-binding <folder_ID> \
      --role editor \
      --subject serviceAccount:<service_account_ID>
    

To create a service account, use the create REST API method for the ServiceAccount resource or the ServiceAccountService/Create gRPC API call.

To assign the editor role for a folder to a service account, use the setAccessBindings method for the ServiceAccount resource or the ServiceAccountService/SetAccessBindings gRPC API call.

Create a cloud network and subnetsCreate a cloud network and subnets

All resources you create in the tutorial will belong to the same cloud network.

Management console
Yandex Cloud CLI
API
  1. In the management console, select the folder.
  2. In the list of services, select Virtual Private Cloud.
  3. Click Create network.
  4. In the Name field, specify yq-network.
  5. In the Advanced field, select Create subnets.
  6. Click Create network.
  1. Create a network named yq-network:

    yc vpc network create yq-network
    

    Result:

    id: enptrcle5q3d********
    folder_id: b1g9hv2loamq********
    created_at: "2022-04-04T05:25:03Z"
    name: yq-network
    default_security_group_id: enpbsnnop4ak********
    

    For more information about the yc vpc network create command, see the CLI reference.

  2. Create subnets:

    • Availability zone: ru-central1-a

      yc vpc subnet create yq-network-ru-central1-a \
        --zone ru-central1-a \
        --network-name yq-network \
        --range 10.1.0.0/16
      

      Result:

      id: b0c3pte4o2kn********
      folder_id: b1g9hv2loamq********
      created_at: "2022-04-04T09:28:08Z"
      name: yq-network-ru-central1-a
      network_id: enptrcle5q3d********
      zone_id: ru-central1-a
      v4_cidr_blocks:
      - 10.1.0.0/16
      
    • Availability zone: ru-central1-b:

      yc vpc subnet create yq-network-ru-central1-b \
        --zone ru-central1-b \
        --network-name yq-network \
        --range 10.2.0.0/16
      
    • Availability zone: ru-central1-d:

      yc vpc subnet create yq-network-ru-central1-d \
        --zone ru-central1-d \
        --network-name yq-network \
        --range 10.3.0.0/16
      

    For more information about the yc vpc subnet create command, see the CLI reference.

  1. To create a network, use the create REST API method for the Network resource or the NetworkService/Create gRPC API call.

  2. To create subnets, use the create REST API method for the Subnet resource or the SubnetService/Create gRPC API call.

Prepare a bucket in Object StoragePrepare a bucket in Object Storage

Create a bucketCreate a bucket

Management console
AWS CLI
API
  1. In the management console, select the folder.
  2. In the list of services, select Object Storage.
  3. Click Create bucket.
  4. In the ** Name** field, enter a unique name for the bucket, e.g., airflow-bucket.
  5. In the Object read access and Object listing access fields, select Public.
  6. Click Create bucket.

If you do not have the AWS CLI yet, install and configure it.

  1. Create a bucket by specifying a unique name for it:

    aws --endpoint-url https://storage.yandexcloud.net \
      s3 mb s3://<bucket_name>
    

    Result:

    make_bucket: s3://airflow-bucket
    
  2. Enable public access to read objects and their list in the bucket you created:

    aws --endpoint-url https://storage.yandexcloud.net \
      s3api put-bucket-acl \
      --bucket <bucket_name> \
      --acl public-read
    

To create a bucket, use the create REST API method for the Bucket resource, the BucketService/Create gRPC API call, or the create S3 API method.

Set up the bucket ACLSet up the bucket ACL

Edit the ACL of the new bucket to give the READ permission to the airflow-sa service account.

Configure an egress NATConfigure an egress NAT

Create a NAT gatewayCreate a NAT gateway

Management console
Yandex Cloud CLI
API
  1. In the management console, select the folder.
  2. In the list of services, select Virtual Private Cloud.
  3. In the left-hand panel, select Gateways.
  4. Click Create gateway, and in the window that opens:
    1. In the Name field, enter yq-nat.
    2. In the Type field, select Egress NAT.
    3. Click Save.

Create a NAT gateway in the default folder:

yc vpc gateway create \
  --name yq-nat

Result:

id: enpkq1sb7hed********
folder_id: b1g681qpemb4********
created_at: "2024-05-19T13:20:36Z"
name: yq-nat
shared_egress_gateway: {}

For more information about the yc vpc gateway create command, see the CLI reference.

Save the NAT gateway id. You will need it when creating a route table.

To create a NAT gateway, use the create REST API method for the Gateway resource or the GatewayService/Create gRPC API call.

Create a route tableCreate a route table

Management console
Yandex Cloud CLI
API
  1. In the left-hand panel, select Routing tables.
  2. Click Create routing table and specify the route table parameters:
    1. Enter the name: yq-route-table.
    2. Select the network: yq-network.
    3. Click Add.
      • In the Next hop field, select Gateway.
      • In the Gateway field, select the yq-nat NAT gateway. The destination prefix will apply automatically.
    4. Click Add.
  3. Click Create routing table.

Create a route table with the yq-nat NAT gateway as the next hop and the 0.0.0.0/0 destination prefix:

yc vpc route-table create \
  --name=yq-route-table \
  --network-name=yq-network \
  --route destination=0.0.0.0/0,gateway-id=<NAT_gateway_ID>

Result:

id: enp4v8foko6s********
folder_id: b1g681qpemb4********
created_at: "2024-05-19T13:22:47Z"
name: yq-route-table
network_id: enppoggov6ub********
static_routes:
  - destination_prefix: 0.0.0.0/0
    gateway_id: enpkq1sb7hed********

For more information about the yc vpc route-table create command, see the CLI reference.

To create a route table, use the create REST API method for the RouteTable resource or the RouteTableService/Create gRPC API call.

Associate the route table with a subnetAssociate the route table with a subnet

Associate the route table with a subnet to route the subnet traffic via the NAT gateway:

Management console
Yandex Cloud CLI
API
  1. In the left-hand panel, select Subnets.
  2. In the yq-network-ru-central1-a row, click .
  3. Click Link routing table and select yq-route-table.
  4. Click Link.

Run this command:

yc vpc subnet update yq-network-ru-central1-a \
  --route-table-name=yq-route-table

Result:

id: e9b6n3jj3gh6********
folder_id: b1g681qpemb4********
created_at: "2024-05-19T13:24:58Z"
name: yq-network-ru-central1-a
network_id: enppoggov6ub********
zone_id: ru-central1-a
v4_cidr_blocks:
  - 10.1.0.0/16
route_table_id: enp4v8foko6s********
dhcp_options: {}

For more information about the yc vpc subnet update command, see the CLI reference.

To associate a route table with a subnet, use the update REST API method for the Subnet resource or the SubnetService/Update gRPC API call.

Warning

The API method will assign default values to all the parameters of the object you are modifying unless you explicitly provide them in your request. To avoid this, list the settings you want to change in the updateMask parameter as a single comma-separated string.

Create a Managed Service for Apache Airflow™ clusterCreate a Managed Service for Apache Airflow™ cluster

Management console
  1. In the management console, select the folder you want to create a cluster in.

  2. Select Managed Service for Apache Airflow™.

  3. Click Create a cluster.

  4. Under Basic parameters, enter a name for your cluster. The name must be unique within the folder.

  5. Under Access settings, set a password for the admin user. The password must be not less than 8 characters long and contain at least:

    • One uppercase letter
    • One lowercase letter
    • One digit
    • One special character

    Note

    Save the password locally or memorize it. The service does not show passwords after the registry is created.

  6. Under Network settings, select:

    • Availability zone: ru-central1-a

    • Cloud network: yq-network

    • Subnet: yq-network-ru-central1-a

    • Security group: default

      Security group settings do not affect access to the Apache Airflow™ web interface.

  7. Under Dependencies, specify the pip package name and version restriction:

    apache-airflow-providers-yandex>=3.10
    
  8. Under DAG file storage, select the bucket you created earlier.

  9. Click Create.

Prepare the DAG file and run the graphPrepare the DAG file and run the graph

As an example, we use a directed acyclic graph (DAG) with two vertices:

  • yq_operator: Runs a simple query to Yandex Query.
  • output_operator: Outputs the result of the yq_operator vertex execution.

To prepare a DAG:

  1. Create a local file named yq_dag.py and copy the following script into it:

    yq_dag.py
    import datetime
    
    from airflow.models.dag import DAG
    from airflow.providers.yandex.operators.yq import YQExecuteQueryOperator
    from airflow.operators.python_operator import PythonOperator
    
    with DAG(
        dag_id="yq_hello_world_operator",
        schedule_interval="@hourly",
        start_date=datetime.datetime.now(),
    ) as dag:
    
        yq_operator = YQExecuteQueryOperator(
            task_id="yq_operator", sql="SELECT 'Hello, world!'"
        )
    
        def print_context(ds=None, **kwargs):
            ti = kwargs["ti"]
            print(ti.xcom_pull(task_ids="yq_operator"))
    
        output_operator = PythonOperator(
            task_id="output_operator", provide_context=True, python_callable=print_context
        )
    
        yq_operator >> output_operator
    
    if __name__ == "__main__":
        dag.test()
    
  2. Upload the DAG file into the Managed Service for Apache Airflow™ cluster. Create a files/dags folder in the airflow-bucket and upload the yq_dag.py file to it.

  3. Open the Apache Airflow™ web interface.

  4. Make sure the new yq_hello_world_operator DAG file has appeared in the DAGs section.

    Note

    It may take a few minutes to upload a DAG file from the bucket.

  5. To run a DAG, first click image in the line with its name, and then click Trigger DAG.

Check the resultCheck the result

  1. In the DAGs section, open the yq_hello_world_operator graph.
  2. Go to the Grid section.
  3. Select the yq_operator task.
  4. Go to the XCom section.
  5. Make sure the return_value line has the 'rows': [['Hello, world!']] element. This indicates the query was successful.

Delete the resources you createdDelete the resources you created

To delete the infrastructure and stop paying for the resources you created:

  1. Delete the Object Storage bucket.
  2. Disassociate and delete the route table.
  3. Delete the NAT gateway.
  4. Delete the Apache Airflow™ cluster.
  5. Delete the subnets, network, and the service account, if required.

Was the article helpful?

Previous
Yandex Data Processing: Automating operations
Next
Configuring an SMTP server to send e-mail notifications
Yandex project
© 2025 Yandex.Cloud LLC