Yandex Cloud
Search
Contact UsTry it for free
  • Customer Stories
  • Documentation
  • Blog
  • All Services
  • System Status
  • Marketplace
    • Featured
    • Infrastructure & Network
    • Data Platform
    • AI for business
    • Security
    • DevOps tools
    • Serverless
    • Monitoring & Resources
  • All Solutions
    • By industry
    • By use case
    • Economics and Pricing
    • Security
    • Technical Support
    • Start testing with double trial credits
    • Cloud credits to scale your IT product
    • Gateway to Russia
    • Cloud for Startups
    • Center for Technologies and Society
    • Yandex Cloud Partner program
    • Price calculator
    • Pricing plans
  • Customer Stories
  • Documentation
  • Blog
© 2026 Direct Cursus Technology L.L.C.
Yandex Lockbox
  • Getting started
    • All tutorials
    • Syncing with Managed Service for Kubernetes secrets
    • Storing Apache Airflow™ connections and variables in Yandex Lockbox
    • Building a CI/CD pipeline in GitLab with serverless products
    • Secure storage of GitLab CI passwords as Yandex Lockbox secrets
    • Loading data from Yandex Direct to a Yandex Managed Service for ClickHouse® data mart using Yandex Cloud Functions, Yandex Object Storage, and Yandex Data Transfer
    • Deploying a fault-tolerant architecture with preemptible VMs
    • Building an interactive serverless application using WebSocket
    • Automatically copying objects from one Object Storage bucket to another
    • Secure password transmission to an initialization script
    • Using a secret in a PySpark job to connect to Yandex Managed Service for PostgreSQL
  • Quotas and limits
  • Access management
  • Pricing policy
  • Terraform reference
  • Monitoring metrics
  • Audit Trails events
  • Release notes
  • FAQ

In this article:

  • Getting started
  • Required paid resources
  • Set up your infrastructure
  • Configure permissions for access to a Managed Service for PostgreSQL user's secret
  • Prepare a PySpark job
  • Run the job and check the result
  • Delete the resources you created
  1. Tutorials
  2. Using a secret in a PySpark job to connect to Yandex Managed Service for PostgreSQL

Using a Yandex Lockbox secret in a PySpark job to connect to Yandex Managed Service for PostgreSQL

Written by
Yandex Cloud
Updated at March 31, 2026
  • Getting started
    • Required paid resources
  • Set up your infrastructure
  • Configure permissions for access to a Managed Service for PostgreSQL user's secret
  • Prepare a PySpark job
  • Run the job and check the result
  • Delete the resources you created

You can use a Yandex Lockbox secret to connect to a Yandex Managed Service for PostgreSQL cluster from a PySpark job in Yandex Managed Service for Apache Spark™. To do this, grant the Yandex Managed Service for Apache Spark™ cluster service account access to the secret. Yandex Connection Manager creates the secret automatically when creating the Managed Service for PostgreSQL user.

For the PySpark job, we use a Python script stored in a Yandex Object Storage bucket. The script retrieves the user password from the secret and uses it to connect to the Managed Service for PostgreSQL cluster.

To get information from the secret and connect to the Managed Service for PostgreSQL cluster:

  1. Set up your infrastructure.
  2. Configure permissions for access to the user's secret.
  3. Prepare a PySpark job.
  4. Run the job and check the result.

If you no longer need the resources you created, delete them.

Getting startedGetting started

Sign up for Yandex Cloud and create a billing account:

  1. Navigate to the management console and log in to Yandex Cloud or create a new account.
  2. On the Yandex Cloud Billing page, make sure you have a billing account linked and it has the ACTIVE or TRIAL_ACTIVE status. If you do not have a billing account, create one and link a cloud to it.

If you have an active billing account, you can create or select a folder for your infrastructure on the cloud page.

Learn more about clouds and folders here.

Required paid resourcesRequired paid resources

  • Yandex Managed Service for Apache Spark™ cluster: computing resources of cluster components (see Yandex Managed Service for Apache Spark™ pricing).
  • Managed Service for PostgreSQL cluster, which includes the use of computing resources allocated to hosts, storage and backup size (see Managed Service for PostgreSQL pricing).
  • Object Storage bucket: use of storage, data operations (see Object Storage pricing).
  • NAT gateway: hourly use of the gateway and its outgoing traffic (see Virtual Private Cloud pricing).
  • Public IP addresses if public access is enabled for cluster hosts (see Virtual Private Cloud pricing).

There is no charge for using Yandex Lockbox secrets created with Connection Manager.

Set up your infrastructureSet up your infrastructure

  1. Create a service account named spark-agent and assign it the following roles:

    • managed-spark.integrationProvider: For Yandex Managed Service for Apache Spark™ integration with other services.
    • storage.editor: For access to the PySpark job file in the Object Storage bucket.
  2. Create a bucket.

  3. Grant the READ permission to the spark-agent service account for the new bucket.

  4. Create a cloud network named spark-network.

    This will automatically create three subnets in different availability zones.

  5. In spark-network, create a security group named spark-sg for the Yandex Managed Service for Apache Spark™ cluster to allow outgoing TCP connections on:

    • Port 6432 to connect to PostgreSQL.
    • Port 443 to download Maven packages.
  6. In spark-network, create a security group named pg-sg for the Managed Service for PostgreSQL cluster to allow incoming TCP connections on port 6432.

  7. Create a Yandex Managed Service for Apache Spark™ cluster with the following settings:

    • Service account: spark-agent
    • Network: spark-network
    • Security group: spark-sg
  8. Set up a NAT gateway for the subnet where the Yandex Managed Service for Apache Spark™ cluster was created. The NAT gateway is required to download the PostgreSQL JDBC driver from the Maven repository.

  9. Create a Managed Service for PostgreSQL cluster with the following settings:

    • Network: spark-network
    • Security group: pg-sg

Configure permissions for access to a Managed Service for PostgreSQL user's secretConfigure permissions for access to a Managed Service for PostgreSQL user's secret

Management console
CLI
  1. In the management console, select the folder where you created the infrastructure.

  2. Go to Managed Service for PostgreSQL.

  3. Click the name of your cluster and select the Users tab.

  4. Click the connection ID next to the user you need.

    This will take you to the Connection Manager connection page for the selected user.

  5. On the page that opens, click the secret ID.

    This will open the page of the Yandex Lockbox secret storing the PostgreSQL user password.

  6. On the left-hand panel, select Access bindings and click Assign roles.

  7. Select the spark-agent service account you want to grant secret access to.

  8. Click Add role and select lockbox.payloadViewer.

  9. Click Save.

If you do not have the Yandex Cloud CLI yet, install and initialize it.

The folder used by default is the one specified when creating the CLI profile. To change the default folder, use the yc config set folder-id <folder_ID> command. You can also set a different folder for any specific command using the --folder-name or --folder-id options.

To configure access to the Managed Service for PostgreSQL user secret:

  1. Get the secret ID from the Connection Manager connection:

    yc metadata-hub connection-manager connection list \
      --mdb-cluster-id <PostgreSQL_cluster_ID>
    

    You can get the cluster ID with the list of clusters in the folder.

  2. Configure access to the user secret:

    yc lockbox secret add-access-binding <secret_ID> \
      --role lockbox.payloadViewer \
      --service-account-name spark-agent 
    

    The spark-agent service account will get access to the Managed Service for PostgreSQL user secret.

Prepare a PySpark jobPrepare a PySpark job

  1. Create a local file named job-mpg-connection-with-secret.py and paste the following script to it:

    job-mpg-connection-with-secret.py
    import json
    import os
    import urllib.request
    from pyspark.sql import SparkSession
    
    def get_secret(secret_id: str) -> dict:
        """Getting a secret from Yandex Lockbox."""
        # Getting an IAM token via metadata service
        metadata_host = os.environ.get("YC_METADATA_ADDR", "169.254.169.254")
        token_url = f"http://{metadata_host}/computeMetadata/v1/instance/service-accounts/default/token"
    
        req = urllib.request.Request(token_url)
        req.add_header("Metadata-Flavor", "Google")
        with urllib.request.urlopen(req, timeout=10) as resp:
            token = json.loads(resp.read())["access_token"]
    
        # Getting a Yandex Lockbox secret
        secret_url = f"https://payload.lockbox.api.cloud.yandex.net/lockbox/v1/secrets/{secret_id}/payload"
        req = urllib.request.Request(secret_url)
        req.add_header("Authorization", f"Bearer {token}")
        with urllib.request.urlopen(req, timeout=30) as resp:
            payload = json.loads(resp.read())
            return {e["key"]: e["textValue"] for e in payload["entries"]}
    
    
    # Using the secret retrieval function
    secrets = get_secret("<secret_ID>")
    
    # Initializing a Spark session
    spark = SparkSession.builder.appName("db-reader").getOrCreate()
    
    # Connecting to a Managed Service for PostgreSQL cluster
    df = spark.read \
        .format("jdbc") \
        .option("url", f"jdbc:postgresql://<host_FQDN>:6432/<DB_name>") \
        .option("driver", "org.postgresql.Driver") \
        .option("user", "<username>") \
        .option("password", secrets["postgresql_password"]) \
        .option("dbtable", "<table_name>") \
        .load()
    
    # Showing the first 20 rows of the table
    df.show()
    
  2. In the script file, specify the following settings:

    • Secret ID.
    • Host FQDN to connect to the Managed Service for PostgreSQL cluster.
    • Database name.
    • Table name.

Run the job and check the resultRun the job and check the result

Management console
CLI
  1. Upload the job-mpg-connection-with-secret.py file to the bucket.

  2. In the Yandex Managed Service for Apache Spark™ cluster, create a job with the following settings:

    • Job type: PySpark
    • Main python file: s3a://<bucket_name>/job-mpg-connection-with-secret.py
    • Packages: org.postgresql:postgresql:42.7.3
  3. Wait for the PySpark job you created to change its status to Done.

  4. Get job execution logs.

    The job logs will display rows from the table specified in the script.

  1. Upload the job-mpg-connection-with-secret.py file to the bucket:

    yc storage s3api put-object \
      --body <local_file_path> \
      --bucket <bucket_name> \
      --key "job-mpg-connection-with-secret.py"
    
  2. Run the job:

    yc managed-spark job create-pyspark \
      --cluster-id <cluster_ID> \
      --name "<job_name>" \
      --main-python-file-uri s3a://<bucket_name>/job-mpg-connection-with-secret.py \
      --packages org.postgresql:postgresql:42.7.3
    

    You can get the cluster ID with the list of clusters in the folder.

    The command will return the job ID. Save it as you will need it to get job execution logs.

  3. Get job execution logs:

    yc managed-spark job log <job_ID> \
      --cluster-id <cluster_ID>
    

    The job logs will display rows from the table specified in the script.

Delete the resources you createdDelete the resources you created

Some resources are not free of charge. Delete the resources you no longer need to avoid paying for them:

  1. Yandex Managed Service for Apache Spark™ cluster.
  2. Managed Service for PostgreSQL cluster.
  3. Object Storage bucket. Before deleting the bucket, delete all its objects.
  4. NAT gateway.

Was the article helpful?

Previous
Secure password transmission to an initialization script
Next
Quotas and limits
© 2026 Direct Cursus Technology L.L.C.