Using a Yandex Lockbox secret in a PySpark job to connect to Yandex Managed Service for PostgreSQL
You can use a Yandex Lockbox secret to connect to a Yandex Managed Service for PostgreSQL cluster from a PySpark job in Yandex Managed Service for Apache Spark™. To do this, grant the Yandex Managed Service for Apache Spark™ cluster service account access to the secret. Yandex Connection Manager creates the secret automatically when creating the Managed Service for PostgreSQL user.
For the PySpark job, we use a Python script stored in a Yandex Object Storage bucket. The script retrieves the user password from the secret and uses it to connect to the Managed Service for PostgreSQL cluster.
To get information from the secret and connect to the Managed Service for PostgreSQL cluster:
- Set up your infrastructure.
- Configure permissions for access to the user's secret.
- Prepare a PySpark job.
- Run the job and check the result.
If you no longer need the resources you created, delete them.
Getting started
Sign up for Yandex Cloud and create a billing account:
- Navigate to the management console and log in to Yandex Cloud or create a new account.
- On the Yandex Cloud Billing page, make sure you have a billing account linked and that it has the `ACTIVE` or `TRIAL_ACTIVE` status. If you do not have a billing account, create one and link a cloud to it.

If you have an active billing account, you can create or select a folder for your infrastructure on the cloud page.

Learn more about clouds and folders here.
Required paid resources
- Yandex Managed Service for Apache Spark™ cluster: computing resources of cluster components (see Yandex Managed Service for Apache Spark™ pricing).
- Managed Service for PostgreSQL cluster, which includes the use of computing resources allocated to hosts, storage and backup size (see Managed Service for PostgreSQL pricing).
- Object Storage bucket: use of storage, data operations (see Object Storage pricing).
- NAT gateway: hourly use of the gateway and its outgoing traffic (see Virtual Private Cloud pricing).
- Public IP addresses if public access is enabled for cluster hosts (see Virtual Private Cloud pricing).
There is no charge for using Yandex Lockbox secrets created with Connection Manager.
Set up your infrastructure
- Create a service account named `spark-agent` and assign it the following roles:
  - `managed-spark.integrationProvider`: for Yandex Managed Service for Apache Spark™ integration with other services.
  - `storage.editor`: for access to the PySpark job file in the Object Storage bucket.
- Create an Object Storage bucket.
- Grant the `READ` permission for the new bucket to the `spark-agent` service account.
- Create a cloud network named `spark-network`. This will automatically create three subnets in different availability zones.
- In `spark-network`, create a security group named `spark-sg` for the Yandex Managed Service for Apache Spark™ cluster. It must allow outgoing TCP connections on:
  - Port `6432`, to connect to PostgreSQL.
  - Port `443`, to download Maven packages.
- In `spark-network`, create a security group named `pg-sg` for the Managed Service for PostgreSQL cluster. It must allow incoming TCP connections on port `6432`.
- Create a Yandex Managed Service for Apache Spark™ cluster with the following settings:
  - Service account: `spark-agent`
  - Network: `spark-network`
  - Security group: `spark-sg`
- Set up a NAT gateway for the subnet where the Yandex Managed Service for Apache Spark™ cluster was created. The NAT gateway is required to download the PostgreSQL JDBC driver from the Maven repository.
- Create a Managed Service for PostgreSQL cluster with the following settings:
  - Network: `spark-network`
  - Security group: `pg-sg`
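With the security groups in place, you can sanity-check connectivity before running any jobs. Below is a minimal Python sketch, assuming it is run from a host inside `spark-network` and that `<PostgreSQL_host_FQDN>` is replaced with a real host name; it simply tests whether a TCP port accepts connections (e.g., port 6432 for PostgreSQL, or port 443 on a Maven mirror for package downloads):

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, timeouts, and refused connections
        return False


# Usage (substitute a real host FQDN):
# port_is_reachable("<PostgreSQL_host_FQDN>", 6432)
```

A `False` result usually points at a missing security group rule or a routing problem (e.g., no NAT gateway for outbound traffic) rather than at the database itself.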
Configure permissions for access to a Managed Service for PostgreSQL user's secret
**Management console**

- In the management console, select the folder where you created the infrastructure.
- Go to Managed Service for PostgreSQL.
- Click the name of your cluster and select the Users tab.
- Click the connection ID next to the user you need. This will take you to the Connection Manager connection page for the selected user.
- On the page that opens, click the secret ID. This will open the page of the Yandex Lockbox secret storing the PostgreSQL user password.
- In the left-hand panel, select Access bindings and click Assign roles.
- Select the `spark-agent` service account to grant it access to the secret.
- Click Add role and select `lockbox.payloadViewer`.
- Click Save.
**CLI**

If you do not have the Yandex Cloud CLI yet, install and initialize it.
The folder used by default is the one specified when you created the CLI profile. To change the default folder, use the `yc config set folder-id <folder_ID>` command. You can also set a different folder for any specific command using the `--folder-name` or `--folder-id` parameter.
To configure access to the Managed Service for PostgreSQL user secret:
- Get the secret ID from the Connection Manager connection:

  ```bash
  yc metadata-hub connection-manager connection list \
    --mdb-cluster-id <PostgreSQL_cluster_ID>
  ```

  You can get the cluster ID with the list of clusters in the folder.

- Configure access to the user secret:

  ```bash
  yc lockbox secret add-access-binding <secret_ID> \
    --role lockbox.payloadViewer \
    --service-account-name spark-agent
  ```

  The `spark-agent` service account will get access to the Managed Service for PostgreSQL user secret.
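If you script these steps, adding `--format json` to the listing command makes the output machine-readable. Below is a hedged Python sketch of extracting a secret ID from such output; the field names in `sample_output` are illustrative guesses, not the documented schema, so check the actual command output before relying on them:

```python
import json

# Hypothetical example of `yc metadata-hub connection-manager connection list
# --format json` output; real field names may differ from these guesses.
sample_output = """
[
  {
    "id": "cbq1abcdexample",
    "name": "pg-user-connection",
    "lockbox_secret_id": "e6q0example"
  }
]
"""

connections = json.loads(sample_output)
secret_ids = [c["lockbox_secret_id"] for c in connections]
print(secret_ids[0])  # e6q0example
```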
Prepare a PySpark job
- Create a local file named `job-mpg-connection-with-secret.py` and paste the following script into it:

  ```python
  import json
  import os
  import urllib.request

  from pyspark.sql import SparkSession


  def get_secret(secret_id: str) -> dict:
      """Get a secret from Yandex Lockbox."""
      # Get an IAM token via the metadata service
      metadata_host = os.environ.get("YC_METADATA_ADDR", "169.254.169.254")
      token_url = f"http://{metadata_host}/computeMetadata/v1/instance/service-accounts/default/token"
      req = urllib.request.Request(token_url)
      req.add_header("Metadata-Flavor", "Google")
      with urllib.request.urlopen(req, timeout=10) as resp:
          token = json.loads(resp.read())["access_token"]

      # Get the Yandex Lockbox secret payload
      secret_url = f"https://payload.lockbox.api.cloud.yandex.net/lockbox/v1/secrets/{secret_id}/payload"
      req = urllib.request.Request(secret_url)
      req.add_header("Authorization", f"Bearer {token}")
      with urllib.request.urlopen(req, timeout=30) as resp:
          payload = json.loads(resp.read())
      return {e["key"]: e["textValue"] for e in payload["entries"]}


  # Retrieve the secret
  secrets = get_secret("<secret_ID>")

  # Initialize a Spark session
  spark = SparkSession.builder.appName("db-reader").getOrCreate()

  # Connect to the Managed Service for PostgreSQL cluster
  df = spark.read \
      .format("jdbc") \
      .option("url", "jdbc:postgresql://<host_FQDN>:6432/<DB_name>") \
      .option("driver", "org.postgresql.Driver") \
      .option("user", "<username>") \
      .option("password", secrets["postgresql_password"]) \
      .option("dbtable", "<table_name>") \
      .load()

  # Show the first 20 rows of the table
  df.show()
  ```
- In the script file, specify the following settings:
  - Secret ID.
  - Host FQDN to connect to the Managed Service for PostgreSQL cluster.
  - Database name.
  - Table name.
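The key step in the script is turning the Lockbox payload (a list of entries) into a plain dictionary. The standalone sketch below reproduces that transformation with a mock payload in place of the real API response; the entry keys here are made up, and the tutorial script expects a key named `postgresql_password`, so the actual keys depend on what Connection Manager stored in your secret:

```python
# Mock of a Yandex Lockbox payload response; the real payload comes from
# the /lockbox/v1/secrets/<secret_ID>/payload endpoint, and the keys below
# are illustrative only.
payload = {
    "entries": [
        {"key": "user", "textValue": "app_user"},
        {"key": "password", "textValue": "s3cr3t"},
    ]
}

# Same transformation as in get_secret(): entry list -> {key: textValue} dict
secrets = {e["key"]: e["textValue"] for e in payload["entries"]}

print(secrets["password"])  # s3cr3t
```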
Run the job and check the result
**Management console**

- Upload the `job-mpg-connection-with-secret.py` file to the bucket.
- In the Yandex Managed Service for Apache Spark™ cluster, create a job with the following settings:
  - Job type: PySpark
  - Main python file: `s3a://<bucket_name>/job-mpg-connection-with-secret.py`
  - Packages: `org.postgresql:postgresql:42.7.3`
- Wait for the PySpark job you created to change its status to Done.
- The job logs will display rows from the table specified in the script.
**CLI**

- Upload the `job-mpg-connection-with-secret.py` file to the bucket:

  ```bash
  yc storage s3api put-object \
    --body <local_file_path> \
    --bucket <bucket_name> \
    --key "job-mpg-connection-with-secret.py"
  ```

- Run the job:

  ```bash
  yc managed-spark job create-pyspark \
    --cluster-id <cluster_ID> \
    --name "<job_name>" \
    --main-python-file-uri s3a://<bucket_name>/job-mpg-connection-with-secret.py \
    --packages org.postgresql:postgresql:42.7.3
  ```

  You can get the cluster ID with the list of clusters in the folder.

  The command will return the job ID. Save it, as you will need it to get the job execution logs.

- Get the job execution logs:

  ```bash
  yc managed-spark job log <job_ID> \
    --cluster-id <cluster_ID>
  ```

  The job logs will display rows from the table specified in the script.
Delete the resources you created
Some resources are not free of charge. Delete the resources you no longer need to avoid paying for them:
- Yandex Managed Service for Apache Spark™ cluster.
- Managed Service for PostgreSQL cluster.
- Object Storage bucket. Before deleting the bucket, delete all its objects.
- NAT gateway.