Working with data in Yandex Managed Service for PostgreSQL
Yandex Query is an interactive service for serverless data analysis. It enables you to process information from different storages without the need to create a dedicated cluster. The service supports working with Yandex Object Storage, Yandex Managed Service for PostgreSQL, and Yandex Managed Service for ClickHouse® data storages.
In this tutorial, you will connect to a Managed Service for PostgreSQL database and run queries against it from a JupyterLab notebook using Query.
- Prepare your infrastructure.
- Get started in Query.
- Create a Managed Service for PostgreSQL cluster.
- Connect to the Managed Service for PostgreSQL data.
If you no longer need the resources you created, delete them.
Getting started
Before getting started, register in Yandex Cloud, set up a community, and link your billing account to it.
- On the DataSphere home page, click Try for free and select an account to log in with: Yandex ID or your working account in the identity federation (SSO).
- Select the Yandex Cloud Organization organization you are going to use in Yandex Cloud.
- Create a community.
- Link your billing account to the DataSphere community you are going to work in. Make sure that you have a billing account linked and its status is ACTIVE or TRIAL_ACTIVE. If you do not have a billing account yet, create one in the DataSphere interface.
Required paid resources
For working with Managed Service for PostgreSQL data, the cost of infrastructure support includes:
- Fee for DataSphere computing resource usage.
- Fee for a running Managed Service for PostgreSQL cluster.
- Fee for the amount of read data when executing Query queries.
Prepare the infrastructure
Log in to the Yandex Cloud management console.
If you have an active billing account, you can create or select a folder for your infrastructure on the cloud page.
Note
If you use an identity federation to access Yandex Cloud, billing details might be unavailable to you. In this case, contact your Yandex Cloud organization administrator.
Create a folder
- In the management console, select a cloud and click Create folder.
- Give your folder a name, e.g., data-folder.
- Click Create.
Create a service account for the DataSphere project
- Go to the data-folder folder.
- In the Service accounts tab, click Create service account.
- Enter a name for the service account, e.g., yq-sa.
- Click Add role and assign the following roles to the service account:
  - datasphere.community-project.editor: To run DataSphere computations.
  - yq.editor: To send Query queries.
  - managed-postgresql.viewer: To view the contents of the Managed Service for PostgreSQL cluster.
- Click Create.
Add the service account to a project
To enable the service account to run a DataSphere project, add it to the list of project members.
- Select the relevant project in your community or on the DataSphere homepage in the Recent projects tab.
- In the Members tab, click Add member.
- Select the yq-sa account and click Add.
Create an authorized key for a service account
To allow the service account to send Query queries, create an authorized key.
- In the management console, go to data-folder.
- At the top of the screen, go to the Service accounts tab.
- Select the yq-sa service account.
- Click Create new key in the top panel and select Create authorized key.
- Select the encryption algorithm and click Create.
- Click Download file with keys.
Create a secret
To get an authorized key from the notebook, create a secret with the contents of the authorized key file.
- Select the relevant project in your community or on the DataSphere homepage in the Recent projects tab.
- Under Project resources, click Secret.
- Click Create.
- In the Name field, enter the secret name: yq_access_key.
- In the Value field, paste the full contents of the downloaded file with the authorized key.
- Click Create.
Create a notebook
Queries to the Managed Service for PostgreSQL database through Query will be sent from the notebook.
- Select the relevant project in your community or on the DataSphere homepage in the Recent projects tab.
- Click Open project in JupyterLab and wait for the loading to complete.
- In the top panel, click File and select New ⟶ Notebook.
- Select a kernel and click Select.
Get started in Query
The yandex_query_magic package provides magic commands for working in Jupyter. Install it to send queries to Query. Paste the code into the yq-storage.ipynb notebook cells.
- Open the DataSphere project:
  - Select the relevant project in your community or on the DataSphere homepage in the Recent projects tab.
  - Click Open project in JupyterLab and wait for the loading to complete.
  - Open the notebook tab.
- Install the yandex_query_magic package:
  %pip install yandex_query_magic
- Once the installation is complete, select Kernel ⟶ Restart kernel... from the top panel.
- Load the extension:
  %load_ext yandex_query_magic
- Configure the connection by specifying the data-folder ID and the name of the authorized key secret:
  %yq_settings --folder-id <folder_ID> --env-auth yq_access_key
- Run a test query to Query:
  %yq select "Hello, world!"
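If the test query fails with an authentication error, it may help to check that the secret is actually visible to the notebook. The sketch below is not part of yandex_query_magic. It assumes that DataSphere exposes the yq_access_key secret as an environment variable of the same name, and that the authorized key file is a JSON object with id, service_account_id, and private_key fields. The check_key_secret helper and the field list are illustrative assumptions.

```python
import json
import os

# Assumption: these field names follow the usual layout of an authorized key file.
EXPECTED_FIELDS = ("id", "service_account_id", "private_key")


def check_key_secret(env_name="yq_access_key", environ=None):
    """Return a list of problems with the secret; an empty list means none were found."""
    environ = os.environ if environ is None else environ
    raw = environ.get(env_name)
    if raw is None:
        return [f"environment variable {env_name!r} is not set"]
    try:
        key = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"{env_name!r} is not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return [f"{env_name!r} must contain a JSON object"]
    # Report every expected field that is absent or empty, without printing key material.
    return [f"missing field {name!r}" for name in EXPECTED_FIELDS if not key.get(name)]


print(check_key_secret() or "secret looks usable")
```

The function only reports which checks failed and never prints the key itself, so its output is safe to leave in a shared notebook.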
Create a Managed Service for PostgreSQL cluster
Any running Managed Service for PostgreSQL cluster with the Yandex Query access option enabled is suitable for sending queries.
- In the management console, select data-folder.
- Select Managed Service for PostgreSQL.
- Click Create cluster.
- In the Cluster name field, enter a name for the cluster, e.g., postgresql.
- Under Database:
  - Specify the DB name, e.g., db1.
  - Specify Username and Password.
- Under Service settings, enable Yandex Query access and Access from the management console.
- You may leave default values for the other settings.
- Click Create cluster.
Create a table
In this step, you will create a test table with random numbers from 0 to 100.
- In the management console, open the postgresql cluster page and go to the SQL tab.
- Enter the Username and Password you specified when creating the cluster.
- In the input window on the right, paste an SQL query:
  CREATE TABLE test (
      id SERIAL PRIMARY KEY,
      number INT
  );
  INSERT INTO test (number)
  SELECT random() * 100
  FROM generate_series(1, 100);
- Click Execute.
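To preview the shape of the resulting table without touching the cluster, you can reproduce it locally. The sketch below is a stand-in that uses Python's built-in sqlite3 module, not PostgreSQL: SERIAL is replaced with an autoincrementing integer key, and generate_series with a Python loop. It assumes that storing random() * 100 in an INT column rounds to the nearest integer, which round() approximates here.

```python
import random
import sqlite3

# In-memory SQLite database used only as a local stand-in for the cluster.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY AUTOINCREMENT, number INT)")

# 100 rows, each holding a random value scaled to 0..100 and rounded to an integer.
conn.executemany(
    "INSERT INTO test (number) VALUES (?)",
    [(round(random.random() * 100),) for _ in range(100)],
)
conn.commit()

row_count, lowest, highest = conn.execute(
    "SELECT COUNT(*), MIN(number), MAX(number) FROM test"
).fetchone()
print(row_count, lowest, highest)
```

The table has 100 rows with id values 1 through 100, and every number falls between 0 and 100 inclusive.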
Connect to data in Managed Service for PostgreSQL
To create a Query connection:
- In the management console, select data-folder.
- In the list of services, select Yandex Query.
- In the left-hand panel, select Connections.
- Click Create new.
- Enter a name for the connection, e.g., postgresql.
- Select the Managed Service for PostgreSQL connection type.
- Under Connection type parameters:
  - Cluster: Select the previously created postgresql cluster.
  - Service account: yq-sa.
  - Database: db1.
  - Enter the Login and Password specified when you created the cluster.
- Click Create.
To check the connection, get the table data from the notebook cell:
%yq SELECT * FROM postgresql.test
How to delete the resources you created
To stop paying for the resources you created: