Working with Yandex MPP Analytics for PostgreSQL databases
This section covers the basics of working with Yandex MPP Analytics for PostgreSQL.
To start working with a Yandex MPP Analytics for PostgreSQL database, follow these steps:
- Create a connection containing your database access credentials.
- Run a query against the database.
Query example for reading data from Yandex MPP Analytics for PostgreSQL:
SELECT * FROM greenplum_mdb_connection.my_table
Where:
greenplum_mdb_connection: Your database connection name.my_table: Database table name.
Setting up a connection
To create a connection to Yandex MPP Analytics for PostgreSQL:
-
In the management console
, select the folder where you want to create a connection. -
Navigate to Yandex Query.
-
In the left-hand panel, switch to the Connections tab.
-
Click
Create new. -
Specify the connection parameters:
-
Under General parameters:
- Name: Yandex MPP Analytics for PostgreSQL connection name.
- Type:
Managed Service for Greenplum.
-
Under Connection type parameters:
-
Cluster: Select an existing Yandex MPP Analytics for PostgreSQL cluster or create a new one.
-
Service account: Select an existing Yandex MPP Analytics for PostgreSQL service account or create a new one. Assign it the
managed-greenplum.viewerrole allowing it to connect toYandex MPP Analytics for PostgreSQLclusters.To use a service account, the
iam.serviceAccounts.userrole is required. -
Database: Select the database you will use when working with the Greenplum® cluster.
-
Schema: Specify the namespace
you will use when working with the Greenplum® database. -
Login: Username you will use to connect to Greenplum® databases.
-
Password: Password you will use to connect to Greenplum® databases.
-
-
-
Click Create.
A service account is necessary to detect Yandex MPP Analytics for PostgreSQL cluster connection endpoints inside Yandex Cloud. To access data, you need a separate username and password.
Warning
First, grant network access from Yandex Query to Yandex MPP Analytics for PostgreSQL clusters. To do this, enable Access from Yandex Query in your target database settings.
Query syntax
Greenplum® uses the following SQL syntax:
SELECT * FROM <connection>.<table_name>
Where:
<connection>: Your database connection name.<table_name>: Database table name.
Limits
Working with Greenplum® clusters comes with certain limitations.
The following limitations apply:
-
External sources are only available for data reading through
SELECTrequests. Yandex Query does not currently support requests that modify tables in external sources. - Data reads are performed in a single thread via the master host (coordinator host
) of the Greenplum® cluster. Massively parallel reads are not currently supported. - YQ uses the Yandex Managed Service for YDB type system
. However, the valid value ranges for YDB date and time types, i.e.,Date,Datetime, andTimestamp, are often too narrow to accommodate the values of the corresponding Greenplum® types, i.e.,dateandtimestamp.
Therefore, YQ returns date and time values read from Greenplum® as plain strings (Optional<Utf8>) in ISO-8601 format.
Filter pushdown
Yandex Query can transfer processing of request parts to the data source system. This means that filter expressions are provided through Yandex Query directly to the database for processing; these are typically the query conditions specified in WHERE. This kind of processing is called filter pushdown.
Filter pushdown is possible when using:
| Description | Example |
|---|---|
NULL check |
WHERE column1 IS NULL or WHERE column1 IS NOT NULL |
OR, NOT, or AND logical conditions or parentheses to prioritize calculations |
WHERE column1 IS NULL OR (column2 IS NOT NULL AND column3 > 10). |
Comparison operators (=, ==, !=, <>, >, <, >=, or <=) to other columns or constants |
WHERE column1 > column2 OR column3 <= 10, WHERE column1 + column2 > 10, WHERE column1 = (10 + 10) |
Other filter types do not support source pushdown: the external table rows will be filtered on the federated Yandex Query side, i.e., Yandex Query will perform a full scan of the external table when processing the query.
Supported data types for filter pushdown:
| Yandex Query data type |
|---|
Bool |
Int8 |
Int16 |
Int32 |
Int64 |
Float |
Double |
Supported data types
In Greenplum® databases, the column nullability flag, i.e., whether or not the column can contain NULL values, is not part of the type system. The NOT NULL constraint for each column is implemented via an attnotnull attribute in the pg_attributeNULL values by default, and within the YQ type system they must be mapped to optional
The table below shows the mapping of Greenplum® types to Yandex Query types. Only the listed types are supported.
| Greenplum® data type | Yandex Query data type | Notes |
|---|---|---|
boolean |
Optional<Bool> |
|
smallint |
Optional<Int16> |
|
int2 |
Optional<Int16> |
|
integer |
Optional<Int32> |
|
int |
Optional<Int32> |
|
int4 |
Optional<Int32> |
|
serial |
Optional<Int32> |
|
serial4 |
Optional<Int32> |
|
bigint |
Optional<Int64> |
|
int8 |
Optional<Int64> |
|
bigserial |
Optional<Int64> |
|
serial8 |
Optional<Int64> |
|
real |
Optional<Float> |
|
float4 |
Optional<Float> |
|
double precision |
Optional<Double> |
|
float8 |
Optional<Double> |
|
date |
Optional<Utf8> |
|
timestamp |
Optional<Utf8> |
|
bytea |
Optional<String> |
|
character |
Optional<Utf8> |
Default collation rules apply. The string is padded with spaces to the required length. |
character varying |
Optional<Utf8> |
Default collation rules apply. |
text |
Optional<Utf8> |
Default collation rules apply. |
json |
Optional<Json> |