Working with Yandex MPP Analytics for PostgreSQL databases
This section provides the basic information about working with Yandex MPP Analytics for PostgreSQL.
To work with a Yandex MPP Analytics for PostgreSQL database, follow these steps:
- Create a connection containing your database connection credentials.
- Run a query to the database.
Example of a query for reading data from Yandex MPP Analytics for PostgreSQL:
SELECT * FROM greenplum_mdb_connection.my_table
Where:
greenplum_mdb_connection: Name of the DB connection you created.my_table: Name of the table in the database.
Setting up a connection
To create a connection to Yandex MPP Analytics for PostgreSQL:
-
In the management console
, select the folder where you want to create a connection. -
In the list of services, select Yandex Query.
-
In the left-hand panel, go to the Connections tab.
-
Click
Create new. -
Specify the connection parameters:
-
Under General parameters:
- Name: Name of the connection to Yandex MPP Analytics for PostgreSQL.
- Type:
Managed Service for Greenplum.
-
Under Connection type parameters:
-
Cluster: Select an existing Yandex MPP Analytics for PostgreSQL cluster or create a new one.
-
Service account: Select an existing Yandex MPP Analytics for PostgreSQL service account, or create a new one with the
managed-greenplum.viewerrole, and use it to connect toYandex MPP Analytics for PostgreSQLclusters.To use a service account, the
iam.serviceAccounts.userrole is required. -
Database: Select the database you will use when working with the Greenplum® cluster.
-
Schema: Specify the namespace
to use when working with the Greenplum® database. -
Login: Username to use when connecting to Greenplum® databases.
-
Password: User password to use when connecting to Greenplum® databases.
-
-
-
Click Create.
You need a service account to detect Yandex MPP Analytics for PostgreSQL cluster connection points inside Yandex Cloud. You need a separate login/password pair to access data.
Warning
First, allow network access from Yandex Query to Yandex MPP Analytics for PostgreSQL clusters. To do this, enable Access from Yandex Query in the settings of the database to which you are connecting.
Query syntax
Here is the SQL query format used to access Greenplum®:
SELECT * FROM <connection>.<table_name>
Where:
<connection>: Name of the DB connection you created.<table_name>: Name of the table in the database.
Limitations
Some limitations apply when working with Greenplum® clusters.
Limitations:
-
External sources are only available for data reading through
SELECTrequests. Yandex Query does not currently support requests that modify tables in external sources. - Data is read in a single stream via the master host, i.e., the coordinator host
of a Greenplum® cluster. Currently, massive parallel reading is not supported. - YQ uses the Yandex Managed Service for YDB type system
. However, the ranges of acceptable values for types used in YDB for date and time operations (Date,Datetime, andTimestamp) often turn out to be insufficiently wide to cover the values of the relevant Greenplum® types (dateandtimestamp).
Therefore, YQ returns date and time values read from Greenplum® as plain strings (Optional<Utf8>type) in ISO-8601 format.
Filter pushdown
Yandex Query can transfer processing of request parts to the data source system. This means that filter expressions are provided through Yandex Query directly to the database for processing; these are typically the query conditions specified in WHERE. This kind of processing is called filter pushdown.
Filter pushdown is possible when using:
| Description | Example |
|---|---|
NULL check |
WHERE column1 IS NULL or WHERE column1 IS NOT NULL |
OR, NOT, or AND logical conditions or parentheses to prioritize calculations |
WHERE column1 IS NULL OR (column2 IS NOT NULL AND column3 > 10). |
Comparison operators (=, ==, !=, <>, >, <, >=, or <=) to other columns or constants |
WHERE column1 > column2 OR column3 <= 10, WHERE column1 + column2 > 10, WHERE column1 = (10 + 10) |
Other filter types do not support source pushdown: the external table rows will be filtered on the federated Yandex Query side, i.e., Yandex Query will perform a full scan of the external table when processing the query.
Supported data types for filter pushdown:
| Data type Yandex Query |
|---|
Bool |
Int8 |
Int16 |
Int32 |
Int64 |
Float |
Double |
Supported data types
In a Greenplum® DB, the optionality of column values (whether or not the column can contain NULL values) does not depend on the type system. The NOT NULL constraint for each column is implemented by means of the attnotnull attribute in the pg_attributeNULL values and the YQ type system will represent them as optional
The table below shows how Greenplum® and Yandex Query types map. All other data types except those listed are not supported.
| Data type Greenplum® | Data type Yandex Query | Notes |
|---|---|---|
boolean |
Optional<Bool> |
|
smallint |
Optional<Int16> |
|
int2 |
Optional<Int16> |
|
integer |
Optional<Int32> |
|
int |
Optional<Int32> |
|
int4 |
Optional<Int32> |
|
serial |
Optional<Int32> |
|
serial4 |
Optional<Int32> |
|
bigint |
Optional<Int64> |
|
int8 |
Optional<Int64> |
|
bigserial |
Optional<Int64> |
|
serial8 |
Optional<Int64> |
|
real |
Optional<Float> |
|
float4 |
Optional<Float> |
|
double precision |
Optional<Double> |
|
float8 |
Optional<Double> |
|
date |
Optional<Utf8> |
|
timestamp |
Optional<Utf8> |
|
bytea |
Optional<String> |
|
character |
Optional<Utf8> |
Default sorting rules apply; the string is padded with spaces to the required length. |
character varying |
Optional<Utf8> |
Default sorting rules apply. |
text |
Optional<Utf8> |
Default sorting rules apply. |
json |
Optional<Json> |