FAQ about Yandex MPP Analytics for PostgreSQL
General questions
What is Yandex MPP Analytics for PostgreSQL?
Yandex MPP Analytics for PostgreSQL is a solution that helps you create, operate, and scale Greenplum® databases in the cloud.
With Yandex MPP Analytics for PostgreSQL, you can:
- Create a database with the performance parameters tailored to your needs.
- Scale computing power and dedicated storage capacity for your databases as needed.
- Get database logs.
Yandex MPP Analytics for PostgreSQL takes over time-consuming Greenplum® infrastructure administration tasks:
- Monitors resource usage.
- Automatically creates DB backups.
- Provides fault tolerance through automatic failover to backup replicas.
- Keeps database software updated.
You work with a Yandex MPP Analytics for PostgreSQL database cluster as if it were a regular database in your local infrastructure. This allows you to manage internal database settings to meet your app requirements.
What is Yandex MPP Analytics for PostgreSQL's share of database management and maintenance work?
When you create clusters, Yandex MPP Analytics for PostgreSQL allocates resources, installs the DBMS, and creates databases.
For all created and running databases, Yandex MPP Analytics for PostgreSQL automatically creates backups and applies fixes and updates.
Yandex MPP Analytics for PostgreSQL also provides data replication between database hosts (both inside and between availability zones) and automatically switches the load over to a backup replica in the event of a failure.
Be mindful of what is controlled by the service and what is controlled by the Yandex Cloud customer. Understanding these control zones will help you use your cloud resources effectively and avoid potential database-related problems. For more information, see Zones of control between managed database (MDB) service users and Yandex Cloud.
Not sure whether to use Yandex MPP Analytics for PostgreSQL or VMs running databases?
Yandex Cloud offers two ways to work with databases:
- Yandex MPP Analytics for PostgreSQL: Enables you to operate template databases without having to administer them yourself.
- Yandex Compute Cloud VM: Enables you to create and configure your own databases. This approach allows you to use any database management systems, access databases via SSH, and so on.
How do I get started with Yandex MPP Analytics for PostgreSQL?
Yandex MPP Analytics for PostgreSQL is available to all registered Yandex Cloud users.
To create a database cluster in Yandex MPP Analytics for PostgreSQL, you need to define its settings:
- Host class (performance characteristics, such as CPUs, RAM, etc.).
- Storage size (fully reserved when creating the cluster).
- Network your cluster will be connected to.
- Number of hosts for a cluster and the cluster availability zone.
For a detailed guide, see Creating a cluster.
What happens when a new DBMS version is released?
The database software is updated when new minor versions are released. Owners of the affected DB clusters are notified of an expected maintenance period and DB availability in advance.
What happens when a DBMS version becomes deprecated?
One month after the database version becomes deprecated, Yandex MPP Analytics for PostgreSQL automatically sends email notifications to the owners of DB clusters created with this version.
New hosts can no longer be created using deprecated DBMS versions. Database clusters are automatically upgraded to the next supported version seven days after notification for minor versions and one month after notification for major versions. Deprecated major versions are upgraded even if you have disabled automatic updates.
Does the service meet the requirements of the Russian Federation Federal Law 152-FZ on personal data?
Yes, it does. The full security audit conclusion is available in the Yandex Cloud documentation.
Can I get logs of my operations in Yandex Cloud?
Yes, you can request information about operations with your resources from Yandex Cloud logs. To do so, contact support.
Connection
Can I connect to the DB via SSH and get superuser permissions?
No, you cannot connect via SSH, nor can you get superuser permissions. This restriction protects the security and fault tolerance of user clusters, because direct changes inside hosts can render them completely inoperable. However, you can connect to the DB as an admin user with the mdb_admin role. Its privileges match those of a superuser. For more information, see The mdb_admin role instead of a superuser.
How can I access a running DB host?
You can connect to Yandex MPP Analytics for PostgreSQL databases using standard DBMS methods.
Learn more about connecting to clusters.
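As an illustration of a standard connection method, the sketch below assembles a libpq-style connection string that a PostgreSQL-compatible client such as psql could accept. The host name, database, and user are placeholders, and the port 6432, the verify-full SSL mode, and the default certificate path are assumptions made for this example rather than values confirmed by this FAQ; check them against the connection guide.

```python
from pathlib import Path


def quote(value):
    """Quote a value for a libpq keyword/value connection string."""
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{text}'"


def build_dsn(host, dbname, user, port=6432, root_cert=None):
    """Build a connection string with certificate verification enabled.

    port=6432 and sslmode=verify-full are assumptions for this sketch.
    """
    # Assumed default: the CA certificate is stored in ~/.postgresql/root.crt.
    cert = root_cert or Path.home() / ".postgresql" / "root.crt"
    params = {
        "host": host,
        "port": port,
        "dbname": dbname,
        "user": user,
        "sslmode": "verify-full",
        "sslrootcert": cert,
    }
    return " ".join(f"{key}={quote(val)}" for key, val in params.items())


if __name__ == "__main__":
    # Placeholder values; substitute your own master host, database, and user.
    print(build_dsn("example-host", "postgres", "user1"))
```

Every value is quoted so that paths containing spaces, which are common in Windows home directories, do not break the string.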
What do I do if I get the revocation check error when using PowerShell to obtain an SSL certificate?
Here is the full text of the error:
curl: (35) schannel: next InitializeSecurityContext failed: Unknown error (0x80092012)
The revocation function was unable to check revocation for the certificate
This means that, when connecting to the website, the service could not check whether the website’s certificate is on the list of revoked certificates.
To fix this error:
- Make sure the corporate network settings do not block the check.
- Run the command with the --ssl-no-revoke parameter:
mkdir $HOME\.postgresql; curl.exe --ssl-no-revoke -o $HOME\.postgresql\root.crt https://storage.yandexcloud.net/cloud-certs/CA.pem
How do I set up user authentication?
You can set up user authentication in Yandex MPP Analytics for PostgreSQL using rules.
For more information, see User authentication.
Backups
When are backups performed? Is a DB cluster available during backup?
The backup window is an interval during which a full daily backup of the DB cluster is performed. You can configure a backup window when creating or editing a cluster.
Clusters remain fully accessible during backups.
Is DB host backup enabled by default?
Yes, backup is enabled by default. For Greenplum®, a full backup is performed every day, saving all DB cluster transaction logs. The first and every second automatic backups are full backups of all databases. Other backups are incremental and store only the data that has changed since the previous backup to save space.
Automatically created backups of an existing cluster are kept for seven days, whereas those created manually are stored indefinitely. Once the cluster is deleted, all its backups are kept for seven days.
Can I run Yandex MPP Analytics for PostgreSQL cluster backups manually?
Yes, Yandex MPP Analytics for PostgreSQL supports manually running a cluster backup.
Can I select other resources when restoring a cluster from a backup?
Yes, with the following restrictions:
- The total number of segments must be the same as in the source cluster.
- The disk size per segment in the new cluster must be at least as large as in the source cluster.
Example
The source cluster has four segment hosts, each containing four segments. The total number of segments is 16. When restoring the cluster, you can choose two segment hosts with eight segments per host, so that the total number of segments remains 16.
To ensure that the disk size per segment does not decrease, the disk size in each segment host must at least double.
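The two restrictions can be checked with a few lines of arithmetic. The sketch below is a minimal illustration, not a service API: the function name is hypothetical, and it assumes that the disk size per segment equals the segment host disk size divided by the number of segments on that host, which matches the example above.

```python
def check_restore_layout(src_hosts, src_segs_per_host, src_disk_per_host_gb,
                         new_hosts, new_segs_per_host, new_disk_per_host_gb):
    """Return the list of violated restrictions (empty if the layout is valid)."""
    problems = []

    # Restriction 1: the total number of segments must not change.
    src_total = src_hosts * src_segs_per_host
    new_total = new_hosts * new_segs_per_host
    if new_total != src_total:
        problems.append(f"total segments: {new_total}, expected {src_total}")

    # Restriction 2: disk size per segment must not decrease.
    # Assumption: host disk space is split evenly between its segments.
    src_per_seg = src_disk_per_host_gb / src_segs_per_host
    new_per_seg = new_disk_per_host_gb / new_segs_per_host
    if new_per_seg < src_per_seg:
        problems.append(
            f"disk per segment: {new_per_seg:g} GB, need at least {src_per_seg:g} GB"
        )
    return problems


if __name__ == "__main__":
    # Source: 4 hosts x 4 segments; the 100 GB host disk is a made-up figure.
    print(check_restore_layout(4, 4, 100, 2, 8, 200))  # valid layout
    print(check_restore_layout(4, 4, 100, 2, 8, 100))  # disk per segment halved
```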
Updating a cluster
How can I change the computing resources and storage size for a database cluster?
You can change computing resources and storage size in the management console. All you need to do is choose a different host class for the required cluster.
The cluster characteristics change within 30 minutes. During this period, other maintenance activities, such as installing updates, may also be performed on the cluster.
Yandex MPP Analytics for PostgreSQL clusters and hosts
What is a database host and database cluster?
A database host is an isolated database environment in the cloud infrastructure with dedicated computing resources and reserved data storage.
A database cluster is one or more database hosts between which you can configure replication.
How many database hosts can there be in one cluster?
A Yandex MPP Analytics for PostgreSQL cluster includes a minimum of 4 hosts:
- 2 master hosts.
- 2 segment hosts.
You can increase the number of segment hosts up to 32.
For more information, see Quotas and limits.
How many clusters can you create in a single cloud?
For more information on MDB technical and organizational limitations, see Quotas and limits.
How are DB clusters maintained?
In Yandex MPP Analytics for PostgreSQL, maintenance implies:
- Automatic installation of DBMS updates and fixes for your database hosts.
- Changes to the host class and storage size.
- Other Yandex MPP Analytics for PostgreSQL maintenance activities.
For more information, see Maintenance.
How do you calculate usage cost for a database host?
In Yandex MPP Analytics for PostgreSQL, the usage cost is calculated based on the following parameters:
- Selected host class.
- Size of the storage reserved for the database host.
- Size of the database cluster backups. Backup space in the amount of the reserved storage is free of charge. Backup storage that exceeds this size is charged at special rates.
- Number of hours of database host operation. Partial hours are rounded to an integer value. You can find the cost per hour for each host class in the Pricing policy section.
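The backup rule above comes down to a single subtraction. The following sketch shows it with a hypothetical function name; it assumes that both sizes are measured for the whole cluster and expressed in the same units, and it leaves out rates because they are listed in the Pricing policy section.

```python
def billable_backup_gb(reserved_storage_gb, backup_size_gb):
    """Backup volume charged for: only the part exceeding the reserved storage."""
    return max(0, backup_size_gb - reserved_storage_gb)


if __name__ == "__main__":
    # Made-up sizes for illustration.
    print(billable_backup_gb(1000, 800))   # backups fit into the free allowance
    print(billable_backup_gb(1000, 1250))  # 250 GB above the allowance
```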
Why is the cluster slow even though the computing resources are not used fully?
Your storage may have insufficient maximum IOPS and bandwidth to process the current number of requests. In this case, throttling occurs, which degrades the entire cluster performance.
The maximum IOPS and bandwidth values increase by a fixed value when the storage size increases by a certain step. The step and increment values depend on the disk type:
| Disk type | Step, GB | Max IOPS increase (read/write) | Max bandwidth increase (read/write), MB/s |
|---|---|---|---|
| network-hdd | 256 | 300/300 | 30/30 |
| network-ssd | 32 | 1,000/1,000 | 15/15 |
| network-ssd-nonreplicated, network-ssd-io-m3 | 93 | 28,000/5,600 | 110/82 |
To increase the maximum IOPS and bandwidth values and make throttling less likely, increase the storage size when you update your cluster.
If you are using the network-hdd storage type, consider switching to network-ssd or network-ssd-nonreplicated by restoring the cluster from a backup.
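To get a feel for how storage size affects these limits, the rough estimate below multiplies the number of whole steps by the increments from the table. It is a sketch under simplifying assumptions: limits grow linearly from zero, only whole steps count, and upper caps per disk are not modeled. The function and dictionary names are hypothetical.

```python
# Step and increment values copied from the table above:
# disk type -> (step in GB, (read IOPS, write IOPS), (read MB/s, write MB/s))
STEPS = {
    "network-hdd": (256, (300, 300), (30, 30)),
    "network-ssd": (32, (1000, 1000), (15, 15)),
    "network-ssd-nonreplicated": (93, (28000, 5600), (110, 82)),
    "network-ssd-io-m3": (93, (28000, 5600), (110, 82)),
}


def estimate_limits(disk_type, size_gb):
    """Estimate max IOPS and bandwidth for a disk of the given size."""
    step_gb, iops, bandwidth = STEPS[disk_type]
    steps = size_gb // step_gb  # assumption: partial steps add nothing
    return {
        "read_iops": steps * iops[0],
        "write_iops": steps * iops[1],
        "read_mbps": steps * bandwidth[0],
        "write_mbps": steps * bandwidth[1],
    }


if __name__ == "__main__":
    # Compare the same capacity on different disk types.
    for disk_type in ("network-hdd", "network-ssd"):
        print(disk_type, estimate_limits(disk_type, 512))
```

Comparing two sizes or disk types this way can help you judge whether a larger disk or a different disk type is more likely to relieve throttling.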
Why do I get the minimum memory error for Greenplum® processes?
When creating, modifying, or restoring a cluster, you may get this error:
Per process memory must be more then '20971520' bytes on segment host, got '<calculated_memory_size>'
This error occurs if the memory size for each Greenplum® process is less than 20 MB and the number of connections equals the max_connections value. Minimum memory per cluster process is calculated using the following formula:
<host_segment_RAM> ÷ (<max_connections> x <number_of_segments_per_host>)
To fix the error, do one of the following:
- Reduce the max_connections value.
- Increase memory size by changing the segment host class.
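The formula can be applied before changing a cluster to see whether a given configuration would trigger the error. The sketch below uses hypothetical function names and assumes that the segment host RAM is passed in bytes and that a value of exactly 20971520 bytes is accepted; the exact RAM figure the service uses for a host class may differ.

```python
MIN_BYTES_PER_PROCESS = 20971520  # 20 MB, the threshold quoted in the error text


def memory_per_process(host_ram_bytes, max_connections, segments_per_host):
    """Apply the formula: RAM / (max_connections x segments per host)."""
    return host_ram_bytes / (max_connections * segments_per_host)


def passes_check(host_ram_bytes, max_connections, segments_per_host):
    """True if per-process memory is not below the 20 MB threshold."""
    per_process = memory_per_process(host_ram_bytes, max_connections, segments_per_host)
    return per_process >= MIN_BYTES_PER_PROCESS


def max_connections_limit(host_ram_bytes, segments_per_host):
    """Largest max_connections value that still passes the check."""
    return host_ram_bytes // (MIN_BYTES_PER_PROCESS * segments_per_host)


if __name__ == "__main__":
    ram = 32 * 1024**3  # a made-up segment host with 32 GiB of RAM
    print(passes_check(ram, 500, 4))      # too many connections for this RAM
    print(max_connections_limit(ram, 4))  # highest value that still fits
```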
Working with external tables
How are user credentials transmitted when working with external tables?
When working with external tables using the PXF protocol, user credentials are provided as plain text. Therefore, such credentials are only available to the administrator user with the mdb_admin role. Other users have no access to the credentials for security reasons.
Monitoring
What metrics and processes can be tracked using monitoring?
For all DBMS types, you can track:
- CPU, memory, network, or disk usage, in absolute terms.
- Memory, network, or disk usage as a percentage of the set limits for the corresponding cluster host class.
- Amount of data in the DB cluster and the remaining free space in the data storage.
For DB hosts, you can track metrics specific to the corresponding type of DBMS. For example, for Greenplum®, you can track:
- Average query execution time.
- Number of requests per second.
- Number of errors in logs.
You can monitor with a minimum resolution of 5 seconds.
For more information about monitoring, see Monitoring cluster and host state.
What is the retention period for logs?
Cluster logs are stored for 30 days.
Greenplum® and Greenplum Database® are registered trademarks or trademarks of Broadcom Inc. in the United States and/or other countries.