Transferring data from a MySQL source endpoint
- Scenarios for transferring data from MySQL
- Preparing the source database
- Configuring the MySQL source endpoint
- Configuring the data target
- Operations with the database during transfer
- Troubleshooting data transfer issues
- A single transaction's log size exceeds 4 GB
- No new tables are added
- Error when transferring from AWS RDS for MySQL
- Error when transfering tables without primary keys
- Error when accessing a binary log
- Error when dropping a table under the Drop cleanup policy
- Time shift in DATETIME data type when transferring to ClickHouse®
Yandex Data Transfer enables you to migrate data from a MySQL database and implement various data transfer, processing, and transformation scenarios. To implement a transfer:
- Explore possible data transfer scenarios.
- Prepare the MySQL database for the transfer.
- Set up an endpoint source in Yandex Data Transfer.
- Set up one of the supported data targets.
- Create a transfer and start it.
- Perform required operations with the database and control the transfer.
- In case of any issues, use ready-made solutions to resolve them.
Scenarios for transferring data from MySQL
-
Migration: Moving data from one repository to another; it often suggests transferring a database to the cloud, from outdated local databases to managed cloud ones.
-
Data change capture means tracking changes to a database and delivering those changes to consumers. It is used for applications that are sensitive to real-time data changes.
- Capturing changes from MySQL and delivering to YDS.
- Capturing changes from MySQL and delivering to Apache Kafka®.
-
Uploading data to data marts is a process of transferring prepared data to storage for subsequent visualization.
-
Uploading data to scalable Object Storage storage allows you to save on data storage and simplifies the exchange with contractors.
For a detailed description of possible Yandex Data Transfer data transfer scenarios, see Tutorials.
Preparing the source database
-
Enable full binary logging on the source by setting the Binlog row image parameter
toFULL
orNOBLOB
. -
(Optional) Set a limit on the size of data chunks to be sent using the Max allowed packet parameter.
-
Create a user for connecting to the source.
-
Grant the user the
ALL_PRIVILEGES
privilege for the source database. -
Grant the user the
REPLICATION CLIENT
andREPLICATION SLAVE
administrative privileges.
-
-
For Replication and Snapshot and increment transfers, tables without primary keys are not transferred. To make sure the transfer is running properly when migrating a database with such tables:
-
Do not transfer tables without primary keys. For this purpose, add them to the list of excluded tables in source endpoint settings.
-
Create
PRIMARY KEYS
in those tables to be migrated where there are no keys.-
To get a list of tables without a primary key, run the query:
SELECT tab.table_schema AS database_name, tab.table_name AS table_name, tab.table_rows AS table_rows FROM information_schema.tables tab LEFT JOIN information_schema.table_constraints tco ON (tab.table_schema = tco.table_schema AND tab.table_name = tco.table_name AND tco.constraint_type = 'PRIMARY KEY') WHERE tab.table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys') AND tco.constraint_type IS NULL AND tab.table_type = 'BASE TABLE';
-
Study the structure of tables without a primary key that need to be transferred to the target:
SHOW CREATE TABLE <database_name>.<table_name>;
-
Add a simple or complex primary key to the tables that need to be transferred to the target:
ALTER TABLE <table_name> ADD PRIMARY KEY (<column_or_column_group>);
-
If the table being transferred to the target has no column or group of columns suitable for the role of the primary key, create a new column:
ALTER TABLE <table_name> ADD id BIGINT PRIMARY KEY AUTO_INCREMENT;
-
Note
If, when creating a primary key, you get an error saying
Creating index 'PRIMARY' required more than 'innodb_online_alter_log_max_size' bytes of modification log. Please try again
, increase theInnodb log file size
parameter value in the DBMS settings. -
-
Deactivate trigger transfer at the transfer initiation stage and reactivate it at the completion stage (for the Replication and the Snapshot and increment transfer types). For more information, see the description of additional endpoint settings for the MySQL source.
-
Make sure the settings for the network hosting the cluster allow public connections from IP addresses used by Data Transfer
. -
Make sure the source uses the MyISAM or InnoDB low-level storage subsystem. If you use other subsystems, the transfer may fail.
-
Enable full binary logging
on the source by setting thebinlog_row_image
parameter toFULL
orNOBLOB
. -
Specify row format for the binary log
on the source by setting thebinlog_format
parameter toROW
. -
If the replication source is a cluster that is behind the load balancer, enable GTID mode for it (
GTID-MODE = ON
).If it is not possible to enable GTID mode for any reason, make sure the binary log name template contains the host name.
In both cases, this will allow replication to continue even after changing the master host.
-
(Optional) Set a limit
on the size of data chunks to be sent using themax_allowed_packet
parameter. -
Create a user to connect to the source and grant them the required privileges:
CREATE USER '<username>'@'%' IDENTIFIED BY '<password>'; GRANT ALL PRIVILEGES ON <database_name>.* TO '<username>'@'%'; GRANT REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO '<username>'@'%';
-
For Replication and Snapshot and increment transfers, tables without primary keys are not transferred. To make sure the transfer is running properly when migrating a database with such tables:
-
Do not transfer tables without primary keys. For this purpose, add them to the list of excluded tables in source endpoint settings.
-
Create
PRIMARY KEYS
in those tables to be migrated where there are no keys.-
To get a list of tables without a primary key, run the query:
SELECT tab.table_schema AS database_name, tab.table_name AS table_name, tab.table_rows AS table_rows FROM information_schema.tables tab LEFT JOIN information_schema.table_constraints tco ON (tab.table_schema = tco.table_schema AND tab.table_name = tco.table_name AND tco.constraint_type = 'PRIMARY KEY') WHERE tab.table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys') AND tco.constraint_type IS NULL AND tab.table_type = 'BASE TABLE';
-
Study the structure of tables without a primary key that need to be transferred to the target:
SHOW CREATE TABLE <database_name>.<table_name>;
-
Add a simple or complex primary key to the tables that need to be transferred to the target:
ALTER TABLE <table_name> ADD PRIMARY KEY (<column_or_column_group>);
-
If the table being transferred to the target has no column or group of columns suitable for the role of the primary key, create a new column:
ALTER TABLE <table_name> ADD id BIGINT PRIMARY KEY AUTO_INCREMENT;
-
Note
If, when creating a primary key, you get an error saying
Creating index 'PRIMARY' required more than 'innodb_online_alter_log_max_size' bytes of modification log. Please try again
, increase theinno_db_log_file_size
parameter value in the DBMS settings. -
-
Deactivate trigger transfer at the transfer initiation stage and reactivate it at the completion stage (for the Replication and Snapshot and increment transfer types). For more information, see the description of additional endpoint settings for the MySQL source.
Configuring the MySQL source endpoint
When creating or updating an endpoint, you can define:
- Yandex Managed Service for MySQL cluster connection or custom installation settings, including those based on Yandex Compute Cloud VMs. These are required parameters.
- Additional parameters.
Managed Service for MySQL cluster
Warning
To create or edit an endpoint of a managed database, you need to have the managed-mysql.viewer
role or the viewer
primitive role assigned for the folder where this managed database cluster resides.
Connecting to the database with the cluster ID specified in Yandex Cloud.
-
Managed Service for MySQL cluster: Specify ID of the cluster to connect to.
-
Database: Specify the name of the database in the selected cluster. Leave the field empty if you want to transfer tables from multiple databases at the same time. In this case, specify the database for creating service tables in the Database for auxiliary tables field.
-
User: Specify the username that Data Transfer will use to connect to the database.
-
Password: Enter the user's password to the database.
-
Security groups: Select the cloud network to host the endpoint and security groups for network traffic.
Thus, you will be able to apply the specified security group rules to the VMs and clusters in the selected network without changing the settings of these VMs and clusters. For more information, see Networking in Yandex Data Transfer.
- Endpoint type:
mysql-source
.
-
--cluster-id
: ID of the cluster you need to connect to. -
--database
— Database name. Leave the field empty if you want to transfer tables from multiple databases at the same time. -
--user
: Username that Data Transfer will use to connect to the database. -
To set a user password to access the DB, use one of the following parameters:
-
--raw-password
: Password as text. -
--password-file
: The path to the password file.
-
- Endpoint type:
mysql_source
.
-
connection.mdb_cluster_id
: ID of cluster to connect to. -
database
— Database name. Leave the field empty if you want to transfer tables from multiple databases at the same time. -
user
: Username that Data Transfer will use to connect to the database. -
password.raw
: Password in text form.
Here is an example of the configuration file structure:
resource "yandex_datatransfer_endpoint" "<endpoint_name_in_Terraform>" {
name = "<endpoint_name>"
settings {
mysql_source {
security_groups = ["<list_of_security_group_IDs>"]
connection {
mdb_cluster_id = "<cluster_ID>"
}
database = "<migrated_database_name>"
user = "<username_for_connection>"
password {
raw = "<user_password>"
}
<additional_endpoint_settings>
}
}
}
For more information, see the Terraform provider documentation
-
mdbClusterId
: ID of the cluster you need to connect to. -
database
— Database name. Leave the field empty if you want to transfer tables from multiple databases at the same time. -
user
: Username that Data Transfer will use to connect to the database. -
password.raw
: Database user password (in text form).
Custom installation
For OnPremise, all fields are filled in manually.
-
Host: Enter the IP address or FQDN of the master host you want to connect to.
-
Port: Set the number of the port that Data Transfer will use for the connection.
-
CA certificate: Upload the certificate file or add its contents as text if transmitted data must be encrypted, for example, to meet PCI DSS
requirements. -
Subnet ID: Select or create a subnet in the desired availability zone.
If the value in this field is specified for both endpoints, both subnets must be hosted in the same availability zone.
-
Database: Specify the name of the database in the selected cluster. Leave the field empty if you want to transfer tables from multiple databases at the same time. In this case, specify the database for creating service tables in the Database for auxiliary tables field.
-
User: Specify the username that Data Transfer will use to connect to the database.
-
Password: Enter the user password to the database.
-
Security groups: Select the cloud network to host the endpoint and security groups for network traffic.
Thus, you will be able to apply the specified security group rules to the VMs and clusters in the selected network without changing the settings of these VMs and clusters. For more information, see Networking in Yandex Data Transfer.
- Endpoint type:
mysql-source
.
-
--host
: IP address or FQDN of the master host you want to connect to. -
--port
: Number of the port that Data Transfer will use for the connection. -
--ca-certificate
— If the transmitted data needs to be encrypted, for example, to meet the requirements of PCI DSS . -
--subnet-id
: ID of the subnet the host resides in. -
--database
— Database name. Leave the field empty if you want to transfer tables from multiple databases at the same time. -
--user
: Username that Data Transfer will use to connect to the database. -
To set a user password to access the DB, use one of the following parameters:
-
--raw-password
: Password as text. -
--password-file
: The path to the password file.
-
- Endpoint type:
mysql_source
.
-
on_premise.hosts
: List of IPs or FQDNs of hosts to connect to. Since only single-item lists are supported, specify the master host address. -
on_premise.port
: Port number that Data Transfer will use for connections. -
on_premise.tls_mode.enabled.ca_certificate
: CA certificate if the data to transfer must be encrypted to comply with PCI DSS requirements. -
on_premise.subnet_id
: ID of the subnet where the host is located. -
database
— Database name. Leave the field empty if you want to transfer tables from multiple databases at the same time. -
user
: Username that Data Transfer will use to connect to the database. -
password.raw
: Password in text form.
Here is an example of the configuration file structure:
resource "yandex_datatransfer_endpoint" "<endpoint_name_in_Terraform>" {
name = "<endpoint_name>"
settings {
mysql_source {
security_groups = ["<list_of_security_group_IDs>"]
connection {
on_premise {
hosts = ["<list_of_hosts>"]
port = <port_for_connection>
}
}
database = "<migrated_database_name>"
user = "<username_for_connection>"
password {
raw = "<user_password>"
}
<additional_endpoint_settings>
}
}
}
For more information, see the Terraform provider documentation
onPremise
: Database connection parameters:-
hosts
— IP address or FQDN of the master host to connect to. -
port
: The number of the port that Data Transfer will use for the connection. tlsMode
: Parameters for encrypting transmitted data if it is required, for example, to meet PCI DSS requirements.disabled
: Disabled.enabled
: Enabled.caCertificate
: CA certificate.
-
subnetId
: ID of the subnet the host resides in.
-
-
database
— Database name. Leave the field empty if you want to transfer tables from multiple databases at the same time. -
user
: Username that Data Transfer will use to connect to the database. -
password.raw
: Database user password (in text form).
Additional settings
-
Table filter:
-
Included tables: Only data from the tables listed here will be transferred. This option is specified using regular expressions.
When you add new tables when editing an endpoint used in Snapshot and increment or Replication transfers with the Replicating status, the data history for these tables will not get uploaded. To add a table with its historical data, use the List of objects for transfer field in the transfer settings.
-
Excluded tables: Data from the listed tables is not transferred. This option is specified using regular expressions.
Regular expressions for included and excluded tables must meet the ID naming rules in MySQL. For more information, see the MySQL documentation
. Escaping double quotes is not required. -
-
Transfer schema: Allows you to select the DB schema elements that will be transferred when activating or deactivating a transfer.
-
Advanced settings:
-
Time zone for connecting to the database: Specify the IANA Time Zone Database
identifier. By default, the server local time zone is used. -
Database for auxiliary tables: Database for dummy tables (
__tm_keeper
and__tm_gtid_keeper
). By default, this is the source database the data is transferred from.
-
-
--include-table-regex
: List of included tables. Only data from the tables listed here will be transferred. This option is specified using regular expressions.When you add new tables when editing an endpoint used in Snapshot and increment or Replication transfers with the Replicating status, the data history for these tables will not get uploaded. To add a table with its historical data, use the List of objects for transfer field in the transfer settings.
-
--exclude-table-regex
: List of excluded tables. Data from the listed tables will not be transferred. This option is specified using regular expressions. -
--timezone
: DB time zone, specified as an IANA Time Zone Database identifier. Defaults to UTC+0. -
Schema transfer settings:
--transfer-before-data
: When activating a transfer.--transfer-after-data
: When deactivating a transfer.
-
include_table_regex
: List of included tables. Only data from the tables listed here will be transferred. This option is specified using regular expressions.When you add new tables when editing an endpoint used in Snapshot and increment or Replication transfers with the Replicating status, the data history for these tables will not get uploaded. To add a table with its historical data, use the List of objects for transfer field in the transfer settings.
-
exclude_table_regex
: List of excluded tables. Data from the listed tables will not be transferred. This option is specified using regular expressions. -
timezone
: DB time zone, specified as an IANA Time Zone Database identifier. Defaults to UTC+0. -
object_transfer_settings
: Schema transfer settings:view
: Views.routine
: Procedures and functions.trigger
: Triggers.
You can specify one of the following values for each entity:
BEFORE_DATA
: Move when activating the transfer.AFTER_DATA
: Move when deactivating the transfer.NEVER
: Do not move.
-
includeTablesRegex
: List of included tables. Only data from the tables listed here will be transferred. This option is specified using regular expressions.When you add new tables when editing an endpoint used in Snapshot and increment or Replication transfers with the Replicating status, the data history for these tables will not get uploaded. To add a table with its historical data, use the List of objects for transfer field in the transfer settings.
-
excludeTables
: List of excluded tables. Data from the listed tables will not be transferred. This option is specified using regular expressions. -
timezone
: DB time zone, specified as an IANA Time Zone Database identifier. Defaults to UTC+0. -
objectTransferSettings
: Settings for transferring a DB schema when activating and deactivating a transfer (BEFORE_DATA
andAFTER_DATA
values, respectively).
Settings for transferring a DB schema when enabling and disabling a transfer
During a transfer, the database schema is transferred from the source to the target. The transfer is performed in two stages:
-
At the activation stage.
This step is performed before copying or replicating data to create a schema on the target. At this stage, you can enable the migration of views and stored procedures, stored functions, and triggers.
-
At the deactivation stage.
This step is performed at the end of the transfer operation when it is deactivated. If the transfer keeps running in replication mode, the final stage of the transfer will be performed only when replication stops. At this stage, you can enable the migration of views and stored procedures, stored functions, and triggers.
At the final stage, it is assumed that when the transfer is deactivated, there is no writing load on the source. You can ensure this by switching to read-only mode. At this stage, the database schema on the target is brought to a state where it will be consistent with the schema on the source.
Known limitations
If you are setting up a transfer from a MySQL cluster, use the cluster master server. During its operation, the transfer creates service tables in the source database. Therefore, you cannot use a MySQL replica as a source, because it is read-only.
If you are setting up a transfer from a MySQL cluster to a ClickHouse® cluster, consider the way the data of date and time types
- Data of the
TIME
type is transferred as strings with the source and target time zones ignored. - When transferring data of the
TIMESTAMP
type, the time zone set in the MySQL source settings or advanced endpoint settings is used. For more information, see the MySQL documentation . - The source endpoint assigns the UTC+0 time zone to data of the
DATETIME
type.
Configuring the data target
Configure one of the supported data targets:
- PostgreSQL
- MySQL
- ClickHouse®
- Greenplum®
- Yandex Managed Service for YDB
- Yandex Object Storage
- Apache Kafka®
- YDS
- Elasticsearch
- OpenSearch
For a complete list of supported sources and targets in Yandex Data Transfer, see Available Transfers.
After configuring the data source and target, create and start the transfer.
Operations with the database during transfer
-
For transfers in the Copying status, any changes to the data schema (
ALTER
) on the source or target will interrupt the transfer. -
For transfers in the Replicating status, you can change the data schema on the source. All
ALTER
operations recorded in the binary log (binlog) on the source will be automatically applied to the target. This process takes some time, so the transfer may slow down.
Troubleshooting data transfer issues
Known issues when using a MySQL endpoint:
- Single transaction log size exceeds 4 GB.
- New tables are not added.
- Error when transferring from AWS RDS for MySQL.
- Error when transfering tables without primary keys.
- Binary log access error.
- Error when dropping a table under the Drop cleanup policy.
- Time shift in the DATETIME data type when transferring to ClickHouse®.
See a full list of recommendations in the Troubleshooting section.
A single transaction's log size exceeds 4 GB
Error message:
Last binlog file <file_name:file_size> is more than 4GB
If the log size of a single transaction exceeds 4 GB, activation of Replication or Snapshot and increment transfers fails due to MySQL internal limits
Solution: Reactivate the transfer.
No new tables are added
No new tables are added to Snapshot and increment transfers.
Solution:
- Deactivate and reactivate the transfer.
- Create tables in the target database manually.
- Create a separate transfer of the Snapshot type and only add to it the newly created tables. Deactivating the original Snapshot and increment transfer is not required.
Error when transferring from AWS RDS for MySQL
Snapshot and increment and Replication transfers from the Amazon RDS for MySQL
Error example:
Failed to execute LoadSnapshot:
Cannot load table "name": unable to read rows and push by chunks:
unable to push changes: unable to execute per table push:
error: err: sql: transaction has already been committed or rolled back
rollback err: sql: transaction has already been committed or rolled back
This error is caused by the short MySQL binary log file retention time in Amazon RDS.
Solution:
Increase the binary log retention time by using the command below:
call mysql.rds_set_configuration('binlog retention hours', <hours>);
The maximum retention time is 168 hours (7 days). The default is NULL
(binary log files are not retained). For more information, see the Amazon RDS documentation
Error when transfering tables without primary keys
Error message:
Primary key check failed: 14 Tables errors: Table no key columns found
For Replication and Snapshot and increment transfers, tables without primary keys are not transferred.
Solution: Prepare the source as described in Preparing for the transfer.
Error when accessing a binary log
The following error may occur during Snapshot and increment transfers:
Warn(replication): failed to run (abstract1 source):
failed to run canal: failed to start binlog sync:
failed to get canal event: ERROR 1236 (HY000): Could not find first log file name in binary log index file
The error occurs when the binary log files required for replication are not available. This is usually due to exceeding the maximum allowed file size after adding new changes to the binary log. In this case, some of the old log data is deleted.
Solution:
Increase the maximum allowed size for the binary log files in the MySQL settings using the Mdb preserve binlog bytes parameter.
The minimum value is 1073741824
(1 GB), while the maximum one is 107374182400
(100 GB). The default value is 1073741824
(1 GB).
Error when dropping a table under the Drop cleanup policy
Error message:
ERROR: cannot drop table <table_name> because other objects depend on it (SQLSTATE 2BP01)
With the Drop
cleanup policy, a transfer drops tables in multiple iterations:
-
The transfer tries to drop all tables one by one. Cascade delete is not used, as it may delete tables that are not included in the transfer. If a table cannot be dropped, for example, due to its links to external keys, an error occurs, but the transfer will continue dropping tables.
-
During the next iteration, the transfer will try to drop the remaining tables. If the blocking child tables were dropped in the previous iteration, the table linked to external keys will also be dropped. In this case, the error is fixed while Data Transfer is running and no other actions are required.
-
If the transfer does not drop any table during another iteration, table dropping stops. In which case:
- The transfer will continue running if all tables were dropped.
- The transfer will be aborted with an error if there are any tables left.
Solution:
-
If child tables are not involved in other transfers and their transfer is not in conflict with the transfer goals, add these tables:
- To the list of included tables in the source endpoint parameters.
- To the list of objects to be transferred in the transfer parameters.
-
Delete the blocking child tables in the target database manually.
-
Use the
Truncate
cleanup policy. -
Recreate the target database.
Warning
This will lead to losing all data in the database.
Time shift in DATETIME data type when transferring to ClickHouse®
Time is shifted because the source endpoint uses the UTC+0 time zone for DATETIME
data. For more information, see Known limitations.
Solution: Apply the appropriate time zone at the target level manually.
Greenplum® and Greenplum Database® are registered trademarks or trademarks of VMware, Inc. in the United States and/or other countries.
ClickHouse® is a registered trademark of ClickHouse, Inc