Transferring data to a MySQL target endpoint
- Scenarios for transferring data to MySQL
- Configuring the data source
- Preparing the target database
- Configuring the MySQL target endpoint
- Operations with the database during transfer
- Troubleshooting data transfer issues
- A single transaction's log size exceeds 4 GB
- No new tables are added
- Error when transferring from AWS RDS for MySQL
- Error when transfering tables without primary keys
- Error when accessing a binary log
- Error when dropping a table under the Drop cleanup policy
- Time shift in DATETIME data type when transferring to ClickHouse®
Yandex Data Transfer enables you to migrate data to a MySQL database and implement various data transfer, processing, and transformation scenarios. To implement a transfer:
- Explore possible data transfer scenarios.
- Configure one of the supported data sources.
- Prepare the MySQL database for the transfer.
- Configure the target endpoint in Yandex Data Transfer.
- Create a transfer and start it.
- Perform required operations with the database and control the transfer.
- In case of any issues, use ready-made solutions to resolve them.
Scenarios for transferring data to MySQL
-
Migration: Moving data from one repository to another; it often suggests transferring a database to the cloud, from outdated local databases to managed cloud ones.
-
Data delivery is a process of delivering arbitrary data to target storage. It includes data retrieval from a queue and its deserialization with subsequent transformation to target storage format.
For a detailed description of possible Yandex Data Transfer data transfer scenarios, see Tutorials.
Configuring the data source
Configure one of the supported data sources:
Preparing the target database
-
Make sure that the MySQL major version on the target is not lower than that on the source.
-
Set an SQL Mode matching the source.
-
Create a user for connecting to the source.
- Assign the user the
ALL_PRIVILEGES
role for the source database.
- Assign the user the
-
Make sure the settings for the network hosting the cluster allow public connections from IP addresses used by Data Transfer
. -
Make sure that the MySQL major version on the target is not lower than that on the source.
-
Make sure the target uses the MyISAM or InnoDB low-level storage subsystem.
-
Set an SQL Mode
matching the source. -
Create a user to connect to the target and grant them the required privileges:
CREATE USER '<username>'@'%' IDENTIFIED BY '<password>'; GRANT ALL PRIVILEGES ON <database_name>.* TO '<username>'@'%';
Configuring the MySQL target endpoint
When creating or updating an endpoint, you can define:
- Yandex Managed Service for MySQL cluster connection or custom installation settings, including those based on Yandex Compute Cloud VMs. These are required parameters.
- Additional parameters.
Managed Service for MySQL cluster
Warning
To create or edit an endpoint of a managed database, you need to have the managed-mysql.viewer
role or the viewer
primitive role assigned for the folder where this managed database cluster resides.
Connecting to the database with the cluster ID specified in Yandex Cloud.
-
Managed Service for MySQL cluster: Specify ID of the cluster to connect to.
-
Database: Specify the name of the database in the selected cluster. Leave the field empty if you want to create tables in databases with the same names as on the source. In this case, explicitly specify a DB schema for service tables under additional settings.
-
User: Specify the username that Data Transfer will use to connect to the database.
-
Password: Enter the user's password to the database.
-
Security groups: Select the cloud network to host the endpoint and security groups for network traffic.
Thus, you will be able to apply the specified security group rules to the VMs and clusters in the selected network without changing the settings of these VMs and clusters. For more information, see Networking in Yandex Data Transfer.
- Endpoint type:
mysql-target
.
-
--cluster-id
: ID of the cluster you need to connect to. -
--database
— Database name. Leave the field empty if you want to create tables in databases with the same names as on the source. In this case, explicitly specify a DB schema for service tables under additional settings. -
--user
: Username that Data Transfer will use to connect to the database. -
To set a user password to access the DB, use one of the following parameters:
-
--raw-password
: Password as text. -
--password-file
: The path to the password file.
-
- Endpoint type:
mysql_target
.
-
connection.mdb_cluster_id
: ID of cluster to connect to. -
database
— Database name. Leave the field empty if you want to create tables in databases with the same names as on the source. In this case, explicitly specify a DB schema for service tables under additional settings. -
user
: Username that Data Transfer will use to connect to the database. -
password.raw
: Password in text form.
Here is an example of the configuration file structure:
resource "yandex_datatransfer_endpoint" "<endpoint_name_in_Terraform>" {
name = "<endpoint_name>"
settings {
mysql_target {
security_groups = ["<list_of_security_group_IDs>"]
connection {
mdb_cluster_id = "<cluster_ID>"
}
database = "<migrated_database_name>"
user = "<username_for_connection>"
password {
raw = "<user_password>"
}
<additional_endpoint_settings>
}
}
}
For more information, see the Terraform provider documentation
-
mdbClusterId
: ID of the cluster you need to connect to. -
database
— Database name. Leave the field empty if you want to create tables in databases with the same names as on the source. In this case, explicitly specify a DB schema for service tables under additional settings. -
user
: Username that Data Transfer will use to connect to the database. -
password.raw
: Database user password (in text form).
Custom installation
For OnPremise, all fields are filled in manually.
-
Host: Enter the IP address or FQDN of the master host you want to connect to.
-
Port: Set the number of the port that Data Transfer will use for the connection.
-
CA certificate: Upload the certificate file or add its contents as text if transmitted data must be encrypted, for example, to meet PCI DSS
requirements. -
Subnet ID: Select or create a subnet in the desired availability zone.
If the value in this field is specified for both endpoints, both subnets must be hosted in the same availability zone.
-
Database: Specify the name of the database in the selected cluster. Leave the field empty if you want to create tables in databases with the same names as on the source. In this case, explicitly specify a DB schema for service tables under additional settings.
-
User: Specify the username that Data Transfer will use to connect to the database.
-
Password: Enter the user's password to the database.
-
Security groups: Select the cloud network to host the endpoint and security groups for network traffic.
This will let you apply the specified security group rules to the VMs and clusters in the selected network without changing the settings of these VMs and clusters. For more information, see Networking in Yandex Data Transfer.
- Endpoint type:
mysql-target
.
-
--host
: IP address or FQDN of the master host you want to connect to. -
--port
: Number of the port that Data Transfer will use for the connection. -
--ca-certificate
— If the transmitted data needs to be encrypted, for example, to meet the requirements of PCI DSS . -
--subnet-id
: ID of the subnet the host resides in. -
--database
— Database name. Leave the field empty if you want to create tables in databases with the same names as on the source. In this case, explicitly specify a DB schema for service tables under additional settings. -
--user
: Username that Data Transfer will use to connect to the database. -
To set a user password to access the DB, use one of the following parameters:
-
--raw-password
: Password as text. -
--password-file
: The path to the password file.
-
- Endpoint type:
mysql_target
.
-
security_groups
: Security groups for network traffic.Security group rules apply to a transfer. They allow opening up network access from the transfer VM to the VM with the database. For more information, see Networking in Yandex Data Transfer.
Security groups must belong to the same network as the
subnet_id
subnet, if the latter is specified.Note
In Terraform, it is not required to specify a network for security groups.
-
on_premise.hosts
: List of IPs or FQDNs of hosts to connect to. Since only single-item lists are supported, specify the master host address. -
on_premise.port
: Port number that Data Transfer will use for connections. -
on_premise.tls_mode.enabled.ca_certificate
: CA certificate if the data to transfer must be encrypted to comply with PCI DSS requirements. -
on_premise.subnet_id
: ID of the subnet where the host is located. -
database
— Database name. Leave the field empty if you want to create tables in databases with the same names as on the source. In this case, explicitly specify a DB schema for service tables under additional settings. -
user
: Username that Data Transfer will use to connect to the database. -
password.raw
: Password in text form.
Here is an example of the configuration file structure:
resource "yandex_datatransfer_endpoint" "<endpoint_name_in_Terraform>" {
name = "<endpoint_name>"
settings {
mysql_target {
security_groups = ["<list_of_security_group_IDs>"]
connection {
on_premise {
hosts = ["<list_of_hosts>"]
port = <port_for_connection>
}
}
database = "<migrated_database_name>"
user = "<username_for_connection>"
password {
raw = "<user_password>"
}
<additional_endpoint_settings>
}
}
}
For more information, see the Terraform provider documentation
onPremise
: Database connection parameters:-
hosts
— IP address or FQDN of the master host to connect to. -
port
: The number of the port that Data Transfer will use for the connection. tlsMode
: Parameters for encrypting transmitted data if it is required, for example, to meet PCI DSS requirements.disabled
: Disabled.enabled
: Enabled.caCertificate
: CA certificate.
-
subnetId
: ID of the subnet the host resides in.
-
-
database
— Database name. Leave the field empty if you want to create tables in databases with the same names as on the source. In this case, explicitly specify a DB schema for service tables under additional settings. -
user
: Username that Data Transfer will use to connect to the database. -
password.raw
: Database user password (in text form).
Additional settings
Warning
You can configure Cleanup policy and Database schema for auxiliary tables only via the management console.
-
Cleanup policy: Select a way to clean up data in the target database before the transfer:
-
Don't cleanup
: Select this option only for replication without data copying. -
Drop
: Completely delete the tables included in the transfer (default).Use this option to always transfer the latest version of the table schema to the target database from the source whenever the transfer is activated.
-
Truncate
: Delete only the data from the tables included in the transfer but keep the schema.Use this option if the schema in the target database differs from the one that would have been transferred from the source during the transfer.
-
-
Advanced settings:
-
Time zone for connecting to the database: Specify the IANA Time Zone Database
identifier. By default, the server local time zone is used. -
SQL modes: Specify settings to override standard MySQL behavior
. -
Disable constraint check: Used to speed up replication: the
FOREIGN_KEY_CHECKS
and theUNIQUE_CHECKS
values are set to0
(no checks are performed).Warning
Disabling constraint checks will speed up replication, but may affect data integrity when using cascading operations.
-
Database schema for auxiliary tables: Specify the name of the schema where service tables needed for the transfer operation will be created.
-
-
sql_mode
: Specify the settings that override the MySQL standard behavior . -
--skip-constraint-checks
: Used to speed up replication: theFOREIGN_KEY_CHECKS
andUNIQUE_CHECKS
settings are set to0
(no checks are performed).Warning
Disabling constraint checks will speed up replication, but may affect data integrity when using cascading operations.
-
--timezone
: Specify the IANA Time Zone Database identifier. Defaults to UTC+0.
-
sql_mode
: Specify settings to override standard MySQL behavior . The following list is used by default:NO_AUTO_VALUE_ON_ZERO,NO_DIR_IN_CREATE,NO_ENGINE_SUBSTITUTION
. -
skip_constraint_checks
: Used to speed up replication: theFOREIGN_KEY_CHECKS
and theUNIQUE_CHECKS
values are set to0
(no checks are performed).Warning
Disabling constraint checks will speed up replication, but may affect data integrity when using cascading operations.
-
timezone
: Specify the IANA Time Zone Database identifier. Defaults to UTC+0.
For more information, see the Terraform provider documentation
-
sqlMode
: Specify settings to override standard MySQL behavior . -
skipConstraintChecks
: Used to speed up replication: theFOREIGN_KEY_CHECKS
andUNIQUE_CHECKS
settings are set to0
(no checks are performed).Warning
Disabling constraint checks will speed up replication, but may affect data integrity when using cascading operations.
-
timezone
: Specify the IANA Time Zone Database identifier. Defaults to UTC+0.
After configuring the data source and target, create and start the transfer.
Operations with the database during transfer
-
For transfers in the Copying status, any changes to the data schema (
ALTER
) on the source or target will interrupt the transfer. -
For transfers in the Replicating status, you can change the data schema on the source. All
ALTER
operations recorded in the binary log (binlog) on the source will be automatically applied to the target. This process takes some time, so the transfer may slow down.
Troubleshooting data transfer issues
Known issues when using a MySQL endpoint:
- Single transaction log size exceeds 4 GB.
- New tables are not added.
- Error when transferring from AWS RDS for MySQL.
- Error when transfering tables without primary keys.
- Binary log access error.
- Error when dropping a table under the Drop cleanup policy.
- Time shift in the DATETIME data type when transferring to ClickHouse®.
See a full list of recommendations in the Troubleshooting section.
A single transaction's log size exceeds 4 GB
Error message:
Last binlog file <file_name:file_size> is more than 4GB
If the log size of a single transaction exceeds 4 GB, activation of Replication or Snapshot and increment transfers fails due to MySQL internal limits
Solution: Reactivate the transfer.
No new tables are added
No new tables are added to Snapshot and increment transfers.
Solution:
- Deactivate and reactivate the transfer.
- Create tables in the target database manually.
- Create a separate transfer of the Snapshot type and only add to it the newly created tables. Deactivating the original Snapshot and increment transfer is not required.
Error when transferring from AWS RDS for MySQL
Snapshot and increment and Replication transfers from the Amazon RDS for MySQL
Error example:
Failed to execute LoadSnapshot:
Cannot load table "name": unable to read rows and push by chunks:
unable to push changes: unable to execute per table push:
error: err: sql: transaction has already been committed or rolled back
rollback err: sql: transaction has already been committed or rolled back
This error is caused by the short MySQL binary log file retention time in Amazon RDS.
Solution:
Increase the binary log retention time by using the command below:
call mysql.rds_set_configuration('binlog retention hours', <hours>);
The maximum retention time is 168 hours (7 days). The default is NULL
(binary log files are not retained). For more information, see the Amazon RDS documentation
Error when transfering tables without primary keys
Error message:
Primary key check failed: 14 Tables errors: Table no key columns found
For Replication and Snapshot and increment transfers, tables without primary keys are not transferred.
Solution: Prepare the source as described in Preparing for the transfer.
Error when accessing a binary log
The following error may occur during Snapshot and increment transfers:
Warn(replication): failed to run (abstract1 source):
failed to run canal: failed to start binlog sync:
failed to get canal event: ERROR 1236 (HY000): Could not find first log file name in binary log index file
The error occurs when the binary log files required for replication are not available. This is usually due to exceeding the maximum allowed file size after adding new changes to the binary log. In this case, some of the old log data is deleted.
Solution:
Increase the maximum allowed size for the binary log files in the MySQL settings using the Mdb preserve binlog bytes parameter.
The minimum value is 1073741824
(1 GB), while the maximum one is 107374182400
(100 GB). The default value is 1073741824
(1 GB).
Error when dropping a table under the Drop cleanup policy
Error message:
ERROR: cannot drop table <table_name> because other objects depend on it (SQLSTATE 2BP01)
With the Drop
cleanup policy, a transfer drops tables in multiple iterations:
-
The transfer tries to drop all tables one by one. Cascade delete is not used, as it may delete tables that are not included in the transfer. If a table cannot be dropped, for example, due to its links to external keys, an error occurs, but the transfer will continue dropping tables.
-
During the next iteration, the transfer will try to drop the remaining tables. If the blocking child tables were dropped in the previous iteration, the table linked to external keys will also be dropped. In this case, the error is fixed while Data Transfer is running and no other actions are required.
-
If the transfer does not drop any table during another iteration, table dropping stops. In which case:
- The transfer will continue running if all tables were dropped.
- The transfer will be aborted with an error if there are any tables left.
Solution:
-
If child tables are not involved in other transfers and their transfer is not in conflict with the transfer goals, add these tables:
- To the list of included tables in the source endpoint parameters.
- To the list of objects to be transferred in the transfer parameters.
-
Delete the blocking child tables in the target database manually.
-
Use the
Truncate
cleanup policy. -
Recreate the target database.
Warning
This will lead to losing all data in the database.
Time shift in DATETIME data type when transferring to ClickHouse®
Time is shifted because the source endpoint uses the UTC+0 time zone for DATETIME
data. For more information, see Known limitations.
Solution: Apply the appropriate time zone at the target level manually.
ClickHouse® is a registered trademark of ClickHouse, Inc