Transferring data from a MySQL® source endpoint
- Scenarios for transferring data from MySQL®
- Preparing the source database
- Configuring the MySQL® source endpoint
- Configuring the data target
- Operations with the database during transfer
- Troubleshooting data transfer issues
- A single transaction's log size exceeds 4 GB
- No new tables are added
- Error when transferring from AWS RDS for MySQL®
- Error when transfering tables without primary keys
- Error when accessing a binary log
- Unable to get a binary log position
- Error when dropping a table under the Drop cleanup policy
- Time shift in DATETIME data type when transferring to ClickHouse®
Yandex Data Transfer enables you to migrate data from a MySQL® database and implement various data transfer, processing, and transformation scenarios. To implement a transfer:
- Explore possible data transfer scenarios.
- Prepare the MySQL® database for the transfer.
- Set up a source endpoint in Yandex Data Transfer.
- Set up one of the supported data targets.
- Create a transfer and start it.
- Perform the required operations with the database and see how the transfer is going.
- In case of any issues, use ready-made solutions to resolve them.
Scenarios for transferring data from MySQL®
-
Migration: Moving data from one storage to another. Migration often means migrating a database from obsolete local databases to managed cloud ones.
-
Data change capture means tracking changes to a database and delivering those changes to consumers. It is used for applications that are sensitive to real-time data changes.
-
Uploading data to data marts is a process of transferring prepared data to storage for subsequent visualization.
-
Uploading data to scalable Object Storage storage allows you to save on data storage and simplifies the exchange with contractors.
For a detailed description of possible Yandex Data Transfer scenarios, see Tutorials.
Preparing the source database
-
Enable full binary logging on the source by setting the Binlog row image parameter
toFULL
orNOBLOB
. -
Optionally, set a limit on the size of data chunks to send using the Max allowed packet parameter.
-
Create a user for connecting to the source.
-
For Replication and Snapshot and increment transfers, tables without primary keys are not transferred. To make sure the transfer is running properly when migrating a database with such tables:
-
Do not transfer tables without primary keys. For this purpose, add them to the list of excluded tables in source endpoint settings.
-
Create
PRIMARY KEYS
in those tables to be migrated where there are no keys.-
To get a list of tables without a primary key, run the query:
SELECT tab.table_schema AS database_name, tab.table_name AS table_name, tab.table_rows AS table_rows FROM information_schema.tables tab LEFT JOIN information_schema.table_constraints tco ON (tab.table_schema = tco.table_schema AND tab.table_name = tco.table_name AND tco.constraint_type = 'PRIMARY KEY') WHERE tab.table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys') AND tco.constraint_type IS NULL AND tab.table_type = 'BASE TABLE';
-
Study the structure of tables without a primary key that need to be transferred to the target:
SHOW CREATE TABLE <database_name>.<table_name>;
-
Add a simple or complex primary key to the tables that need to be transferred to the target:
ALTER TABLE <table_name> ADD PRIMARY KEY (<column_or_column_group>);
-
If the table being transferred to the target has no column or group of columns suitable for the role of the primary key, create a new column:
ALTER TABLE <table_name> ADD id BIGINT PRIMARY KEY AUTO_INCREMENT;
-
Note
If, when creating a primary key, you get an error saying
Creating index 'PRIMARY' required more than 'innodb_online_alter_log_max_size' bytes of modification log. Please try again
, increase theInnodb log file size
parameter value in the DBMS settings. -
-
Deactivate trigger transfer at the transfer initiation stage and reactivate it at the completion stage (for the Replication and the Snapshot and increment transfer types). For more information, see the description of additional endpoint settings for the MySQL® source.
-
If not planning to use Cloud Interconnect or VPN
for connections to an external cluster, make such cluster accessible from the Internet from IP addresses used by Data Transfer .For details on linking your network up with external resources, see this concept.
-
Make sure the source uses the MyISAM or InnoDB low-level storage subsystem. If you use other subsystems, the transfer may fail.
-
Enable full binary logging
on the source by setting thebinlog_row_image
parameter toFULL
orNOBLOB
. -
Specify row format for the binary log
on the source by setting thebinlog_format
parameter toROW
. -
For the Replication and Snapshot and increment transfer types:
-
In the
log_bin
parameter, specify the path to the binary log file . -
Enter the binary log information using the SHOW MASTER STATUS
request (for MySQL® 5.7 and 8.0) or the SHOW BINARY LOG STATUS request (for MySQL® 8.4). The request should return a string with the information, not an empty response.
-
-
If the replication source is a cluster that is behind the load balancer, enable GTID mode for it (
GTID-MODE = ON
).If it is not possible to enable GTID mode for any reason, make sure the binary log name template contains the host name.
In both cases, this will allow replication to continue even after changing the master host.
-
Optionally, set a limit
on the size of outbound data chunks using themax_allowed_packet
parameter. -
Create a user to connect to the source and grant them the required privileges:
CREATE USER '<username>'@'%' IDENTIFIED BY '<password>'; GRANT ALL PRIVILEGES ON <database_name>.* TO '<username>'@'%'; GRANT REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO '<username>'@'%';
-
For Replication and Snapshot and increment transfers, tables without primary keys are not transferred. To make sure the transfer is running properly when migrating a database with such tables:
-
Do not transfer tables without primary keys. For this purpose, add them to the list of excluded tables in source endpoint settings.
-
Create
PRIMARY KEYS
in those tables to be migrated where there are no keys.-
To get a list of tables without a primary key, run the query:
SELECT tab.table_schema AS database_name, tab.table_name AS table_name, tab.table_rows AS table_rows FROM information_schema.tables tab LEFT JOIN information_schema.table_constraints tco ON (tab.table_schema = tco.table_schema AND tab.table_name = tco.table_name AND tco.constraint_type = 'PRIMARY KEY') WHERE tab.table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys') AND tco.constraint_type IS NULL AND tab.table_type = 'BASE TABLE';
-
Study the structure of tables without a primary key that need to be transferred to the target:
SHOW CREATE TABLE <database_name>.<table_name>;
-
Add a simple or complex primary key to the tables that need to be transferred to the target:
ALTER TABLE <table_name> ADD PRIMARY KEY (<column_or_column_group>);
-
If the table being transferred to the target has no column or group of columns suitable for the role of the primary key, create a new column:
ALTER TABLE <table_name> ADD id BIGINT PRIMARY KEY AUTO_INCREMENT;
-
Note
If, when creating a primary key, you get an error saying
Creating index 'PRIMARY' required more than 'innodb_online_alter_log_max_size' bytes of modification log. Please try again
, increase theinno_db_log_file_size
parameter value in the DBMS settings. -
-
Deactivate trigger transfer at the transfer initiation stage and reactivate it at the completion stage (for the Replication and the Snapshot and increment transfer types). For more information, see the description of additional endpoint settings for the MySQL® source.
Configuring the MySQL® source endpoint
When creating or updating an endpoint, you can define:
- Yandex Managed Service for MySQL® cluster connection or custom installation settings, including those based on Yandex Compute Cloud VMs. These are required parameters.
- Additional parameters.
Managed Service for MySQL® cluster
Warning
To create or edit an endpoint of a managed database, you will need the managed-mysql.viewer
role or the primitive viewer
role for the folder the cluster of this managed database resides in.
Connecting to the database with the cluster ID specified in Yandex Cloud.
-
Managed Service for MySQL cluster: Specify ID of the cluster to connect to.
-
Database: Specify the name of the database in the selected cluster. Leave the field empty if you want to transfer tables from multiple databases at the same time. In this case, specify the database for creating service tables in the Database for auxiliary tables field.
-
User: Specify the username that Data Transfer will use to connect to the database.
-
Password: Enter the user's password to the database.
-
Security groups: Select the cloud network to host the endpoint and security groups for network traffic.
Thus, you will be able to apply the specified security group rules to the VMs and clusters in the selected network without changing the settings of these VMs and clusters. For more information, see Networking in Yandex Data Transfer.
- Endpoint type:
mysql-source
.
-
--cluster-id
: ID of the cluster you need to connect to. -
--database
— Database name. Leave the field empty if you want to transfer tables from multiple databases at the same time. -
--user
: Username that Data Transfer will use to connect to the database. -
To set a user password to access the DB, use one of the following parameters:
-
--raw-password
: Password as text. -
--password-file
: The path to the password file.
-
- Endpoint type:
mysql_source
.
-
connection.mdb_cluster_id
: ID of cluster to connect to. -
database
— Database name. Leave the field empty if you want to transfer tables from multiple databases at the same time. -
user
: Username that Data Transfer will use to connect to the database. -
password.raw
: Password in text form.
Here is an example of the configuration file structure:
resource "yandex_datatransfer_endpoint" "<endpoint_name_in_Terraform>" {
name = "<endpoint_name>"
settings {
mysql_source {
security_groups = ["<list_of_security_group_IDs>"]
connection {
mdb_cluster_id = "<cluster_ID>"
}
database = "<name_of_database_to_migrate>"
user = "<username_for_connection>"
password {
raw = "<user_password>"
}
<additional_endpoint_settings>
}
}
}
For more information, see the Terraform provider documentation
-
mdbClusterId
: ID of the cluster you need to connect to. -
database
— Database name. Leave the field empty if you want to transfer tables from multiple databases at the same time. -
user
: Username that Data Transfer will use to connect to the database. -
password.raw
: Database user password (in text form).
Custom installation
For OnPremise, all fields are filled in manually.
-
Host: Enter the IP address or FQDN of the host you want to connect to.
-
Port: Set the number of the port that Data Transfer will use for the connection.
-
CA certificate: Upload the certificate file or add its contents as text if transmitted data must be encrypted, for example, to meet PCI DSS
requirements. -
Subnet ID: Select or create a subnet in the required availability zone. The transfer will use this subnet to access the source.
If the value in this field is specified for both endpoints, both subnets must be hosted in the same availability zone.
-
Database: Specify the name of the database in the selected cluster. Leave the field empty if you want to transfer tables from multiple databases at the same time. In this case, specify the database for creating service tables in the Database for auxiliary tables field.
-
User: Specify the username that Data Transfer will use to connect to the database.
-
Password: Enter the user's password to the database.
-
Security groups: Select the cloud network to host the endpoint and security groups for network traffic.
Thus, you will be able to apply the specified security group rules to the VMs and clusters in the selected network without changing the settings of these VMs and clusters. For more information, see Networking in Yandex Data Transfer.
- Endpoint type:
mysql-source
.
-
--host
: IP address or FQDN of the master host you want to connect to. -
--port
: Number of the port that Data Transfer will use for the connection. -
--ca-certificate
— If the transmitted data needs to be encrypted, for example, to meet the requirements of PCI DSS . -
--subnet-id
: ID of the subnet the host is in. The transfer will use that subnet to access the host. -
--database
— Database name. Leave the field empty if you want to transfer tables from multiple databases at the same time. -
--user
: Username that Data Transfer will use to connect to the database. -
To set a user password to access the DB, use one of the following parameters:
-
--raw-password
: Password as text. -
--password-file
: The path to the password file.
-
- Endpoint type:
mysql_source
.
-
on_premise.hosts
: List of IPs or FQDNs of hosts to connect to. Since only single-item lists are supported, specify the master host address. -
on_premise.port
: Port number that Data Transfer will use for connections. -
on_premise.tls_mode.enabled.ca_certificate
: CA certificate if the data to transfer must be encrypted to comply with PCI DSS requirements. -
on_premise.subnet_id
: ID of the subnet the host is in. The transfer will use that subnet to access the host. -
database
— Database name. Leave the field empty if you want to transfer tables from multiple databases at the same time. -
user
: Username that Data Transfer will use to connect to the database. -
password.raw
: Password in text form.
Here is an example of the configuration file structure:
resource "yandex_datatransfer_endpoint" "<endpoint_name_in_Terraform>" {
name = "<endpoint_name>"
settings {
mysql_source {
security_groups = ["<list_of_security_group_IDs>"]
connection {
on_premise {
hosts = ["<list_of_hosts>"]
port = <port_for_connection>
}
}
database = "<name_of_database_to_migrate>"
user = "<username_for_connection>"
password {
raw = "<user_password>"
}
<additional_endpoint_settings>
}
}
}
For more information, see the Terraform provider documentation
onPremise
: Database connection parameters:-
hosts
— IP address or FQDN of the master host to connect to. -
port
: The number of the port that Data Transfer will use for the connection. tlsMode
: Parameters for encrypting transmitted data if it is required, for example, to meet PCI DSS requirements.disabled
: Disabled.enabled
: Enabled.caCertificate
: CA certificate.
-
subnetId
: ID of the subnet the host is in. The transfer will use that subnet to access the host.
-
-
database
— Database name. Leave the field empty if you want to transfer tables from multiple databases at the same time. -
user
: Username that Data Transfer will use to connect to the database. -
password.raw
: Database user password (in text form).
Additional settings
-
Table filter:
-
Included tables: Only data from the tables listed here will be transferred. This option is specified using regular expressions.
When you add new tables when editing an endpoint used in Snapshot and increment or Replication transfers with the Replicating status, the data history for these tables will not get uploaded. To add a table with its historical data, use the List of objects for transfer field in the transfer settings.
-
Excluded tables: Data from the listed tables is not transferred. This option is specified using regular expressions.
Regular expressions for included and excluded tables must meet the ID naming rules in MySQL®. For more information, see the MySQL® documentation
. Escaping double quotes is not required.Warning
If a table you want to include or exclude has the same name as your database, specify that table in
<DB_name>.<table_name>
format for the filter to work properly. -
-
Transfer schema: Allows you to select the DB schema elements that will be transferred when activating or deactivating a transfer.
-
Advanced settings:
-
Time zone for connecting to the database: Specify the IANA Time Zone Database
identifier. By default, the server local time zone is used. -
Database for auxiliary tables: Database for dummy tables (
__tm_keeper
and__tm_gtid_keeper
). By default, this is the source database the data is transferred from.
-
-
--include-table-regex
: List of included tables. Only the data from the tables listed here will be transferred. This option is specified using regular expressions.When you add new tables when editing an endpoint used in Snapshot and increment or Replication transfers with the Replicating status, the data history for these tables will not get uploaded. To add a table with its historical data, use the List of objects for transfer field in the transfer settings.
-
--exclude-table-regex
: List of excluded tables. Data from the listed tables will not be transferred. This option is specified using regular expressions. -
--timezone
: DB time zone specified as an IANA Time Zone Database ID. Defaults to UTC+0. -
Schema transfer settings:
--transfer-before-data
: When activating a transfer.--transfer-after-data
: When deactivating a transfer.
-
include_table_regex
: List of included tables. Only the data from the tables listed here will be transferred. This option is specified using regular expressions.When you add new tables when editing an endpoint used in Snapshot and increment or Replication transfers with the Replicating status, the data history for these tables will not get uploaded. To add a table with its historical data, use the List of objects for transfer field in the transfer settings.
-
exclude_table_regex
: List of excluded tables. Data from the listed tables will not be transferred. This option is specified using regular expressions. -
timezone
: DB time zone specified as an IANA Time Zone Database ID. Defaults to UTC+0. -
object_transfer_settings
: Schema transfer settings:view
: Views.routine
: Procedures and functions.trigger
: Triggers.
You can specify one of the following values for each entity:
BEFORE_DATA
: Transferring when activating the transfer.AFTER_DATA
: Transferring when deactivating the transfer.NEVER
: No tansfer.
-
includeTablesRegex
: List of included tables. Only the data from the tables listed here will be transferred. This option is specified using regular expressions.When you add new tables when editing an endpoint used in Snapshot and increment or Replication transfers with the Replicating status, the data history for these tables will not get uploaded. To add a table with its historical data, use the List of objects for transfer field in the transfer settings.
-
excludeTablesRegex
: List of excluded tables. Data from the listed tables will not be transferred. This option is specified using regular expressions. -
timezone
: DB time zone specified as an IANA Time Zone Database ID. Defaults to UTC+0. -
objectTransferSettings
: Settings for transferring a DB schema when activating and deactivating a transfer (theBEFORE_DATA
andAFTER_DATA
values, respectively).
Settings for transferring a DB schema when enabling and disabling a transfer
During a transfer, the database schema is transferred from the source to the target. The transfer is performed in two stages:
-
At the activation stage.
This step is performed before copying or replicating data to create a schema on the target. At this stage, you can enable the migration of views and stored procedures, stored functions, and triggers.
-
At the deactivation stage.
This step is performed at the end of the transfer operation when it is deactivated. If the transfer keeps running in replication mode, the final stage of the transfer will be performed only when replication stops. At this stage, you can enable the migration of views and stored procedures, stored functions, and triggers.
At the final stage, it is assumed that when the transfer is deactivated, there is no writing load on the source. You can ensure this by switching to read-only mode. At this stage, the database schema on the target is brought to a state where it will be consistent with the schema on the source.
Known limitations
If setting up a transfer from a MySQL® cluster to a ClickHouse® cluster, consider these aspects of transferring date and time data types
TIME
type data is transferred as strings with the source and target time zones ignored.- When transferring
TIMESTAMP
type data, the time zone set in the MySQL® source settings or advanced endpoint settings is used. For more information, see the MySQL® documentation . - The source endpoint assigns the UTC+0 time zone to data of the
DATETIME
type.
Transfers from MySQL® to a database of a different type do not support fields of the DECIMAL
type to prevent loss of data accuracy. There is no such limitation for MySQL®-to-MySQL® transfers.
Configuring the data target
Configure one of the supported data targets:
- PostgreSQL
- MySQL®
- ClickHouse®
- Greenplum®
- Yandex Managed Service for YDB
- Yandex Object Storage
- Apache Kafka®
- YDS
- Elasticsearch
- OpenSearch
For a complete list of supported sources and targets in Yandex Data Transfer, see Available transfers.
After configuring the data source and target, create and start the transfer.
Operations with the database during transfer
-
For transfers in the Copying status, any changes to the data schema (
ALTER
) on the source or target will interrupt the transfer. -
For transfers in the Replicating status, you can change the data schema on the source. All
ALTER
operations recorded in the binary log (binlog) on the source will be automatically applied to the target. This process takes some time, so the transfer may slow down.
Troubleshooting data transfer issues
Known issues when using a MySQL® endpoint:
- Single transaction log size exceeds 4 GB.
- New tables cannot be added.
- Error when transferring from AWS RDS for MySQL®.
- Error when transferring tables without primary keys.
- Binary log access error.
- Unable to get a binary log position
- Error when dropping a table under the Drop cleanup policy.
- Time shift in the DATETIME data type when transferring to ClickHouse®.
For more troubleshooting tips, see Troubleshooting.
A single transaction's log size exceeds 4 GB
Error message:
Last binlog file <file_name:file_size> is more than 4GB
If the log size of a single transaction exceeds 4 GB, activation of Replication or Snapshot and increment transfers fails due to MySQL® internal limits
Solution: Reactivate the transfer.
No new tables are added
No new tables are added to Snapshot and increment transfers.
Solution:
- Deactivate and reactivate the transfer.
- Create tables in the target database manually.
- Create a separate transfer of the Snapshot type and only add to it the newly created tables. Deactivating the original Snapshot and increment transfer is not required.
Error when transferring from AWS RDS for MySQL®
Snapshot and increment and Replication transfers from the Amazon RDS for MySQL®
Error example:
Failed to execute LoadSnapshot:
Cannot load table "name": unable to read rows and push by chunks:
unable to push changes: unable to execute per table push:
error: err: sql: transaction has already been committed or rolled back
rollback err: sql: transaction has already been committed or rolled back
This error is caused by the short MySQL® binary log file retention time in Amazon RDS.
Solution:
Increase the binary log retention time by using the command below:
call mysql.rds_set_configuration('binlog retention hours', <hours>);
The maximum retention time is 168 hours (7 days). The default is NULL
(binary log files are not retained). For more information, see the Amazon RDS documentation
Error when transfering tables without primary keys
Error message:
Primary key check failed: 14 Tables errors: Table no key columns found
For Replication and Snapshot and increment transfers, tables without primary keys are not transferred.
Solution: Prepare the source as described in Preparing for the transfer.
Error when accessing a binary log
The following error may occur during Snapshot and increment transfers:
Warn(replication): failed to run (abstract1 source):
failed to run canal: failed to start binlog sync:
failed to get canal event: ERROR 1236 (HY000): Could not find first log file name in binary log index file
The error occurs when the binary log files required for replication are not available. This is usually due to exceeding the maximum allowed file size after adding new changes to the binary log. In this case, some of the old log data is deleted.
Solution:
Increase the maximum allowed size for the binary log files in the MySQL® settings using the Mdb preserve binlog bytes parameter.
The minimum value is 1073741824
(1 GB), while the maximum one is 107374182400
(100 GB). The default value is 1073741824
(1 GB).
Unable to get a binary log position
Error message:
unable to get binlog position: Storage <source_address> is not master
You may get this error when activating Replication or Snapshot and increment transfers if the data source is a custom MySQL® installation and binary log file position based replication is set up incorrectly.
Solution:
Run the following checks in MySQL®:
-
Make sure the master is used as the replication source.
-
Make sure the log_bin
parameter specifices the correct path to the binary log file location. -
Enter the binary log information using the SHOW MASTER STATUS
request (for MySQL® 5.7 and 8.0) or the SHOW BINARY LOG STATUS request (for MySQL® 8.4). The request should return a string with the information, not an empty response.
Error when dropping a table under the Drop cleanup policy
Error message:
ERROR: cannot drop table <table_name> because other objects depend on it (SQLSTATE 2BP01)
With the Drop
cleanup policy, a transfer drops tables in multiple iterations:
-
The transfer tries to drop all tables one by one. Cascade delete is not used, as it may delete tables that are not included in the transfer. If a table cannot be dropped, for example, due to its links to external keys, an error occurs, but the transfer will continue dropping tables.
-
During the next iteration, the transfer will try to drop the remaining tables. If the blocking child tables were dropped in the previous iteration, the table linked to external keys will also be dropped. In this case, the error is fixed while Data Transfer is running and no other actions are required.
-
If the transfer does not drop any table during another iteration, table dropping stops. In which case:
- The transfer will continue running if all tables were dropped.
- The transfer will be aborted with an error if there are any tables left.
Solution:
-
If child tables are not involved in other transfers and their transfer is not in conflict with the transfer goals, add these tables:
- To the list of included tables in the source endpoint parameters.
- To the list of objects to be transferred in the transfer parameters.
-
Delete the blocking child tables in the target database manually.
-
Use the
Truncate
cleanup policy. -
Recreate the target database.
Warning
This will lead to losing all data in the database.
Time shift in DATETIME data type when transferring to ClickHouse®
Time is shifted because the source endpoint uses the UTC+0 time zone for DATETIME
data. For more information, see Known limitations.
Solution: Apply the appropriate time zone at the target level manually.
Greenplum® and Greenplum Database® are registered trademarks or trademarks of VMware, Inc. in the United States and/or other countries.
ClickHouse® is a registered trademark of ClickHouse, Inc