Monitoring transfer status
Transfer status details are available in the management console:
- Detailed diagnostic information is presented as charts. You can view them in the Monitoring tab of the transfer management page or in Yandex Monitoring.
You can configure alerts in Yandex Monitoring to receive notifications about transfer failures. In Yandex Monitoring, there are two alert thresholds: Warning and Alarm. If the specified threshold is exceeded, you will receive alerts via the configured notification channels.
You can also use the Yandex Cloud mobile app to monitor transfer statuses and get their logs.
Monitoring transfer status
- Go to the folder page and select Yandex Data Transfer.
- In the left-hand panel, select Transfers.
- Click the name of the transfer you need and open the Monitoring tab.
- To get started with Yandex Monitoring metrics, dashboards, or alerts, click Open in Monitoring in the top panel.
The following charts open on the page:
Number of source events
publisher.data.changeitems
Number of source events generated for a transfer (apart from the data to transfer, these events may include housekeeping operations).
Number of target events
sinker.pusher.data.changeitems
Number of events written to the target (apart from the data to transfer, these events may include housekeeping operations).
Maximum data transfer delay
sinker.pusher.time.row_max_lag_sec
Maximum data lag (in seconds).
Reads
publisher.data.bytes
The amount of data read from the source (in bytes).
Data transfer delay
sinker.pusher.time.row_lag_sec
Time difference between when the records appear in the target and when they appear in the source (in seconds). The histogram is divided into bins. Suppose the histogram shows two bins, 45 and 60, at a given point in time, each containing 50%. This means that half the records being transferred at that time had a delay of between 30 and 45 seconds, and the other half had a delay of between 45 and 60 seconds.
Source buffer size
publisher.consumer.log_usage_bytes
The size, in bytes, of the buffer or write-ahead log (when supported) in the source.
Rows written to target, by table
sinker.table.rows
The 50 tables with the most rows written to the target.
Target response time
sinker.pusher.time.batch_push_distribution_sec
Full time it takes to write a batch to the target, including data preprocessing (in seconds).
Rows awaiting transfer, by table
task.snapshot.remainder.table
The number of rows awaiting transfer.
Operation status
task.status
Type of the operation in progress: 1, meaning the task is active.
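The Data transfer delay histogram example above can be worked through in a few lines. This is a minimal sketch, assuming each bin is labeled by its upper bound in seconds and starts where the previous bin ends. The `describe_bins` helper and the full list of bin bounds (15, 30, 45, 60) are hypothetical and are not part of Data Transfer.

```python
def describe_bins(bounds, shares):
    """Turn histogram bins (keyed by upper bound, in seconds) into delay ranges.

    bounds: sorted upper bounds of all bins (hypothetical values).
    shares: mapping of upper bound -> share of records in that bin (0..1).
    """
    ranges = []
    lower = 0
    for upper in bounds:
        share = shares.get(upper, 0.0)
        if share > 0:
            # A bin covers delays from the previous bound up to its own bound.
            ranges.append((lower, upper, share))
        lower = upper
    return ranges


# Example from the text: bins 45 and 60 each hold 50% of the records.
bins = describe_bins([15, 30, 45, 60], {45: 0.5, 60: 0.5})
for lower, upper, share in bins:
    print(f"{share:.0%} of records lagged between {lower} and {upper} seconds")
```

With these inputs, the sketch reports one half of the records in the 30 to 45 second range and the other half in the 45 to 60 second range, matching the description of the chart.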
Alert settings in Yandex Monitoring
- In the management console, select the folder with the transfer you want to set up alerts for.
- In the list of services, select Monitoring.
- Under Service dashboards, select Data Transfer.
- In the chart you need, open the chart menu and select Create alert.
- If the chart shows multiple metrics, select a data query to generate a metric and click Continue. For more information about the query language, see the Yandex Monitoring documentation.
- Set the Alarm and Warning threshold values to trigger the alert.
- Click Create alert.
Recommended alerts
Number of source events
If this alert triggers, the source database generated no replicated Data Transfer events (individual data elements) during the evaluation window.
Possible causes include:
- The source database is not available over the network for Data Transfer, e.g., due to revoked access or a source database failure.
- The source database has no data to replicate.
Alert parameters:
- Metrics:
  - <cloud_name> > <folder_name>
  - service = data-transfer
  - name = publisher.data.changeitems
  - derivative() (in the Transformation section)
- Alert settings:
  - Condition: Less than or equals.
  - Alarm: 0.
  - Warning: -.
  You can additionally set the Warning triggering condition for situations when the number of replicated operations is below the expected value.
- Additional settings:
  - Aggregation function: Maximum.
  - Evaluation window: 5 minutes. If the source database changes less frequently than once every five minutes, increase the evaluation window to the maximum allowable interval between two DML operations with data in the source.
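How the recommended settings for this alert combine can be illustrated offline. The following is a minimal sketch, not Yandex Monitoring's implementation. It assumes the metric is a cumulative counter sampled as (timestamp, value) points, approximates derivative() as the per-second change between neighboring points, takes the maximum over the evaluation window, and compares it with the Alarm threshold. The function names and sample data are hypothetical.

```python
def derivative(points):
    """Per-second rate of change between consecutive (timestamp_sec, value) points."""
    return [
        (t1, (v1 - v0) / (t1 - t0))
        for (t0, v0), (t1, v1) in zip(points, points[1:])
        if t1 > t0
    ]


def alarm_fires(points, window_sec, threshold=0):
    """True if max(derivative) over the last window_sec is <= threshold."""
    rates = derivative(points)
    if not rates:
        return False  # not enough data to evaluate; an assumption of this sketch
    end = rates[-1][0]
    in_window = [rate for t, rate in rates if t > end - window_sec]
    return max(in_window) <= threshold


# Counter keeps growing: events are still being produced.
active = [(0, 100), (60, 160), (120, 220), (180, 300), (240, 310), (300, 400)]
# Counter is flat for the whole five-minute window: no new events.
stalled = [(0, 100), (60, 100), (120, 100), (180, 100), (240, 100), (300, 100)]

print(alarm_fires(active, window_sec=300))   # False
print(alarm_fires(stalled, window_sec=300))  # True
```

Because the aggregation is Maximum, a single non-zero rate anywhere in the window keeps the alert quiet. In this sketch, the alert fires only when the counter did not move for the entire window.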
Number of target events
When an alert is triggered, it means that the target database recorded no replicated Data Transfer events during the evaluation window.
Possible causes include:
- The source or target database is not available over the network for Data Transfer, e.g., due to revoked access or a source or target database failure.
- The source database has no data to replicate.
- The data from the source database cannot be replicated to the target database, e.g., due to data type limitations in the target database.
Alert parameters:
- Metrics:
  - <cloud_name> > <folder_name>
  - service = data-transfer
  - name = sinker.pusher.data.changeitems
  - derivative() (in the Transformation section)
- Alert settings:
  - Condition: Less than or equals.
  - Alarm: 0.
  - Warning: -.
  You can additionally set the Warning triggering condition for situations when the number of replicated operations is below the expected value.
- Additional settings:
  - Aggregation function: Maximum.
  - Evaluation window: 5 minutes. If the source database changes less frequently than once every five minutes, increase the evaluation window to the maximum allowable interval between two DML operations with data in the source.
Maximum data transfer delay
If this alert triggers, the time difference between an operation on rows being executed in the source and in the target exceeded the specified threshold during the evaluation window.
Possible causes include:
- The target database is not available over the network for Data Transfer, e.g., due to revoked access or a target database failure.
- Not enough resources for replication. For example, the load on the source database exceeds the capacity of the VM instance the Data Transfer replication is running on.
- The data from the source database cannot be replicated to the target database, e.g., due to data type limitations in the target database.
Alert parameters:
- Metrics:
  - <cloud_name> > <folder_name>
  - service = data-transfer
  - name = sinker.pusher.time.row_max_lag_sec
- Alert settings:
  - Condition: Greater than or equals.
  - Alarm: 15. If the target database is slow, or large blocks of data are being replicated at a time, set the maximum possible value.
  - Warning: -.
- Additional settings:
  - Aggregation function: Minimum.
  - Evaluation window: 1 minute.
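To see what the Minimum aggregation combined with a Greater than or equals condition means in practice, consider this minimal sketch. It is not Yandex Monitoring's implementation. The `lag_alarm` helper and the sample values are hypothetical, and the lag samples are assumed to be plain readings in seconds.

```python
def lag_alarm(samples, window_sec=60, threshold_sec=15):
    """True if the minimum lag over the last window_sec is >= threshold_sec.

    samples: list of (timestamp_sec, lag_sec) pairs, oldest first.
    """
    if not samples:
        return False  # nothing to evaluate; an assumption of this sketch
    end = samples[-1][0]
    in_window = [lag for t, lag in samples if t > end - window_sec]
    return min(in_window) >= threshold_sec


# A short spike: lag dips below 15 seconds inside the window, so no alarm.
spike = [(0, 2), (15, 40), (30, 3), (45, 2), (60, 20)]
# Sustained lag: every sample in the last minute is at or above 15 seconds.
sustained = [(0, 2), (15, 18), (30, 25), (45, 31), (60, 44)]

print(lag_alarm(spike))      # False
print(lag_alarm(sustained))  # True
```

Under these assumptions, taking the minimum means the alert ignores brief spikes and fires only when the lag stays at or above the threshold for the whole evaluation window.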
Reads
Alert triggering means that no bytes of data were read from the source during the evaluation window.
Possible causes include:
- The source database is not available over the network for Data Transfer, e.g., due to revoked access or a source database failure.
- The source database has no data to replicate.
Alert parameters:
- Metrics:
  - <cloud_name> > <folder_name>
  - service = data-transfer
  - name = publisher.data.bytes
  - derivative() (in the Transformation section)
- Alert settings:
  - Condition: Equals to.
  - Alarm: 0.
  - Warning: -.
- Additional settings:
  - Aggregation function: Maximum.
  - Evaluation window: 15 minutes. If the source database changes less frequently than once every 15 minutes, increase the evaluation window to the maximum allowable interval between two DML operations with data in the source.
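Choosing a larger evaluation window comes down to finding the longest quiet period between changes in the source. Here is a minimal sketch, assuming you already have a list of DML timestamps from your own source-side logs. The `suggest_window` helper and the sample timestamps are hypothetical, and the 15-minute floor mirrors the recommended value above.

```python
from datetime import datetime, timedelta


def suggest_window(dml_times, minimum=timedelta(minutes=15)):
    """Return the longest gap between consecutive DML operations,
    but never less than the recommended minimum window."""
    ordered = sorted(dml_times)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    longest = max(gaps, default=timedelta(0))
    return max(longest, minimum)


# Hypothetical timestamps of DML operations observed on the source.
times = [
    datetime(2024, 1, 1, 9, 0),
    datetime(2024, 1, 1, 9, 10),
    datetime(2024, 1, 1, 9, 50),  # 40-minute quiet period before this one
    datetime(2024, 1, 1, 10, 0),
]
print(suggest_window(times))  # 0:40:00
```

Observed gaps only approximate the maximum allowable interval. If the workload can legitimately stay idle for longer than anything in the sample, use that longer interval instead.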
Specifics of working with alerts
- To determine the cause of a transfer failure, check all available alerts. Knowing which alerts fired and which did not helps you identify the cause more accurately. For example, if the Number of source events alert has fired and the Number of target events alert has not, the problem is most likely not on the source.