Yandex Monitoring metric reference
This section describes Managed Service for Apache Kafka® metrics delivered to Monitoring.
The name label contains the metric name.
Labels shared by all Managed Service for Apache Kafka® metrics:
| Label | Value |
|---|---|
| service | Service ID: managed-kafka |
| resource_type | Resource type: cluster |
| resource_id | Cluster ID |
| host | Host FQDN |
| node | Broker type: leader, follower, or replica |
| subcluster_name | Subcluster type: zookeeper_subcluster or kafka_subcluster |
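As a rough illustration of how these labels identify a time series, the sketch below checks a label set against a selector. The `matches` helper and all label values (cluster ID, host FQDN) are hypothetical placeholders and not part of the Monitoring API.

```python
def matches(labels: dict[str, str], selector: dict[str, str]) -> bool:
    """Return True if every selector key is present in labels with the same value."""
    return all(labels.get(key) == value for key, value in selector.items())


# Hypothetical label set for one time series; the ID and FQDN are placeholders.
series_labels = {
    "service": "managed-kafka",
    "resource_type": "cluster",
    "resource_id": "example-cluster-id",
    "host": "broker-1.example.internal",
    "subcluster_name": "kafka_subcluster",
    "name": "cpu.idle",
}

# Select the cpu.idle metric on Kafka broker hosts only.
selector = {
    "service": "managed-kafka",
    "subcluster_name": "kafka_subcluster",
    "name": "cpu.idle",
}

is_match = matches(series_labels, selector)
```

A selector that omits a label simply does not constrain it, so the same selector would match the `cpu.idle` series of every broker host in every cluster.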
CPU metrics
CPU core workload.
| Name | Type, units | Description |
|---|---|---|
| cpu.fraction | DGAUGE, % | Guaranteed vCPU performance |
| cpu.guarantee | DGAUGE, count | Guaranteed number of cores |
| cpu.limit | DGAUGE, count | Maximum number of cores in use |
| cpu.guest | DGAUGE, % | CPU core usage, guest usage type |
| cpu.idle | DGAUGE, % | CPU core usage, idle usage type |
| cpu.iowait | DGAUGE, % | CPU core usage, iowait usage type |
| cpu.irq | DGAUGE, % | CPU core usage, irq usage type |
| cpu.nice | DGAUGE, % | CPU core usage, nice usage type |
| cpu.softirq | DGAUGE, % | CPU core usage, softirq usage type |
| cpu.steal | DGAUGE, % | CPU core usage, steal usage type |
| cpu.system | DGAUGE, % | CPU core usage, system usage type |
| cpu.user | DGAUGE, % | CPU core usage, user usage type |
| load.avg_15min | DGAUGE, % | Average load over 15 minutes |
| load.avg_1min | DGAUGE, % | Average load over one minute |
| load.avg_5min | DGAUGE, % | Average load over five minutes |
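One way to read the usage-type metrics together is to sum the non-idle types into a single busy percentage. The sketch below assumes the values follow Linux CPU accounting, where guest time is already counted within user time; the `cpu_busy_percent` helper and the sample values are illustrative only.

```python
# Usage types counted as "busy". Assumption: guest time is already part of
# user time (Linux accounting), and idle/iowait are treated as not busy.
BUSY_TYPES = ("user", "system", "nice", "irq", "softirq", "steal")


def cpu_busy_percent(sample: dict[str, float]) -> float:
    """Sum the busy usage types of one host sample, in percent."""
    return sum(sample.get(f"cpu.{usage_type}", 0.0) for usage_type in BUSY_TYPES)


# Hypothetical readings for a single host at one point in time.
sample = {
    "cpu.user": 20.0,
    "cpu.system": 5.0,
    "cpu.nice": 0.0,
    "cpu.irq": 0.5,
    "cpu.softirq": 0.5,
    "cpu.steal": 1.0,
    "cpu.iowait": 3.0,
    "cpu.idle": 70.0,
    "cpu.guest": 0.0,
}

busy = cpu_busy_percent(sample)
```

Whether iowait should count as busy depends on what you are alerting on; it is excluded here so that the result reflects time spent executing rather than waiting for disks.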
Disk metrics
| Name | Type, units | Description |
|---|---|---|
| disk.free_bytes | DGAUGE, bytes | Free space |
| disk.free_inodes | DGAUGE, count | Free inodes |
| disk.total_bytes | DGAUGE, bytes | Available space |
| disk.total_inodes | DGAUGE, count | Available inodes |
| disk.used_bytes | DGAUGE, bytes | Used space |
| disk.used_inodes | DGAUGE, count | Used inodes |
Disk operation metrics
| Name | Type, units | Description |
|---|---|---|
| io.avg_read_time | DGAUGE, milliseconds | Average disk read time |
| io.avg_write_time | DGAUGE, milliseconds | Average disk write time |
| io.disk*.avg_read_time | DGAUGE, milliseconds | Average read time for a given disk |
| io.disk*.avg_write_time | DGAUGE, milliseconds | Average write time for a given disk |
| io.disk*.read_bytes | DGAUGE, bytes per second | Read speed for a given disk |
| io.disk*.read_count | DGAUGE, operations per second | Number of reads per second for a given disk |
| io.disk*.read_merged_count | DGAUGE, operations per second | Number of merged read operations per second for a given disk |
| io.disk*.utilization | DGAUGE, % | Utilization of a given disk; disabled for network drives |
| io.disk*.write_bytes | DGAUGE, bytes per second | Write speed for a given disk |
| io.disk*.write_count | DGAUGE, operations per second | Number of writes per second for a given disk |
| io.disk*.write_merged_count | DGAUGE, operations per second | Number of merged write operations per second for a given disk |
| io.read_bytes | DGAUGE, bytes per second | Disk read speed |
| io.read_count | DGAUGE, operations per second | Number of reads per second |
| io.read_merged_count | DGAUGE, operations per second | Number of merged read operations per second |
| io.utilization | DGAUGE, % | Disk utilization; disabled for network drives |
| io.write_bytes | DGAUGE, bytes per second | Disk write speed |
| io.write_count | DGAUGE, operations per second | Number of writes per second |
| io.write_merged_count | DGAUGE, operations per second | Number of merged write operations per second |
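Because the byte and operation metrics are both per-second rates, dividing one by the other gives an approximate average size per I/O operation. The sketch below shows this arithmetic; the `avg_io_size` helper and the sample values are hypothetical.

```python
from typing import Optional


def avg_io_size(bytes_per_second: float, ops_per_second: float) -> Optional[float]:
    """Approximate average bytes per operation, or None when there were no operations."""
    if ops_per_second <= 0:
        return None
    return bytes_per_second / ops_per_second


# Hypothetical readings of io.read_bytes and io.read_count taken at the same moment.
read_size = avg_io_size(bytes_per_second=1_048_576.0, ops_per_second=256.0)

# An idle disk has no operations, so no average can be computed.
idle_size = avg_io_size(bytes_per_second=0.0, ops_per_second=0.0)
```

The same division applies to the write metrics and to the per-disk `io.disk*` variants, provided both values come from the same disk and the same timestamp.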
RAM metrics
| Name | Type, units | Description |
|---|---|---|
| mem.guarantee_bytes | DGAUGE, bytes | Guaranteed memory allocation |
| mem.limit_bytes | DGAUGE, bytes | Memory limit |
| mem.active_bytes | DGAUGE, bytes | Active resident memory (frequently accessed and released only when absolutely necessary) |
| mem.available_bytes | DGAUGE, bytes | RAM usage, available usage type |
| mem.buffers_bytes | DGAUGE, bytes | RAM usage, buffers usage type |
| mem.cached_bytes | DGAUGE, bytes | RAM usage, cached usage type |
| mem.free_bytes | DGAUGE, bytes | Amount of free RAM available, excluding mem.buffers_bytes and mem.cached_bytes |
| mem.shared_bytes | DGAUGE, bytes | RAM usage, shared usage type |
| mem.total_bytes | DGAUGE, bytes | RAM usage, total usage type |
| mem.used_bytes | DGAUGE, bytes | Amount of RAM currently used by running processes |
Network metrics
| Name | Type, units | Description |
|---|---|---|
| net.bytes_recv | DGAUGE, bytes per second | Network data receive rate |
| net.bytes_sent | DGAUGE, bytes per second | Network data transmit rate |
| net.dropin | DGAUGE, count | Dropped receive packets |
| net.dropout | DGAUGE, count | Dropped transmit packets |
| net.errin | DGAUGE, count | Receive error count |
| net.errout | DGAUGE, count | Transmit error count |
| net.packets_recv | DGAUGE, packets per second | Network packet receive rate |
| net.packets_sent | DGAUGE, packets per second | Network packet transmit rate |
Service metrics
| Name | Description |
|---|---|
|  | Leader broker switch rate per unit of time. In a normal state, it is |
|  | Number of active controllers. |
|  | Number of topics |
|  | Number of offline partitions. |
|  | Imbalance count in the preferred replica. In a normal state, it is |
|  | Message lag: Difference between the consumer offset and the partition's latest offset. |
|  | Partition's current consumer group offset. |
|  | Partition's first offset. |
|  | Partition's last offset. |
|  | Disk’s partition size. |
|  | Number of hosts in the cluster |
|  | Broker health indicator. The metric calculation algorithm depends on whether there are any highly available topics (further referred to as HA topics) and which state their partition leaders have: For more information about the It can be either |
|  | Number of enqueued requests |
|  | Number of errors. |
|  | Time it takes the leader broker to process a request. |
|  | Message format conversion time. |
|  | Follower broker wait time. |
|  | Request queue wait time. |
|  | Number of requests. |
|  | Response queue wait time. |
|  | Response send time. |
|  | Total request execution time. |
|  | Average network processor idle percentage. Its value ranges from |
|  | Incoming data size |
|  | Outgoing data size |
|  | Number of requests received with errors |
|  | Number of requests processed with errors |
|  | Number of written messages |
|  | Replicated data size |
|  | Average request handler idle percentage. Its value ranges from |
|  | Broker state: |
|  | Maximum lag of message replication between the follower and leader brokers. |
|  | Number of partitions led by the broker |
|  | Number of partitions with no leader broker. These partitions do not support message writes or reads. |
|  | Number of partitions per broker |
|  | Number of partitions with the leader being reassigned |
|  | Number of partitions with in-sync replica (ISR) count below the set minimum |
|  | Number of partitions with ISR count below the replication factor |
|  | Request latency in ZooKeeper. |
|  | Number of active shards |
|  | Partition max offset |
|  | Partition min offset |
Note
This section lists only the basic Managed Service for Apache Kafka® metrics delivered to Monitoring. For information on all Managed Service for Apache Kafka® metrics, see the official documentation.
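The message lag described in the service metrics table is the difference between a partition's latest offset and the consumer group's current offset. The sketch below shows that subtraction per partition and its sum over a group; the helper names and the offsets are hypothetical, and real values would come from the corresponding offset metrics.

```python
def partition_lag(latest_offset: int, consumer_offset: int) -> int:
    """Messages not yet consumed in one partition; never negative."""
    return max(latest_offset - consumer_offset, 0)


def total_lag(latest: dict[int, int], consumed: dict[int, int]) -> int:
    """Sum the lag over the partitions for which the group has an offset."""
    return sum(
        partition_lag(latest[partition], offset)
        for partition, offset in consumed.items()
        if partition in latest
    )


# Hypothetical offsets keyed by partition number.
latest_offsets = {0: 120, 1: 300, 2: 45}
consumer_offsets = {0: 100, 1: 300, 2: 40}

group_lag = total_lag(latest_offsets, consumer_offsets)
```

Clamping at zero is a design choice for this sketch: the two offsets are sampled at slightly different moments, so a raw difference can briefly come out negative.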
Other metrics
| Name | Description |
|---|---|
|  | Host read access indicator. The metric calculation algorithm depends on whether there are any highly available topics (further referred to as HA topics) and which state their partition leaders have: It can be either |
|  | Host write access indicator. The metric calculation algorithm depends on whether there are any highly available topics (further referred to as HA topics) and which state their partition leaders have: For more information about the Additionally, the storage is checked for available space. It should be more than 5%. If there is not enough space, the host is unavailable for writes. It can be either |
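The storage part of the write access check can be sketched as a simple threshold on the share of free space. This is only an illustration of the "more than 5%" rule stated above, using values such as `disk.free_bytes` and `disk.total_bytes`; the `has_enough_space` helper is hypothetical, and the sketch does not cover the part of the algorithm that depends on HA topics.

```python
MIN_FREE_FRACTION = 0.05  # free space must be strictly more than 5%


def has_enough_space(free_bytes: int, total_bytes: int,
                     min_free_fraction: float = MIN_FREE_FRACTION) -> bool:
    """Return True if the free share of storage exceeds the minimum fraction."""
    if total_bytes <= 0:
        return False
    return free_bytes / total_bytes > min_free_fraction


# Hypothetical readings for a 100 GiB disk.
GIB = 1024 ** 3
writable = has_enough_space(free_bytes=6 * GIB, total_bytes=100 * GIB)
blocked = has_enough_space(free_bytes=5 * GIB, total_bytes=100 * GIB)
```

The comparison is strict, so a host with exactly 5% free space is treated as not having enough space.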