Yandex Monitoring metric reference
Updated at April 28, 2025
This section describes Managed Service for Apache Kafka® metrics delivered to Monitoring.
The name label contains the metric name.
Labels shared by all Managed Service for Apache Kafka® metrics:
Label | Value |
---|---|
service | Service ID: managed-kafka |
resource_type | Resource type: cluster |
resource_id | Cluster ID |
host | Host FQDN |
node | Broker type: leader, follower, or replica |
subcluster_name | Subcluster type: zookeeper_subcluster or kafka_subcluster |
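As a rough illustration of how these shared labels narrow a selection, the sketch below filters a few in-memory samples by label values. The sample records, host names, cluster ID, and the select helper are all hypothetical; only the label keys and the managed-kafka, cluster, kafka_subcluster, and zookeeper_subcluster values come from the table above. It does not call any Monitoring API.

```python
# Hypothetical samples: label keys follow the table above, values are made up.
samples = [
    {"name": "cpu.idle", "service": "managed-kafka", "resource_type": "cluster",
     "resource_id": "example-cluster-id", "host": "kafka-1.example.internal",
     "subcluster_name": "kafka_subcluster", "value": 87.5},
    {"name": "cpu.idle", "service": "managed-kafka", "resource_type": "cluster",
     "resource_id": "example-cluster-id", "host": "zk-1.example.internal",
     "subcluster_name": "zookeeper_subcluster", "value": 95.0},
    {"name": "disk.used_bytes", "service": "managed-kafka", "resource_type": "cluster",
     "resource_id": "example-cluster-id", "host": "kafka-1.example.internal",
     "subcluster_name": "kafka_subcluster", "value": 1024.0},
]


def select(records, **labels):
    """Return the records whose labels match every given key/value pair."""
    return [r for r in records if all(r.get(k) == v for k, v in labels.items())]


# Keep only cpu.idle samples reported by Kafka broker hosts.
kafka_cpu_idle = select(samples, name="cpu.idle", subcluster_name="kafka_subcluster")
```

Filtering on subcluster_name in this way separates broker hosts from ZooKeeper hosts when both report the same metric name.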
CPU metrics
These metrics show processor core workload.
Name | Type, units | Description |
---|---|---|
cpu.fraction | DGAUGE, % | Guaranteed vCPU performance |
cpu.guarantee | DGAUGE, count | Guaranteed number of cores |
cpu.limit | DGAUGE, count | Maximum number of cores in use |
cpu.guest | DGAUGE, % | CPU core usage, guest usage type |
cpu.idle | DGAUGE, % | CPU core usage, idle usage type |
cpu.iowait | DGAUGE, % | CPU core usage, iowait usage type |
cpu.irq | DGAUGE, % | CPU core usage, irq usage type |
cpu.nice | DGAUGE, % | CPU core usage, nice usage type |
cpu.softirq | DGAUGE, % | CPU core usage, softirq usage type |
cpu.steal | DGAUGE, % | CPU core usage, steal usage type |
cpu.system | DGAUGE, % | CPU core usage, system usage type |
cpu.user | DGAUGE, % | CPU core usage, user usage type |
load.avg_15min | DGAUGE, % | Average load over 15 minutes |
load.avg_1min | DGAUGE, % | Average load over one minute |
load.avg_5min | DGAUGE, % | Average load over five minutes |
Disk metrics
Name | Type, units | Description |
---|---|---|
disk.free_bytes | DGAUGE, bytes | Free space |
disk.free_inodes | DGAUGE, count | Free inodes |
disk.total_bytes | DGAUGE, bytes | Available space |
disk.total_inodes | DGAUGE, count | Available inodes |
disk.used_bytes | DGAUGE, bytes | Used space |
disk.used_inodes | DGAUGE, count | Used inodes |
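The byte metrics above are often combined into a single fill percentage for alerting. The following is a minimal sketch of that arithmetic, assuming disk.used_bytes and disk.total_bytes were sampled at the same moment on the same host. The function name and the sample values are hypothetical.

```python
def disk_used_percent(used_bytes: float, total_bytes: float) -> float:
    """Share of disk space in use, as a percentage of disk.total_bytes."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


# Hypothetical readings: 75 GiB used on a 100 GiB disk.
GIB = 2 ** 30
percent = disk_used_percent(used_bytes=75 * GIB, total_bytes=100 * GIB)
```

The same calculation applies to disk.used_inodes and disk.total_inodes when a disk holds many small files.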
Disk I/O metrics
Name | Type, units | Description |
---|---|---|
io.avg_read_time | DGAUGE, milliseconds | Average disk read time |
io.avg_write_time | DGAUGE, milliseconds | Average disk write time |
io.disk*.avg_read_time | DGAUGE, milliseconds | Average read time for a given disk |
io.disk*.avg_write_time | DGAUGE, milliseconds | Average write time for a given disk |
io.disk*.read_bytes | DGAUGE, bytes per second | Read speed for a given disk |
io.disk*.read_count | DGAUGE, operations per second | Number of reads per second for a given disk |
io.disk*.read_merged_count | DGAUGE, operations per second | Number of merged read operations per second for a given disk |
io.disk*.utilization | DGAUGE, % | Utilization of a given disk; disabled for network drives. |
io.disk*.write_bytes | DGAUGE, bytes per second | Write speed for a given disk |
io.disk*.write_count | DGAUGE, operations per second | Number of writes per second for a given disk |
io.disk*.write_merged_count | DGAUGE, operations per second | Number of merged write operations per second for a given disk |
io.read_bytes | DGAUGE, bytes per second | Disk read rate |
io.read_count | DGAUGE, operations per second | Number of read operations per second |
io.read_merged_count | DGAUGE, operations per second | Number of merged read operations per second |
io.utilization | DGAUGE, % | Disk utilization |
io.write_bytes | DGAUGE, bytes per second | Disk write speed |
io.write_count | DGAUGE, operations per second | Number of writes per second |
io.write_merged_count | DGAUGE, operations per second | Number of merged write operations per second |
RAM metrics
Name | Type, units | Description |
---|---|---|
mem.guarantee_bytes | DGAUGE, bytes | Guaranteed memory allocation |
mem.limit_bytes | DGAUGE, bytes | Memory limit |
mem.active_bytes | DGAUGE, bytes | Active resident memory (frequently accessed and released when absolutely necessary) |
mem.available_bytes | DGAUGE, bytes | RAM usage, available usage type |
mem.buffers_bytes | DGAUGE, bytes | RAM usage, buffers usage type |
mem.cached_bytes | DGAUGE, bytes | RAM usage, cached usage type |
mem.free_bytes | DGAUGE, bytes | Amount of free RAM available, excluding mem.buffers_bytes and mem.cached_bytes |
mem.shared_bytes | DGAUGE, bytes | RAM usage, shared usage type |
mem.total_bytes | DGAUGE, bytes | RAM usage, total usage type |
mem.used_bytes | DGAUGE, bytes | Amount of RAM currently used by running processes |
Network metrics
Name | Type, units | Description |
---|---|---|
net.bytes_recv | DGAUGE, bytes per second | Network data receive rate |
net.bytes_sent | DGAUGE, bytes per second | Network data transmit rate |
net.dropin | DGAUGE, count | Dropped receive packets |
net.dropout | DGAUGE, count | Dropped transmit packets |
net.errin | DGAUGE, count | Receive error count |
net.errout | DGAUGE, count | Transmit error count |
net.packets_recv | DGAUGE, packets per second | Network packet receive rate |
net.packets_sent | DGAUGE, packets per second | Network packet transmit rate |
Service metrics
Name | Type, units | Description |
---|---|---|
kafka_controller_ControllerStats_LeaderElectionRateAndTimeMs | DGAUGE, milliseconds | Leader broker switch rate per unit of time. In a normal state, it is 0. Its value may increase during maintenance, which does not indicate a problem. Additional labels: quantile. |
kafka_controller_KafkaController_ActiveControllerCount | DGAUGE, count | Number of active controllers |
kafka_controller_KafkaController_GlobalTopicCount | DGAUGE, count | Number of topics |
kafka_controller_KafkaController_OfflinePartitionsCount | DGAUGE, count | Number of offline partitions |
kafka_controller_KafkaController_PreferredReplicaImbalanceCount | DGAUGE, count | Imbalance count in the preferred replica. In a normal state, it is 0. |
kafka_group_topic_partition_lag | DGAUGE, count | Message lag: difference between the consumer offset and the partition's latest offset. |
kafka_group_topic_partition_offset | DGAUGE, count | Partition offset |
kafka_host_count | DGAUGE, count | Number of hosts in the cluster |
kafka_is_alive | DGAUGE, 0/1 | Broker health indicator: 1 if the broker is alive, 0 if it is not. |
kafka_network_RequestChannel_RequestQueueSize | DGAUGE, count | Number of enqueued requests |
kafka_network_RequestMetrics_Errors | DGAUGE, count | Number of errors. Additional labels: request. |
kafka_network_RequestMetrics_LocalTimeMs | DGAUGE, milliseconds | Time it takes the leader broker to process a request. Additional labels: request and quantile. |
kafka_network_RequestMetrics_MessageConversionsTimeMs | DGAUGE, milliseconds | Message format conversion time. Additional labels: request and quantile. |
kafka_network_RequestMetrics_RemoteTimeMs | DGAUGE, milliseconds | Follower broker wait time. Additional labels: request and quantile. |
kafka_network_RequestMetrics_RequestQueueTimeMs | DGAUGE, milliseconds | Request queue wait time. Additional labels: request and quantile. |
kafka_network_RequestMetrics_Requests | DGAUGE, count | Number of requests. Additional labels: request. |
kafka_network_RequestMetrics_ResponseQueueTimeMs | DGAUGE, milliseconds | Response queue wait time. Additional labels: request and quantile. |
kafka_network_RequestMetrics_ResponseSendTimeMs | DGAUGE, milliseconds | Response send time. Additional labels: request and quantile. |
kafka_network_RequestMetrics_TotalTimeMs | DGAUGE, milliseconds | Total request execution time. Additional labels: request and quantile. |
kafka_network_SocketServer_NetworkProcessorAvgIdlePercent | DGAUGE, % | Average network processor idle percentage. Its value ranges from 0 (fully utilized) to 1 (completely idle). |
kafka_server_BrokerTopicMetrics_BytesIn | DGAUGE, bytes | Incoming data size |
kafka_server_BrokerTopicMetrics_BytesOut | DGAUGE, bytes | Outgoing data size |
kafka_server_BrokerTopicMetrics_FailedFetchRequests | DGAUGE, count | Number of requests received with errors |
kafka_server_BrokerTopicMetrics_FailedProduceRequests | DGAUGE, count | Number of requests processed with errors |
kafka_server_BrokerTopicMetrics_MessagesIn | DGAUGE, count | Number of written messages |
kafka_server_BrokerTopicMetrics_ReplicationBytesIn | DGAUGE, bytes | Replicated data size |
kafka_server_KafkaRequestHandlerPool_RequestHandlerAvgIdlePercent_count | DGAUGE, % | Average request handler idle percentage. Its value ranges from 0 (fully utilized) to 1 (completely idle). |
kafka_server_KafkaServer_BrokerState | DGAUGE | Broker state: 0: Not Running; 1: Starting; 2: Recovering from Unclean Shutdown; 3: Running as Broker; 4: Running as Controller; 5: Pending Controlled Shutdown; 6: Broker Shutting Down. |
kafka_server_ReplicaFetcherManager_MaxLag | DGAUGE, count | Maximum lag of message replication between the follower and leader brokers. Additional labels: clientId. |
kafka_server_ReplicaManager_LeaderCount | DGAUGE, count | Number of partitions led by the broker |
kafka_server_ReplicaManager_OfflineReplicaCount | DGAUGE, count | Number of partitions with no leader broker. These partitions do not support message writes or reads. |
kafka_server_ReplicaManager_PartitionCount | DGAUGE, count | Number of partitions per broker |
kafka_server_ReplicaManager_ReassigningPartitions | DGAUGE, count | Number of partitions with the leader being reassigned |
kafka_server_ReplicaManager_UnderMinIsrPartitionCount | DGAUGE, count | Number of partitions with in-sync replica (ISR) count below the set minimum |
kafka_server_ReplicaManager_UnderReplicatedPartitions | DGAUGE, count | Number of partitions with ISR count below the replication factor |
kafka_server_ZooKeeperClientMetrics_ZooKeeperRequestLatencyMs | DGAUGE, milliseconds | Request latency in ZooKeeper. Additional labels: quantile. |
kafka_shard_count | DGAUGE, count | Number of active shards |
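Two of the descriptions above imply simple arithmetic that can be easier to read as code. The sketch below shows how the value of kafka_group_topic_partition_lag relates to the two offsets, and how an idle value in the 0 to 1 range, as reported by the AvgIdlePercent metrics, converts to a busy fraction. The function names and sample numbers are hypothetical, and clamping the lag at zero is an assumption of this sketch, not documented service behavior.

```python
def consumer_lag(latest_offset: int, consumer_offset: int) -> int:
    """Messages the consumer group still has to read in one partition.

    Clamped at zero (an assumption) in case the two offsets were sampled
    at slightly different moments.
    """
    return max(latest_offset - consumer_offset, 0)


def busy_fraction(avg_idle: float) -> float:
    """Convert an idle value in [0, 1] into the fraction of time spent busy."""
    if not 0.0 <= avg_idle <= 1.0:
        raise ValueError("avg_idle must be between 0 and 1")
    return 1.0 - avg_idle


# Hypothetical readings for one partition and one broker.
lag = consumer_lag(latest_offset=1500, consumer_offset=1420)
busy = busy_fraction(0.25)
```

A lag that keeps growing over time suggests consumers are not keeping up with producers, while a busy fraction close to 1 suggests the broker's request handlers or network processors are saturated.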