Yandex Managed Service for Apache Kafka® metrics
Updated at March 28, 2024
This section describes the Managed Service for Apache Kafka® metrics delivered to Monitoring.
The name of the metric is written in the name label.
Common labels for all Managed Service for Apache Kafka® metrics:
Label | Value |
---|---|
service | Service ID: managed-kafka |
resource_type | Resource type: cluster |
resource_id | Cluster ID |
host | Host FQDN |
node | Broker type: leader, follower, or replica |
subcluster_name | Subcluster type: zookeeper_subcluster, kafka_subcluster |
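Every series is identified by the labels above together with the name label, so selecting data comes down to matching label values. The sketch below is illustrative only: it assumes a hypothetical in-memory list of points, each a dict of labels plus a value key. This is not the format returned by Monitoring, and the resource_id and host values are made up.

```python
from typing import Any, Iterable

# Hypothetical point format: a dict of labels plus a "value" key.
# Illustration only, not the Monitoring API response format.
Point = dict[str, Any]


def select(points: Iterable[Point], **labels: str) -> list[Point]:
    """Return the points whose labels match every given label=value pair."""
    return [p for p in points if all(p.get(k) == v for k, v in labels.items())]


# Made-up sample data; resource_id and host values are hypothetical.
points: list[Point] = [
    {"name": "cpu.user", "service": "managed-kafka", "resource_type": "cluster",
     "resource_id": "example-cluster", "host": "kafka-host-1",
     "subcluster_name": "kafka_subcluster", "value": 12.5},
    {"name": "cpu.user", "service": "managed-kafka", "resource_type": "cluster",
     "resource_id": "example-cluster", "host": "zk-host-1",
     "subcluster_name": "zookeeper_subcluster", "value": 3.0},
    {"name": "disk.free_bytes", "service": "managed-kafka", "resource_type": "cluster",
     "resource_id": "example-cluster", "host": "kafka-host-1",
     "subcluster_name": "kafka_subcluster", "value": 10_737_418_240},
]

# Only the CPU series of the Kafka broker hosts.
kafka_cpu = select(points, name="cpu.user", subcluster_name="kafka_subcluster")
print([p["host"] for p in kafka_cpu])  # ['kafka-host-1']
```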
CPU metrics
Processor core workload.
Name | Type, units | Description |
---|---|---|
cpu.fraction | DGAUGE, % | Guaranteed vCPU share |
cpu.guarantee | DGAUGE, number | Guaranteed number of cores |
cpu.limit | DGAUGE, number | Limit on CPU cores in use |
cpu.guest | DGAUGE, % | CPU core usage, guest usage type |
cpu.idle | DGAUGE, % | CPU core usage, idle usage type |
cpu.iowait | DGAUGE, % | CPU core usage, iowait usage type |
cpu.irq | DGAUGE, % | CPU core usage, irq usage type |
cpu.nice | DGAUGE, % | CPU core usage, nice usage type |
cpu.softirq | DGAUGE, % | CPU core usage, softirq usage type |
cpu.steal | DGAUGE, % | CPU core usage, steal usage type |
cpu.system | DGAUGE, % | CPU core usage, system usage type |
cpu.user | DGAUGE, % | CPU core usage, user usage type |
load.avg_15min | DGAUGE, % | Average load over 15 minutes |
load.avg_1min | DGAUGE, % | Average load over 1 minute |
load.avg_5min | DGAUGE, % | Average load over 5 minutes |
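The per-type cpu.* values describe how core time is split. As a rough sketch, assuming the usage types are percentages of the same core time, the busy share can be estimated as everything except the idle type, and a high steal share hints that the host is competing for physical CPU. The 10% steal threshold below is an arbitrary assumption, not a service recommendation.

```python
def cpu_busy_percent(idle_percent: float) -> float:
    """Estimate the busy share of CPU time as 100% minus cpu.idle (an assumption)."""
    if not 0.0 <= idle_percent <= 100.0:
        raise ValueError("idle_percent must be within [0, 100]")
    return 100.0 - idle_percent


def steal_is_high(steal_percent: float, threshold: float = 10.0) -> bool:
    """Flag cpu.steal above a hypothetical threshold (10% by default)."""
    return steal_percent > threshold


# Made-up readings for one host.
sample = {"cpu.idle": 72.5, "cpu.steal": 1.5}
print(cpu_busy_percent(sample["cpu.idle"]))  # 27.5
print(steal_is_high(sample["cpu.steal"]))    # False
```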
Disk metrics
Name | Type, units | Description |
---|---|---|
disk.free_bytes | DGAUGE, bytes | Free space |
disk.free_inodes | DGAUGE, number | Number of free inodes |
disk.total_bytes | DGAUGE, bytes | Total space |
disk.total_inodes | DGAUGE, number | Total number of inodes |
disk.used_bytes | DGAUGE, bytes | Used space |
disk.used_inodes | DGAUGE, number | Number of used inodes |
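The disk.* gauges combine naturally into usage ratios: disk.used_bytes over disk.total_bytes for space, and disk.used_inodes over disk.total_inodes for inodes. A minimal sketch, assuming the used and total values are read for the same host at the same timestamp; the readings are made up.

```python
def usage_percent(used: float, total: float) -> float:
    """Share of a resource in use, in percent; works for bytes and for inodes."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * used / total


GIB = 1024 ** 3
# Made-up readings: disk.used_bytes / disk.total_bytes, then disk.used_inodes / disk.total_inodes.
print(usage_percent(50 * GIB, 200 * GIB))  # 25.0
print(usage_percent(1_000, 4_000))          # 25.0
```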
Disk operation metrics
Name | Type, units | Description |
---|---|---|
io.avg_read_time | DGAUGE, ms | Average disk read time |
io.avg_write_time | DGAUGE, ms | Average disk write time |
io.disk*.avg_read_time | DGAUGE, ms | Average read time for a specific disk |
io.disk*.avg_write_time | DGAUGE, ms | Average write time for a specific disk |
io.disk*.read_bytes | DGAUGE, bytes per second | Read speed for a specific disk |
io.disk*.read_count | DGAUGE, operations per second | Number of read operations per second for a specific disk |
io.disk*.read_merged_count | DGAUGE, operations per second | Number of merged read operations per second for a specific disk |
io.disk*.utilization | DGAUGE, % | Utilization of a specific disk; disabled for network drives |
io.disk*.write_bytes | DGAUGE, bytes per second | Write speed for a specific disk |
io.disk*.write_count | DGAUGE, operations per second | Number of write operations per second for a specific disk |
io.disk*.write_merged_count | DGAUGE, operations per second | Number of merged write operations per second for a specific disk |
io.read_bytes | DGAUGE, bytes per second | Disk read speed |
io.read_count | DGAUGE, operations per second | Number of read operations per second |
io.read_merged_count | DGAUGE, operations per second | Number of merged read operations per second |
io.utilization | DGAUGE, % | Disk utilization |
io.write_bytes | DGAUGE, bytes per second | Disk write speed |
io.write_count | DGAUGE, operations per second | Number of write operations per second |
io.write_merged_count | DGAUGE, operations per second | Number of merged write operations per second |
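Dividing a throughput gauge by the matching operations gauge gives the average size of one operation, since (bytes per second) / (operations per second) = bytes per operation. A sketch, assuming both values come from the same disk and timestamp (for example, io.disk*.read_bytes and io.disk*.read_count); the readings are made up.

```python
def avg_op_size(bytes_per_second: float, ops_per_second: float) -> float:
    """Average bytes per operation; returns 0.0 when there were no operations."""
    if ops_per_second == 0:
        return 0.0
    return bytes_per_second / ops_per_second


# Made-up readings: 4 MiB/s read with 32 read operations per second.
print(avg_op_size(4 * 1024 * 1024, 32))  # 131072.0, that is, 128 KiB per read
```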
RAM metrics
Name | Type, units | Description |
---|---|---|
mem.guarantee_bytes | DGAUGE, bytes | Guaranteed memory |
mem.limit_bytes | DGAUGE, bytes | Memory limit |
mem.active_bytes | DGAUGE, bytes | Amount of RAM used most often and only freed up when absolutely necessary |
mem.available_bytes | DGAUGE, bytes | RAM usage, available usage type |
mem.buffers_bytes | DGAUGE, bytes | RAM usage, buffers usage type |
mem.cached_bytes | DGAUGE, bytes | RAM usage, cached usage type |
mem.free_bytes | DGAUGE, bytes | Amount of free RAM available, excluding mem.buffers_bytes and mem.cached_bytes |
mem.shared_bytes | DGAUGE, bytes | RAM usage, shared usage type |
mem.total_bytes | DGAUGE, bytes | RAM usage, total usage type |
mem.used_bytes | DGAUGE, bytes | Amount of RAM currently used by the running processes |
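To see how close a host is to its memory ceiling, mem.used_bytes can be compared with mem.limit_bytes. A minimal sketch, assuming both readings share a timestamp; treating the limit as the ceiling to alert on is an assumption about your alerting policy, not a documented rule. The readings are made up.

```python
def memory_headroom(used_bytes: int, limit_bytes: int) -> tuple[int, float]:
    """Return (bytes left under the limit, used share of the limit in percent)."""
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    headroom = max(limit_bytes - used_bytes, 0)  # never report negative headroom
    return headroom, 100.0 * used_bytes / limit_bytes


GIB = 1024 ** 3
# Made-up readings: 6 GiB used out of an 8 GiB limit.
free, used_pct = memory_headroom(6 * GIB, 8 * GIB)
print(free // GIB, used_pct)  # 2 75.0
```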
Network metrics
Name | Type, units | Description |
---|---|---|
net.bytes_recv | DGAUGE, bytes per second | Rate of receiving data over the network |
net.bytes_sent | DGAUGE, bytes per second | Rate of sending data over the network |
net.dropin | DGAUGE, number | Number of packets dropped on receipt |
net.dropout | DGAUGE, number | Number of packets dropped when sending |
net.errin | DGAUGE, number | Number of errors on receipt |
net.errout | DGAUGE, number | Number of errors when sending |
net.packets_recv | DGAUGE, packets per second | Rate of receiving packets over the network |
net.packets_sent | DGAUGE, packets per second | Rate of sending packets over the network |
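The byte-rate gauges are easier to compare with link bandwidth when expressed in megabits per second, and any nonzero drop or error counter is worth a look. A sketch with made-up readings; the conversion uses decimal megabits (10**6 bits), and treating any nonzero counter as noteworthy is an assumption.

```python
ERROR_COUNTERS = ("net.dropin", "net.dropout", "net.errin", "net.errout")


def to_mbit_per_second(bytes_per_second: float) -> float:
    """Convert a bytes-per-second gauge such as net.bytes_recv to megabits per second."""
    return bytes_per_second * 8 / 1_000_000


def has_network_errors(sample: dict[str, float]) -> bool:
    """True if any drop or error counter in the sample is above zero."""
    return any(sample.get(name, 0) > 0 for name in ERROR_COUNTERS)


# Made-up readings for one host.
sample = {"net.bytes_recv": 12_500_000, "net.bytes_sent": 2_500_000, "net.dropin": 0, "net.errout": 3}
print(to_mbit_per_second(sample["net.bytes_recv"]))  # 100.0
print(has_network_errors(sample))                     # True
```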
Service metrics
Name | Type, units | Description |
---|---|---|
kafka_controller_ControllerStats_LeaderElectionRateAndTimeMs | DGAUGE, ms | Rate of leader broker changes. Normally 0; the value may increase during maintenance, which does not indicate a problem. Additional labels: quantile |
kafka_controller_KafkaController_ActiveControllerCount | DGAUGE, number | Number of active controllers |
kafka_controller_KafkaController_GlobalTopicCount | DGAUGE, number | Number of topics |
kafka_controller_KafkaController_OfflinePartitionsCount | DGAUGE, number | Number of offline partitions |
kafka_controller_KafkaController_PreferredReplicaImbalanceCount | DGAUGE, number | Number of partitions whose leader is not the preferred replica. Normally 0 |
kafka_group_topic_partition_lag | DGAUGE, number | Message lag: difference between the latest offset in the partition and the consumer group's committed offset |
kafka_group_topic_partition_offset | DGAUGE, number | Partition offset |
kafka_host_count | DGAUGE, number | Number of hosts in the cluster |
kafka_is_alive | DGAUGE, 0/1 | Broker health indicator: 1 if the broker is alive, 0 if it is not |
kafka_network_RequestChannel_RequestQueueSize | DGAUGE, number | Number of requests in the request queue |
kafka_network_RequestMetrics_Errors | DGAUGE, number | Number of errors. Additional labels: request |
kafka_network_RequestMetrics_LocalTimeMs | DGAUGE, ms | Time it takes the leader broker to process a request. Additional labels: request, quantile |
kafka_network_RequestMetrics_MessageConversionsTimeMs | DGAUGE, ms | Message format conversion time. Additional labels: request, quantile |
kafka_network_RequestMetrics_RemoteTimeMs | DGAUGE, ms | Time spent waiting for follower brokers. Additional labels: request, quantile |
kafka_network_RequestMetrics_RequestQueueTimeMs | DGAUGE, ms | Time a request waits in the request queue. Additional labels: request, quantile |
kafka_network_RequestMetrics_Requests | DGAUGE, number | Number of requests. Additional labels: request |
kafka_network_RequestMetrics_ResponseQueueTimeMs | DGAUGE, ms | Time a response waits in the response queue. Additional labels: request, quantile |
kafka_network_RequestMetrics_ResponseSendTimeMs | DGAUGE, ms | Response send time. Additional labels: request, quantile |
kafka_network_RequestMetrics_TotalTimeMs | DGAUGE, ms | Total request processing time. Additional labels: request, quantile |
kafka_network_SocketServer_NetworkProcessorAvgIdlePercent | DGAUGE, % | Average idle share of the network processors, from 0 (all resources are in use) to 1 (all resources are free) |
kafka_server_BrokerTopicMetrics_BytesIn | DGAUGE, bytes | Incoming data size |
kafka_server_BrokerTopicMetrics_BytesOut | DGAUGE, bytes | Outgoing data size |
kafka_server_BrokerTopicMetrics_FailedFetchRequests | DGAUGE, number | Number of failed fetch requests |
kafka_server_BrokerTopicMetrics_FailedProduceRequests | DGAUGE, number | Number of failed produce requests |
kafka_server_BrokerTopicMetrics_MessagesIn | DGAUGE, number | Number of written messages |
kafka_server_BrokerTopicMetrics_ReplicationBytesIn | DGAUGE, bytes | Replicated data size |
kafka_server_KafkaRequestHandlerPool_RequestHandlerAvgIdlePercent_count | DGAUGE, % | Average idle share of the request handlers, from 0 (all resources are in use) to 1 (all resources are free) |
kafka_server_KafkaServer_BrokerState | DGAUGE | Broker state: 0: Not Running; 1: Starting; 2: Recovering from Unclean Shutdown; 3: Running as Broker; 4: Running as Controller; 5: Pending Controlled Shutdown; 6: Broker Shutting Down |
kafka_server_ReplicaFetcherManager_MaxLag | DGAUGE, number | Maximum replication lag, in messages, between the follower and leader brokers. Additional labels: clientId |
kafka_server_ReplicaManager_LeaderCount | DGAUGE, number | Number of partitions for which this broker is the leader |
kafka_server_ReplicaManager_OfflineReplicaCount | DGAUGE, number | Number of partitions without a leader. These partitions do not support message writes or reads |
kafka_server_ReplicaManager_PartitionCount | DGAUGE, number | Number of partitions on the broker |
kafka_server_ReplicaManager_ReassigningPartitions | DGAUGE, number | Number of partitions with the leader being reassigned |
kafka_server_ReplicaManager_UnderMinIsrPartitionCount | DGAUGE, number | Number of partitions whose number of in-sync replicas is below the minimum allowed value specified in the settings |
kafka_server_ReplicaManager_UnderReplicatedPartitions | DGAUGE, number | Number of partitions whose replication factor is greater than the number of their in-sync replicas (ISRs) |
kafka_server_ZooKeeperClientMetrics_ZooKeeperRequestLatencyMs | DGAUGE, ms | Request latency in ZooKeeper. Additional labels: quantile |
kafka_shard_count | DGAUGE, number | Number of active shards |
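kafka_group_topic_partition_lag is reported per partition, so alerts usually look at a per-group total. This sketch assumes the series carry a label named group (plus topic and partition); those label names are not listed in this section, so check the actual series before relying on them. The group and topic names and values are made up.

```python
from collections import defaultdict
from typing import Any, Iterable


def lag_by_group(points: Iterable[dict[str, Any]]) -> dict[str, float]:
    """Sum kafka_group_topic_partition_lag over all topics and partitions of each consumer group."""
    totals: dict[str, float] = defaultdict(float)
    for p in points:
        if p.get("name") == "kafka_group_topic_partition_lag":
            totals[p["group"]] += p["value"]  # the "group" label name is an assumption
    return dict(totals)


# Made-up points; group, topic, and partition label names and values are hypothetical.
points = [
    {"name": "kafka_group_topic_partition_lag", "group": "billing", "topic": "orders", "partition": "0", "value": 120},
    {"name": "kafka_group_topic_partition_lag", "group": "billing", "topic": "orders", "partition": "1", "value": 30},
    {"name": "kafka_group_topic_partition_lag", "group": "audit", "topic": "orders", "partition": "0", "value": 0},
    {"name": "kafka_group_topic_partition_offset", "group": "billing", "topic": "orders", "partition": "0", "value": 9_880},
]
print(lag_by_group(points))  # {'billing': 150.0, 'audit': 0.0}
```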
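Several of the service metrics above lend themselves to a quick per-broker health check. The sketch below is one possible interpretation, not a rule from the service: it decodes kafka_server_KafkaServer_BrokerState using the values listed in the table, treats any nonzero offline, under-replicated, or under-min-ISR count as a problem (the choice of counters is an assumption), and checks the Kafka convention that exactly one broker in the cluster reports an active controller. The sample readings are made up.

```python
# Broker states as listed in the table for kafka_server_KafkaServer_BrokerState.
BROKER_STATES = {
    0: "Not Running",
    1: "Starting",
    2: "Recovering from Unclean Shutdown",
    3: "Running as Broker",
    4: "Running as Controller",
    5: "Pending Controlled Shutdown",
    6: "Broker Shutting Down",
}

# Counters that should normally be 0 (the selection is an assumption).
SHOULD_BE_ZERO = (
    "kafka_controller_KafkaController_OfflinePartitionsCount",
    "kafka_server_ReplicaManager_UnderReplicatedPartitions",
    "kafka_server_ReplicaManager_UnderMinIsrPartitionCount",
)


def broker_problems(sample: dict[str, float]) -> list[str]:
    """Return human-readable problems found in one broker's latest readings."""
    problems = []
    if sample.get("kafka_is_alive") != 1:
        problems.append("broker is not alive")
    state = int(sample.get("kafka_server_KafkaServer_BrokerState", -1))
    if state not in (3, 4):  # running as broker or as controller
        problems.append(f"broker state: {BROKER_STATES.get(state, 'unknown')}")
    for metric in SHOULD_BE_ZERO:
        if sample.get(metric, 0) > 0:
            problems.append(f"{metric} = {sample[metric]:g}")
    return problems


def controller_ok(active_controller_counts: list[float]) -> bool:
    """Exactly one broker in the cluster should report an active controller."""
    return sum(active_controller_counts) == 1


healthy = {"kafka_is_alive": 1, "kafka_server_KafkaServer_BrokerState": 3}
degraded = dict(healthy, kafka_server_ReplicaManager_UnderReplicatedPartitions=2)
print(broker_problems(healthy))   # []
print(broker_problems(degraded))  # ['kafka_server_ReplicaManager_UnderReplicatedPartitions = 2']
print(controller_ok([1, 0, 0]))   # True
```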