Logs in Yandex Data Processing
Yandex Data Processing cluster logs are collected and displayed by Yandex Cloud Logging.
All log entries sent by the cluster contain the standard filtering parameters:
resource_type
: Always takes the dataproc.cluster value.
resource_id
: Cluster ID.
Yandex Data Processing log entries also contain additional parameters:
hostname
: Host FQDN.
log_type
: Type of entries in cluster logs.
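To illustrate how these parameters narrow a selection, here is a minimal Python sketch that filters log entries already fetched into memory. It assumes, hypothetically, that each entry has been flattened into a dict keyed by the parameter names above (resource_type, resource_id, hostname, log_type); the real export format may differ, and select_entries is an illustrative name, not part of any Yandex Cloud SDK.

```python
from typing import Any, Iterable, Iterator, Optional

# Hypothetical flattened entry: keys mirror the filtering parameters above.
Entry = dict[str, Any]


def select_entries(
    entries: Iterable[Entry],
    cluster_id: str,
    log_type: Optional[str] = None,
    hostname: Optional[str] = None,
) -> Iterator[Entry]:
    """Yield one cluster's Data Processing entries, optionally narrowed further."""
    for entry in entries:
        # Cluster log entries always have resource_type == "dataproc.cluster".
        if entry.get("resource_type") != "dataproc.cluster":
            continue
        # resource_id holds the cluster ID.
        if entry.get("resource_id") != cluster_id:
            continue
        # log_type and hostname are the optional, Data Processing-specific filters.
        if log_type is not None and entry.get("log_type") != log_type:
            continue
        if hostname is not None and entry.get("hostname") != hostname:
            continue
        yield entry
```

For example, select_entries(entries, "<cluster ID>", log_type="syslog", hostname="<host FQDN>") would yield only the system log of a single host.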
Yandex Data Processing log entry types
Cluster component logs
Depending on the subcluster role, the following types of entries are used for component logs:
- All cluster hosts:
  - cloud-init: Only in Yandex Data Processing clusters with image version 2.0 or higher.
  - salt-minion: Initialization log of the Yandex Data Processing cluster service.
  - syslog: System log.
  - telegraf: Log of outgoing Yandex Data Processing cluster metrics sent to Monitoring.
- Master host:
  - flume: Only in Yandex Data Processing clusters with image version below 2.0.
  - hadoop-hdfs-namenode
  - hadoop-hdfs-secondarynamenode
  - hadoop-mapreduce
  - hadoop-yarn-resourcemanager
  - hadoop-yarn-timelineserver
  - hbase-master
  - hbase-rest
  - hbase-thrift
  - hive-metastore
  - hiveserver2
  - hive-webhcat-console: Only in Yandex Data Processing clusters with image version below 2.0.
  - hive-webhcat-console-error: Only in Yandex Data Processing clusters with image version below 2.0.
  - hive-webhcat: Only in Yandex Data Processing clusters with image version below 2.0.
  - knox: Only in Yandex Data Processing clusters with image version below 2.0.
  - livy-out
  - livy-request
  - oozie
  - oozie-audit
  - oozie-error
  - oozie-instrumentation
  - oozie-jetty
  - oozie-jpa
  - oozie-ops
  - postgres
  - sqoop: Only in Yandex Data Processing clusters with image version below 2.0.
  - supervisor: Only in Yandex Data Processing clusters with image version below 2.0.
  - yandex-dataproc-agent
  - zeppelin
  - zookeeper
- Data storage subcluster hosts:
  - hadoop-hdfs-datanode
  - hadoop-yarn-nodemanager
- Data processing subcluster hosts:
  - hadoop-yarn-nodemanager
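The role and image-version rules in the list above can be captured in a small lookup table, for instance to check which log_type values a given host is expected to produce. The Python sketch below covers all hosts, master hosts, and data storage hosts; the role keys "master" and "data_storage", the function expected_log_types, and the (major, minor) tuple used for the image version are hypothetical choices for illustration, not a Yandex Cloud API.

```python
# Log types written on every host, and per host role (role keys are hypothetical).
ALL_HOSTS = ["cloud-init", "salt-minion", "syslog", "telegraf"]
BY_ROLE = {
    "master": [
        "flume", "hadoop-hdfs-namenode", "hadoop-hdfs-secondarynamenode",
        "hadoop-mapreduce", "hadoop-yarn-resourcemanager",
        "hadoop-yarn-timelineserver", "hbase-master", "hbase-rest",
        "hbase-thrift", "hive-metastore", "hiveserver2",
        "hive-webhcat-console", "hive-webhcat-console-error", "hive-webhcat",
        "knox", "livy-out", "livy-request", "oozie", "oozie-audit",
        "oozie-error", "oozie-instrumentation", "oozie-jetty", "oozie-jpa",
        "oozie-ops", "postgres", "sqoop", "supervisor",
        "yandex-dataproc-agent", "zeppelin", "zookeeper",
    ],
    "data_storage": ["hadoop-hdfs-datanode", "hadoop-yarn-nodemanager"],
}
# Log types whose presence depends on the cluster image version.
ONLY_2_0_OR_HIGHER = {"cloud-init"}
ONLY_BELOW_2_0 = {
    "flume", "hive-webhcat-console", "hive-webhcat-console-error",
    "hive-webhcat", "knox", "sqoop", "supervisor",
}


def expected_log_types(role: str, image_version: tuple[int, int]) -> list[str]:
    """Return the log_type values a host with this role should produce."""
    at_least_2_0 = image_version >= (2, 0)  # tuples compare element by element
    result = []
    for name in ALL_HOSTS + BY_ROLE[role]:
        if name in ONLY_2_0_OR_HIGHER and not at_least_2_0:
            continue
        if name in ONLY_BELOW_2_0 and at_least_2_0:
            continue
        result.append(name)
    return result
```

Keeping the version conditions in two separate sets, rather than on each list item, mirrors how the list above marks only a handful of entries as version-dependent.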
Job logs
The following types of entries are added to job logs:
- YARN container log entries. The entry type is containers. These entries also have the following tags:
  - yarn_log_type: Name of the log file that YARN saves as a container log, for example: stdout, stderr, launch_container.sh, prelaunch.out, directory.info.
  - container_id: ID of the YARN container, e.g., container_1638976919626_0002_01_000001.
  - application_id: ID of the YARN application, e.g., application_1638976919626_0002.
- Output of the process that launched the job. These entries are saved only if the job was started via the Yandex Data Processing API rather than directly on cluster hosts. The entry type is job_output. The entries contain the job_id tag with the ID of the job created via the Yandex Data Processing API. If the job has started and has not stopped at the validation stage, the entries also include the application_id tag.
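Because job_output entries can carry the application_id tag and containers entries always do, the two entry types can be correlated to gather the YARN container logs of a job started via the API. The Python sketch below assumes, hypothetically, that entries have been flattened into dicts keyed by their tag names; container_entries_for_job is an illustrative name, not a Yandex Cloud API.

```python
from typing import Any, Iterable

# Hypothetical flattened entry: keys mirror the tags described above.
Entry = dict[str, Any]


def container_entries_for_job(entries: Iterable[Entry], job_id: str) -> list[Entry]:
    """Return containers entries that belong to the YARN application(s) of one job."""
    entries = list(entries)  # iterated twice below
    # job_output entries carry job_id; application_id is present only if the
    # job got past the validation stage.
    app_ids = {
        e["application_id"]
        for e in entries
        if e.get("log_type") == "job_output"
        and e.get("job_id") == job_id
        and e.get("application_id")
    }
    # containers entries are tagged with the application_id of their YARN app.
    return [
        e
        for e in entries
        if e.get("log_type") == "containers" and e.get("application_id") in app_ids
    ]
```

A job that stopped at the validation stage has no application_id, so the function returns an empty list for it. To see only error output, the result could be filtered further on yarn_log_type == "stderr".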