Logs in Yandex Data Proc
Yandex Data Proc cluster logs are collected and displayed by Yandex Cloud Logging.
All log entries sent by the cluster contain regular filtering parameters:
resource_type
: Always takes the dataproc.cluster value.
resource_id
: Cluster ID.
Yandex Data Proc log entries also contain additional parameters:
hostname
: Host FQDN.
log_type
: Type of entries in cluster logs.
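The four parameters above are enough to narrow a set of entries to one cluster, host, or log type. The sketch below illustrates this on entries that have already been exported and loaded as flat Python dictionaries. That representation, the select_entries helper, and the sample IDs and host names are assumptions made for illustration, not part of the Yandex Cloud Logging API.

```python
from typing import Dict, Iterable, Iterator, Optional

# Assumption: each entry is a flat dict keyed by the parameter names above.
Entry = Dict[str, str]


def select_entries(
    entries: Iterable[Entry],
    cluster_id: str,
    hostname: Optional[str] = None,
    log_type: Optional[str] = None,
) -> Iterator[Entry]:
    """Yield entries of one cluster, optionally narrowed by host and log type."""
    for entry in entries:
        # Every Data Proc entry carries this fixed resource type.
        if entry.get("resource_type") != "dataproc.cluster":
            continue
        if entry.get("resource_id") != cluster_id:
            continue
        if hostname is not None and entry.get("hostname") != hostname:
            continue
        if log_type is not None and entry.get("log_type") != log_type:
            continue
        yield entry


# Hypothetical sample data: the IDs and host names are made up.
sample = [
    {"resource_type": "dataproc.cluster", "resource_id": "cluster-a",
     "hostname": "master.example.internal", "log_type": "syslog", "message": "boot"},
    {"resource_type": "dataproc.cluster", "resource_id": "cluster-a",
     "hostname": "data-1.example.internal", "log_type": "telegraf", "message": "sent"},
    {"resource_type": "dataproc.cluster", "resource_id": "cluster-b",
     "hostname": "master.example.internal", "log_type": "syslog", "message": "other"},
]

syslog_entries = list(select_entries(sample, "cluster-a", log_type="syslog"))
```

Here syslog_entries contains only the first sample entry, because the other two differ in either log_type or resource_id.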
Types of Yandex Data Proc log entries
Cluster component logs
Depending on the subcluster role, the following types of entries are used for component logs:
- All cluster hosts:
  - cloud-init: Yandex Data Proc clusters with image version 2.0 or higher.
  - salt-minion: Initialization log of the Yandex Data Proc cluster service.
  - syslog: System log.
  - telegraf: Log of sending Yandex Data Proc cluster metrics to Monitoring.
- Master host:
  - flume: Yandex Data Proc clusters with image version below 2.0.
  - hadoop-hdfs-namenode
  - hadoop-hdfs-secondarynamenode
  - hadoop-mapreduce
  - hadoop-yarn-resourcemanager
  - hadoop-yarn-timelineserver
  - hbase-master
  - hbase-rest
  - hbase-thrift
  - hive-metastore
  - hiveserver2
  - hive-webhcat-console: Yandex Data Proc clusters with image version below 2.0.
  - hive-webhcat-console-error: Yandex Data Proc clusters with image version below 2.0.
  - hive-webhcat: Yandex Data Proc clusters with image version below 2.0.
  - knox: Yandex Data Proc clusters with image version below 2.0.
  - livy-out
  - livy-request
  - oozie
  - oozie-audit
  - oozie-error
  - oozie-instrumentation
  - oozie-jetty
  - oozie-jpa
  - oozie-ops
  - postgres
  - sqoop: Yandex Data Proc clusters with image version below 2.0.
  - supervisor: Yandex Data Proc clusters with image version below 2.0.
  - yandex-dataproc-agent
  - zeppelin
  - zookeeper
- Data storage subcluster hosts:
  - hadoop-hdfs-datanode
  - hadoop-yarn-nodemanager
- Data processing subcluster hosts contain hadoop-yarn-nodemanager logs.
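Several component log types above are tied to the cluster image version. As a minimal sketch, the check below encodes only those version notes, so that a script can tell whether a given log_type should be expected on a cluster. The is_expected and parse_version helpers are hypothetical names, and the sketch assumes the image version is a dotted numeric string such as 1.4 or 2.0.

```python
from typing import Tuple

# Version notes copied from the component lists above.
REQUIRES_2_0_OR_HIGHER = {"cloud-init"}
REQUIRES_BELOW_2_0 = {
    "flume",
    "hive-webhcat-console",
    "hive-webhcat-console-error",
    "hive-webhcat",
    "knox",
    "sqoop",
    "supervisor",
}


def parse_version(image_version: str) -> Tuple[int, int]:
    """Return (major, minor); assumes a dotted numeric string like '2.0'."""
    parts = image_version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def is_expected(log_type: str, image_version: str) -> bool:
    """Tell whether a log type applies to a cluster with this image version."""
    version = parse_version(image_version)
    if log_type in REQUIRES_2_0_OR_HIGHER:
        return version >= (2, 0)
    if log_type in REQUIRES_BELOW_2_0:
        return version < (2, 0)
    # Log types without a version note are not restricted by this check.
    return True
```

For example, is_expected("flume", "2.0") returns False, while is_expected("syslog", "2.0") returns True. The check does not verify that a log type exists or which subcluster role produces it.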
Job logs
The following types of entries are added to job logs:
- Entries of YARN container logs.
  The entry type is containers. The entries also have tags:
  - yarn_log_type: Name of the log file YARN saves as a container log. Examples:
    - stdout
    - stderr
    - launch_container.sh
    - prelaunch.out
    - directory.info
  - container_id: ID of the YARN container, e.g., container_1638976919626_0002_01_000001.
  - application_id: ID of the YARN application, e.g., application_1638976919626_0002.
- Log entries of the launching process output. They are saved if the job was started via the Yandex Data Proc API rather than on cluster hosts.
  The entry type is job_output. The entries contain the job_id tag with the ID of the job created via the Yandex Data Proc API. If the job actually started rather than terminating at the validation stage, the entries also include the application_id tag.
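The container_id and application_id tags are related: a YARN container ID embeds the cluster timestamp and the application number of the application it belongs to. The sketch below derives the application ID from a container ID, which can help when correlating containers entries with job_output entries. It assumes the standard YARN naming scheme, including the optional epoch prefix (e<N>_) that YARN may add after a ResourceManager restart. The function name is hypothetical.

```python
import re

# container_[e<epoch>_]<clusterTimestamp>_<appNumber>_<attemptNumber>_<containerNumber>
_CONTAINER_ID = re.compile(r"^container_(?:e\d+_)?(\d+)_(\d+)_(\d+)_(\d+)$")


def application_id_from_container(container_id: str) -> str:
    """Derive the YARN application ID embedded in a container ID."""
    match = _CONTAINER_ID.match(container_id)
    if match is None:
        raise ValueError(f"not a YARN container ID: {container_id!r}")
    cluster_timestamp, app_number = match.group(1), match.group(2)
    # Keep the digits as strings so zero padding (e.g. 0002) is preserved.
    return f"application_{cluster_timestamp}_{app_number}"


print(application_id_from_container("container_1638976919626_0002_01_000001"))
# application_1638976919626_0002
```

The IDs in this example are the ones given in the tag descriptions above.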