Goal
I would like to parse Datameer job logs to find out how long a job waited for a Tez session to start.
Learn
Datameer acts as a job compiler. It compiles MapReduce jobs and sends them to a Hadoop cluster for processing. Once a job has been submitted to the cluster, Datameer is no longer in control of it and only maintains the job status telemetry provided by the Application Master. Everything written to a job.log after the message "Submitted application.." and until the cluster job has completed is provided by cluster services.
The time a job spent waiting for cluster resources is the period between the two messages below:
[qa] INFO [2018-07-11 06:50:00.100] [JobExecutionPlanRunner] (TezClientFacade.java:325) - Wait until Tez session ready (remaining attempts 2) ...
[qa] INFO [2018-07-11 06:52:00.100] [JobExecutionPlanRunner] (TezClientFacade.java:327) - Wait until Tez session ready done
These messages will be visible in an individual job execution's job.log.
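As a worked example, the sketch below computes the wait time from the two sample messages shown above. It assumes the timestamp is always the bracketed `YYYY-MM-DD HH:MM:SS.mmm` field; the helper name `log_time` is hypothetical and not part of Datameer.

```python
import re
from datetime import datetime

# The two sample lines from the job.log excerpt.
START = ("[qa] INFO [2018-07-11 06:50:00.100] [JobExecutionPlanRunner] "
         "(TezClientFacade.java:325) - Wait until Tez session ready "
         "(remaining attempts 2) ...")
DONE = ("[qa] INFO [2018-07-11 06:52:00.100] [JobExecutionPlanRunner] "
        "(TezClientFacade.java:327) - Wait until Tez session ready done")

# Assumption: the timestamp is the bracketed "YYYY-MM-DD HH:MM:SS.mmm" field.
TIMESTAMP = re.compile(r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]")


def log_time(line):
    """Extract the timestamp of a single job.log line as a datetime."""
    match = TIMESTAMP.search(line)
    if match is None:
        raise ValueError("no timestamp found in line")
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")


# Wait time is the difference between the "done" and the "wait" message.
wait = log_time(DONE) - log_time(START)
print(wait.total_seconds())  # 120.0
```

For the sample messages, the job waited two minutes for the Tez session.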
Datameer stores job logs in its HDFS private folder, under the jobhistory directory. The file path pattern is as follows:
/<Datameer private folder>/jobhistory/<configuration ID>/<execution ID>/job.log
These files can be parsed to determine how long the cluster spent on resource allocation for Datameer executions.
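One possible way to automate this is sketched below. It is not a Datameer-provided tool: the function names `read_job_log` and `tez_wait_seconds` are hypothetical, the script assumes the `hdfs` command-line client is available on the machine and can read the Datameer private folder, and it assumes the "Wait until Tez session ready" message may be logged more than once before the "done" message (for example with a decreasing "remaining attempts" counter), in which case the first occurrence is taken as the start of the wait.

```python
import re
import subprocess
from datetime import datetime

# Assumption: every relevant line carries a "[YYYY-MM-DD HH:MM:SS.mmm]" field.
TIMESTAMP = re.compile(r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]")
WAIT_MESSAGE = "Wait until Tez session ready"


def read_job_log(hdfs_path):
    """Return the lines of a job.log, read with `hdfs dfs -cat`."""
    result = subprocess.run(
        ["hdfs", "dfs", "-cat", hdfs_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def tez_wait_seconds(lines):
    """Sum the seconds between each first wait message and its 'done' message."""
    total = 0.0
    started = None  # timestamp of the first unanswered wait message
    for line in lines:
        if WAIT_MESSAGE not in line:
            continue
        match = TIMESTAMP.search(line)
        if match is None:
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if line.rstrip().endswith("done"):
            # Close the open wait window, if there is one.
            if started is not None:
                total += (stamp - started).total_seconds()
                started = None
        elif started is None:
            # Repeated wait messages keep the earliest start time.
            started = stamp
    return total
```

To use it, pass the path of one execution's job.log, following the pattern above with the placeholders replaced by real values, to `read_job_log` and feed the result to `tez_wait_seconds`. Keeping the parsing separate from the HDFS access also lets the same function run on a local copy of the log.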