I would like to parse Datameer job logs to find out how long a job waited for a Tez session to start.
Datameer acts as a job compiler. It compiles a MapReduce job and sends it to a Hadoop cluster for processing. Once a job has been submitted to the cluster, Datameer is no longer in control of it and only maintains the job status telemetry provided by the Application Master. Everything written to a job.log after the message "Submitted application.." and until the cluster job has completed is provided by cluster services.
The time a job spent waiting for cluster resources is the period between the two messages below:
[qa] INFO [2018-07-11 06:50:00.100] [JobExecutionPlanRunner] (TezClientFacade.java:325) - Wait until Tez session ready (remaining attempts 2)
...
[qa] INFO [2018-07-11 06:52:00.100] [JobExecutionPlanRunner] (TezClientFacade.java:327) - Wait until Tez session ready done
These messages are visible in an individual job execution's job log.
Datameer stores job logs in its HDFS private folder, under the jobhistory directory. The file path pattern is as follows:
/<Datameer private folder>/jobhistory/<configuration ID>/<execution ID>/job.log.
These files can be parsed to find out how long the cluster spent on resource allocation for each Datameer execution.
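As an illustration only, the parsing could be sketched in Python as below. This is not a Datameer-provided tool. It assumes every log line follows the layout of the two sample messages above, with the timestamp in square brackets and millisecond precision. It also assumes that if the "Wait until Tez session ready (remaining attempts N)" message appears several times before the "done" message, the earliest one marks the start of the wait. The function name tez_session_waits and the regular expressions are hypothetical.

```python
import re
from datetime import datetime

# Assumed timestamp layout, taken from the sample messages: [2018-07-11 06:50:00.100]
TIMESTAMP = r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\]"
WAIT_START = re.compile(TIMESTAMP + r".*Wait until Tez session ready \(remaining attempts")
WAIT_DONE = re.compile(TIMESTAMP + r".*Wait until Tez session ready done")


def parse_timestamp(text):
    """Convert a timestamp such as '2018-07-11 06:50:00.100' to a datetime."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")


def tez_session_waits(lines):
    """Return one timedelta per completed wait found in the given log lines."""
    waits = []
    start = None
    for line in lines:
        match = WAIT_START.search(line)
        if match:
            # Assumption: keep the earliest start message of a wait.
            if start is None:
                start = parse_timestamp(match.group(1))
            continue
        match = WAIT_DONE.search(line)
        if match and start is not None:
            waits.append(parse_timestamp(match.group(1)) - start)
            start = None
    return waits


sample = [
    "[qa] INFO [2018-07-11 06:50:00.100] [JobExecutionPlanRunner] "
    "(TezClientFacade.java:325) - Wait until Tez session ready (remaining attempts 2)",
    "[qa] INFO [2018-07-11 06:52:00.100] [JobExecutionPlanRunner] "
    "(TezClientFacade.java:327) - Wait until Tez session ready done",
]

waits = tez_session_waits(sample)
print([w.total_seconds() for w in waits])  # [120.0]
```

For the two sample messages the wait is 120 seconds. To run this against a real log, one option is to stream the file with hdfs dfs -cat using the path pattern above and pass the resulting lines (for example from sys.stdin) to tez_session_waits.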