Problem
A Datameer job fails and in the job log, the following stacktrace is displayed:
ERROR [2015-01-01 00:00:00.000] [ConcurrentJobExecutor-4] (ClusterSession.java:186) - Failed to run cluster job 'Workbook job (12345): MyWorkbook with MyJob#Joined(Disconnected record stream)' [1 hrs, 18 mins, 24 sec]
java.lang.RuntimeException: Job job_1447373200318_0080 failed! Failure info: Task failed task_1447373200318_0080_r_000071 Job failed as tasks failed. failedMaps:0 failedReduces:1
	at datameer.dap.sdk.util.ExceptionUtil.convertToRuntimeException(ExceptionUtil.java:49)
	at datameer.dap.sdk.util.ExceptionUtil.convertToRuntimeException(ExceptionUtil.java:31)
	at datameer.dap.common.graphv2.hadoop.MrJob.runImpl(MrJob.java:228)
	at datameer.dap.common.graphv2.ClusterJob.run(ClusterJob.java:128)
	at datameer.dap.common.graphv2.ClusterSession.execute(ClusterSession.java:181)
	at datameer.dap.common.graphv2.ConcurrentClusterSession$1.run(ConcurrentClusterSession.java:48)
	at datameer.dap.common.security.DatameerSecurityService$1.call(DatameerSecurityService.java:135)
	at datameer.dap.common.security.DatameerSecurityService$1.call(DatameerSecurityService.java:129)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Job job_1447373200318_0080 failed! Failure info: Task failed task_1447373200318_0080_r_000071 Job failed as tasks failed. failedMaps:0 failedReduces:1
	at datameer.dap.common.job.mr.DefaultMrJobClient.waitUntilJobCompletion(DefaultMrJobClient.java:234)
	at datameer.dap.common.job.mr.DefaultMrJobClient.runJobImpl(DefaultMrJobClient.java:91)
	at datameer.dap.common.job.mr.MrJobClient.runJob(MrJobClient.java:34)
	at datameer.dap.common.graphv2.hadoop.MrJob.runImpl(MrJob.java:216)
	... 9 more
Caused by: java.lang.RuntimeException: Task: AttemptID:attempt_1447373200318_0080_r_000071_3 Timed out after 600 secs
Cause
The timeout occurs when a task does not report progress on the cluster side within the configured time frame. This can happen, for example, when higher-priority tasks on the same node leave the task starved of resources. Hadoop terminated the task because it exceeded the timeout value (in milliseconds) set by the following property:
mapreduce.task.timeout
Solution
To make the job more tolerant of slowly progressing tasks, increase the timeout for this job to 6,000,000 milliseconds (100 minutes):
mapreduce.task.timeout=6000000
Then re-run the job. A Datameer administrator can implement this recommendation.
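As a sketch, the property can also be set cluster-wide in mapred-site.xml; the exact location (cluster configuration vs. a per-job custom property in Datameer) depends on your environment and Datameer version, so treat this as an illustrative fragment:

```
<!-- mapred-site.xml (illustrative): raise the per-task progress timeout
     from the 600-second default to 6,000,000 ms (100 minutes). A task is
     terminated if it neither reads input, writes output, nor updates its
     status string within this window. -->
<property>
  <name>mapreduce.task.timeout</name>
  <value>6000000</value>
</property>
```

Note that raising the timeout only buys slow tasks more time; if tasks are genuinely stuck rather than slow, a larger timeout delays the failure instead of preventing it.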
If that doesn't resolve the issue, contact Datameer Support for further assistance.
Further Information
This issue is described in the Apache Hadoop documentation of mapred-default.xml.
"The number of milliseconds before a task will be terminated if it neither reads an input, writes an output, nor updates its status string. A value of 0 disables the timeout."