Problem
1) The workbook fails after a new column is added to a join; jobs ran successfully before this change.
2) The workbook fails after a new sheet is added, an aggregation sheet that summarizes the join results.
3) This is the error message:
ERROR [2014-10-09 09:39:09.952] [ConcurrentJobExecutor-0] (ClusterSession.java:230) - Failed to generate file for 'DisconnectedRecordStream[sheetName=Joined]' [25 mins, 45 sec]
java.lang.RuntimeException: Job job_1412268826978_22844 failed! Failure info: Task failed task_1412268826978_22844_m_000000 Job failed as tasks failed. failedMaps:1 failedReduces:0
    at datameer.dap.sdk.util.ExceptionUtil.convertToRuntimeException(ExceptionUtil.java:38)
    at datameer.dap.sdk.util.ExceptionUtil.convertToRuntimeException(ExceptionUtil.java:30)
    at datameer.dap.common.graphv2.hadoop.MrJob.runImpl(MrJob.java:195)
    at datameer.dap.common.graphv2.ClusterJob.run(ClusterJob.java:136)
    at datameer.dap.common.graphv2.ClusterSession.execute(ClusterSession.java:218)
    at datameer.dap.common.graphv2.ConcurrentClusterSession$1.run(ConcurrentClusterSession.java:53)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Job job_1412268826978_22844 failed! Failure info: Task failed task_1412268826978_22844_m_000000 Job failed as tasks failed. failedMaps:1 failedReduces:0
    at datameer.dap.common.job.mr.HadoopMrJobClient.waitUntilJobCompletion(HadoopMrJobClient.java:176)
    at datameer.dap.common.job.mr.HadoopMrJobClient.runJobImpl(HadoopMrJobClient.java:77)
    at datameer.dap.common.job.mr.MrJobClient.runJob(MrJobClient.java:32)
    at datameer.dap.common.graphv2.hadoop.MrJob.runImpl(MrJob.java:190)
    ... 9 more
Caused by: java.lang.RuntimeException: Task: Error: java.lang.IndexOutOfBoundsException
    at java.nio.Buffer.checkIndex(Buffer.java:512)
    at java.nio.ByteBufferAsIntBufferL.put(ByteBufferAsIntBufferL.java:113)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1153)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
    at datameer.dap.common.graphv2.hadoop.MrJobKeyValueMapper.run(MrJobKeyValueMapper.java:80)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Cause/Solution
1) One solution is to set the following parameters at either the workbook or the Datameer level:

mapreduce.task.io.sort.mb=512
io.sort.mb=512
mapreduce.task.io.sort.factor=20
io.sort.factor=20

The stack trace above fails inside MapOutputBuffer.collect, i.e., while writing to the map-side sort buffer, which is why resizing that buffer is the first thing to try. Decreasing these values allocates less memory for the sort phase; the trade-off is more frequent spilling, so jobs may take somewhat longer to finish (possibly around 20%).
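As an illustration, these properties can be entered one per line in the Custom Properties field (in the workbook's configuration for the workbook level, or in the Hadoop Cluster settings for the Datameer level; the exact field name and location may vary by Datameer version). The # lines are explanatory comments:

# Size of the in-memory map-side sort buffer, in MB. The deprecated
# io.sort.mb alias is set alongside the current name so the override
# takes effect regardless of which name the Hadoop version resolves.
mapreduce.task.io.sort.mb=512
io.sort.mb=512
# Number of spill segments merged at once during the sort/merge phase.
mapreduce.task.io.sort.factor=20
io.sort.factor=20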
2) Alternatively, lowering the mapreduce.map.sort.spill.percent property from its default of 0.8 to a value such as 0.7 can help. If the spill takes too long to complete, starting it sooner keeps the buffer from filling up before the spill finishes. This value can likewise be overridden at either the workbook or the Datameer level, as shown below.
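Assuming the same Custom Properties mechanism as above, the override would look like this (the deprecated io.sort.spill.percent alias is included for older Hadoop configurations):

# Begin spilling the map output buffer to disk when it is 70% full
# instead of the default 80%.
mapreduce.map.sort.spill.percent=0.70
io.sort.spill.percent=0.70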