Problem
One or more Datameer jobs fail in an environment connected to a Hadoop cluster. The following stacktrace is logged in the job.log:
ERROR [2015-03-19 18:29:22.029] [JobScheduler thread-1] (JobScheduler.java:800) - Job 2081 failed with exception.
java.lang.RuntimeException: Failed to run cluster job for 'Workbook job (2081): Preroll_MSNBC_Master#Joined(Disconnected record stream)#Joined(Disconnected rec'
    at datameer.dap.common.graphv2.ClusterSession.execute(ClusterSession.java:197)
    at datameer.dap.common.graphv2.mixedframework.MixedClusterSession.execute(MixedClusterSession.java:48)
    at datameer.dap.common.graphv2.ClusterSession.runAllClusterJobs(ClusterSession.java:364)
    at datameer.dap.common.graphv2.MrPlanRunnerV2.run(MrPlanRunnerV2.java:87)
    at java.lang.Thread.run(Thread.java:745)
Caused by: datameer.com.google.common.base.VerifyException: Finished DAG 'Workbook job (2081): Preroll_MSNBC_Master#Joined(Disconnected record stream)#Joined(Disconnected rec (a8a47ba0-5f97-4913-a0fd-b93b549ab40d)' (application_1423440310529_2105) with state FAILED and diagnostics: [Vertex re-running, vertexName=MapChainVertex:Workbook job (2081): Preroll_MSNBC_Master#Joined(Disconnected record stream) (e1f425c2-0ff4-4c44-9776-2c51263eb19b), vertexId=vertex_1423440310529_2105_2_01, Vertex failed, vertexName=ReduceChainVertex:Workbook job (2081): Preroll_MSNBC_Master#Joined(Disconnected record stream) (6f0e97ac-6350-43e9-86d4-73ea63825540), vertexId=vertex_1423440310529_2105_2_02, diagnostics=[Task failed, taskId=task_1423440310529_2105_2_02_000032, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.IllegalArgumentException: n must be positive
    at java.util.Random.nextInt(Random.java:300)
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.confChanged(LocalDirAllocator.java:305)
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:344)
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.createTmpFileForWrite(LocalDirAllocator.java:412)
    at org.apache.hadoop.fs.LocalDirAllocator.createTmpFileForWrite(LocalDirAllocator.java:198)
    at datameer.dap.sdk.util.HadoopTmpDirAllocator.allocateFile(HadoopTmpDirAllocator.java:40)
    at datameer.dap.sdk.util.HadoopUtil.getTaskTmpFile(HadoopUtil.java:254)
    at datameer.dap.common.graphv2.ProcessingContext.getJoinValueCache(ProcessingContext.java:417)
    at datameer.dap.common.job.sheets.join.reduceside.ReduceSideJoinCombiner.combine(ReduceSideJoinCombiner.java:153)
    at datameer.dap.common.graphv2.OperationChain.connect(OperationChain.java:63)
    at datameer.plugin.tez.processing.AggregationVertexRecordProcessor.run(AggregationVertexRecordProcessor.java:158)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:180)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:172)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:167)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
], TaskAttempt 1 failed, info=[Error: Failure while running task:java.lang.IllegalArgumentException: n must be positive ...
The distinguishing element of this particular failure is the error message "n must be positive" raised from the org.apache.hadoop.fs.LocalDirAllocator class.
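Why this message appears: the frames above show LocalDirAllocator's confChanged() calling java.util.Random.nextInt, where the bound is the number of configured local directories that survive a usability check. When none of them can be written to, the bound is zero, and nextInt(0) throws exactly this exception (the wording "n must be positive" comes from the Java 7 runtime in this trace; newer JDKs say "bound must be positive"). The class below is a minimal illustrative stand-in for that behavior, not Hadoop or Datameer code:

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class NMustBePositiveDemo {
    public static void main(String[] args) {
        // Illustrative stand-in for the directory list LocalDirAllocator
        // builds from mapred.local.dir / mapreduce.cluster.local.dir:
        // directories that cannot be written to are filtered out.
        List<String> usableLocalDirs = new ArrayList<>();

        // With zero usable directories, asking Random for an index into the
        // (empty) list fails with the same exception seen in the stacktrace:
        // java.lang.IllegalArgumentException: n must be positive
        int index = new Random().nextInt(usableLocalDirs.size());
        System.out.println("Selected dir index: " + index);
    }
}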
Cause
This is a configuration issue on the Hadoop data nodes: the value configured for the parameter mapred.local.dir (Hadoop 1) or mapreduce.cluster.local.dir (Hadoop 2) does not point to a writable file system.
Solution
Check the current value of the mapred.local.dir (Hadoop 1) or mapreduce.cluster.local.dir (Hadoop 2) parameter. It should point to a location on the local file system of the Hadoop data nodes where tasks can store spill data.
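One way to see the value the cluster actually resolves is to read it through Hadoop's own configuration API. A minimal sketch, assuming the Hadoop client jars and the cluster's configuration files (mapred-site.xml) are on the classpath; the class name PrintLocalDirs is an illustrative choice:

import org.apache.hadoop.mapred.JobConf;

public class PrintLocalDirs {
    public static void main(String[] args) {
        // JobConf pulls in mapred-default.xml and mapred-site.xml from the
        // classpath on top of the core configuration files.
        JobConf conf = new JobConf();
        // Try the Hadoop 2 name first, falling back to the Hadoop 1 name.
        String dirs = conf.get("mapreduce.cluster.local.dir",
                               conf.get("mapred.local.dir"));
        System.out.println("Configured local dirs: " + dirs);
    }
}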
Ensure that either:
- the datameer account (the Linux user that started the Datameer process) can write to the directory or directories configured by this parameter, or
- the directory or directories are made world-writable (see the sketch after this list).
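Write access can be verified, and if you choose the world-writable route, applied, with standard Java file APIs. A minimal sketch; /hadoop/mapred/local is a hypothetical example path, so substitute the directories reported by the check above and run this on each data node as the account in question so canWrite() reflects that user's access:

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermissions;

public class LocalDirPermissions {
    public static void main(String[] args) throws IOException {
        // Hypothetical path; substitute each directory from the parameter.
        Path localDir = Paths.get("/hadoop/mapred/local");

        File f = localDir.toFile();
        System.out.printf("%s exists=%b writable=%b%n", f, f.exists(), f.canWrite());

        // Equivalent of `chmod 777` (world-writable). Prefer granting the
        // datameer account write access directly; fall back to this only if
        // that is not possible. Requires sufficient privileges to run.
        Files.setPosixFilePermissions(localDir,
                PosixFilePermissions.fromString("rwxrwxrwx"));
        System.out.println("Permissions set to rwxrwxrwx for " + localDir);
    }
}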