Problem
A Datameer job fails with an error that is similar to the following stack trace.
ERROR [2015-01-01 00:00:00.000] [MrPlanRunnerV2] (ClusterSession.java:192) - Failed to run cluster job 'Workbook job (123456): MyWorkbook#MySheet(Expression record processor)#' [10 mins, 10 sec] java.lang.RuntimeException: com.google.protobuf.ServiceException: java.io.IOException: Failed on local exception: java.io.EOFException; Host Details : local host is: 'mynode1/10.10.10.123'; destination host is: 'mynode2':33337;
at datameer.dap.sdk.util.ExceptionUtil.convertToRuntimeException(ExceptionUtil.java:49)
at datameer.dap.sdk.util.ExceptionUtil.convertToRuntimeException(ExceptionUtil.java:31)
at datameer.plugin.tez.DagRunner.submit(DagRunner.java:96)
at datameer.plugin.tez.TezJob.runTezDag(TezJob.java:159)
at datameer.plugin.tez.TezJob.runImpl(TezJob.java:132)
at datameer.dap.common.graphv2.ClusterJob.run(ClusterJob.java:129)
at datameer.dap.common.graphv2.ClusterSession.execute(ClusterSession.java:186)
at datameer.dap.common.graphv2.mixedframework.MixedClusterSession.execute(MixedClusterSession.java:48)
at datameer.dap.common.graphv2.ClusterSession.runAllClusterJobs(ClusterSession.java:360)
at datameer.dap.common.graphv2.MrPlanRunnerV2.run(MrPlanRunnerV2.java:86)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.tez.dag.api.TezException: com.google.protobuf.ServiceException: java.io.IOException: Failed on local exception: java.io.EOFException; Host Details : local host is: 'mynode1/10.10.10.123'; destination host is: 'mynode2':33337;
at org.apache.tez.client.TezClient.submitDAGSession(TezClient.java:406)
at org.apache.tez.client.TezClient.submitDAG(TezClient.java:342)
at datameer.plugin.tez.DagRunner.<init>(DagRunner.java:55)
at datameer.plugin.tez.DagRunner.<init>(DagRunner.java:40)
at datameer.plugin.tez.DagRunner$1.run(DagRunner.java:105)
at datameer.plugin.tez.DagRunner$1.run(DagRunner.java:102)
at datameer.dap.common.entity.properties.SecureGridMode.executePossiblyImpersonated(SecureGridMode.java:257)
at datameer.plugin.tez.DagRunner.getPossiblyImpersonatedDagRunner(DagRunner.java:102)
at datameer.plugin.tez.DagRunner.submit(DagRunner.java:77)
... 8 more
Caused by: com.google.protobuf.ServiceException: java.io.IOException: Failed on local exception: java.io.EOFException; Host Details : local host is: 'mynode1/10.10.10.123'; destination host is: 'mynode2':33337;
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:216)
at com.sun.proxy.$Proxy125.submitDAG(Unknown Source)
at org.apache.tez.client.TezClient.submitDAGSession(TezClient.java:399)
... 16 more
Caused by: java.io.IOException: Failed on local exception: java.io.EOFException; Host Details : local host is: 'mynode1/10.10.10.123'; destination host is: 'mynode2':33337;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
at org.apache.hadoop.ipc.Client.call(Client.java:1414)
at org.apache.hadoop.ipc.Client.call(Client.java:1363)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
... 18 more
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1054)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:949)
Cause
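This matches a known Tez bug, TEZ-494. The root cause is that the RPC request that submits the DAG to the Tez container is larger than the limit set by the cluster property ipc.maximum.data.length. The request is rejected, and the submission fails with the java.io.EOFException shown above.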
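The mechanism can be illustrated with a simplified model. This is not Hadoop's actual code. It assumes length-prefixed framing: the server reads a 4-byte length header and rejects any request longer than ipc.maximum.data.length without sending a reply. The client then reads a 4-byte integer, as DataInputStream.readInt does in the stack trace, and hits end-of-stream. The 80 MiB request size and the 64 MiB "current" limit are hypothetical values chosen for illustration.

```python
import io
import struct

MIB = 1024 * 1024


def server_accepts(frame: bytes, max_data_length: int) -> bool:
    """Simplified server-side check: read the 4-byte big-endian length
    prefix of a request and accept it only if it fits within the limit."""
    (data_length,) = struct.unpack(">i", frame[:4])
    return 0 <= data_length <= max_data_length


def client_read_int(stream: io.BytesIO) -> int:
    """Mimic java.io.DataInputStream.readInt(): fewer than 4 bytes means EOF."""
    header = stream.read(4)
    if len(header) < 4:
        raise EOFError("server closed the connection without a response")
    return struct.unpack(">i", header)[0]


def submit(request_length: int, max_data_length: int) -> int:
    """Send only the length header of a request (the payload is omitted to
    keep the sketch cheap) and read the server's reply."""
    frame = struct.pack(">i", request_length)
    reply = io.BytesIO()
    if server_accepts(frame, max_data_length):
        reply.write(struct.pack(">i", 0))  # hypothetical "success" status code
    # A rejected request gets no reply: the connection is simply closed.
    reply.seek(0)
    return client_read_int(reply)


if __name__ == "__main__":
    try:
        submit(80 * MIB, max_data_length=64 * MIB)  # hypothetical sizes
    except EOFError as err:
        print("EOFError:", err)
    print(submit(80 * MIB, max_data_length=134217728))  # prints 0
```

In this model, raising the limit is enough to let the same request through. That is why the resolution below increases ipc.maximum.data.length instead of changing the job itself.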
Solution
To work around this issue, add the custom property das.execution-framework=MapReduce to the affected workbook. This forces the job to run on the MapReduce framework instead of Tez.
To resolve this issue, set ipc.maximum.data.length=134217728 (128 MiB) in the Custom Hadoop Properties, as documented in Smart Execution - Activation.
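The following is a minimal Python sketch for sanity-checking the properties text before you save it. It assumes the Custom Hadoop Properties field takes one key=value pair per line. The helper names parse_custom_properties and ipc_limit_is_raised are hypothetical and are not part of Datameer. The sketch also confirms the arithmetic: 134217728 bytes is 128 × 1024 × 1024.

```python
from typing import Dict

# 128 MiB = 134217728 bytes, the value recommended above for ipc.maximum.data.length.
RECOMMENDED_MAX_DATA_LENGTH = 128 * 1024 * 1024


def parse_custom_properties(text: str) -> Dict[str, str]:
    """Parse one key=value pair per line, skipping blank lines and '#' comments.
    The line format is an assumption about how the properties are entered."""
    props: Dict[str, str] = {}
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {number} is not key=value: {raw!r}")
        props[key.strip()] = value.strip()
    return props


def ipc_limit_is_raised(props: Dict[str, str],
                        required: int = RECOMMENDED_MAX_DATA_LENGTH) -> bool:
    """Return True if ipc.maximum.data.length is set to at least `required` bytes."""
    value = props.get("ipc.maximum.data.length")
    return value is not None and value.isdigit() and int(value) >= required


if __name__ == "__main__":
    hadoop_props = parse_custom_properties("ipc.maximum.data.length=134217728\n")
    print(ipc_limit_is_raised(hadoop_props))  # prints True
```

The same parser also reads the workbook workaround line. For example, parse_custom_properties("das.execution-framework=MapReduce") yields {"das.execution-framework": "MapReduce"}.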