Problem
When a job runs on the Tez execution framework, the job log shows the following error:
java.lang.IllegalArgumentException: SequenceFile doesn't work with GzipCodec without native-hadoop code!
    at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1087)
    at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.<init>(SequenceFile.java:1441)
    at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:274)
    at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:398)
    at datameer.dap.common.job.mr.input.DasFileOutputFormat.createSeqFileWriter(DasFileOutputFormat.java:86)
    at datameer.dap.common.job.mr.input.DasFileOutputFormat.createFileWriter(DasFileOutputFormat.java:69)
    at datameer.dap.common.graphv2.hadoop.TaskSideRecordWriter.writeAll(TaskSideRecordWriter.java:63)
    at datameer.dap.common.graphv2.ProcessingContext.writeTo(ProcessingContext.java:124)
    at datameer.plugin.tez.input.TezSplitGenerator.consumeSplitMetaInformation(TezSplitGenerator.java:125)
    at datameer.plugin.tez.input.TezSplitGenerator.initialize(TezSplitGenerator.java:88)
    at datameer.plugin.tez.input.TezSplitGenerator.initializeEvents(TezSplitGenerator.java:63)
    at datameer.plugin.tez.input.AbstractDatameerInputInitializer.initialize(AbstractDatameerInputInitializer.java:31)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:214)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:208)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:208)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:195)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
The job fails only when it runs on the Tez execution framework. Other execution frameworks, such as Hadoop, still run it properly.
Cause
This is a configuration issue. The Tez launch environment settings do not point to the native Hadoop code. Investigate the following parameters:
- tez.am.launch.env
- tez.task.launch.env
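As a rough aid when inspecting these two parameters, the following Python sketch extracts the LD_LIBRARY_PATH entries from a launch-env value so they can be compared with the location of the native Hadoop libraries on the cluster nodes. It assumes the value is a comma-separated list of NAME=VALUE pairs. The function name is hypothetical and is not part of Tez or Datameer.

```python
from typing import List


def ld_library_path_entries(launch_env: str) -> List[str]:
    """Return the LD_LIBRARY_PATH entries found in a launch-env value.

    Assumption: the value is a comma-separated list of NAME=VALUE pairs.
    """
    for pair in launch_env.split(","):
        name, sep, value = pair.strip().partition("=")
        if sep and name == "LD_LIBRARY_PATH":
            # Skip empty segments and the reference to the inherited variable.
            return [p for p in value.split(":") if p and p != "$LD_LIBRARY_PATH"]
    return []


if __name__ == "__main__":
    value = "LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/hadoop/lib/native/:/usr/hadoop/lib"
    for entry in ld_library_path_entries(value):
        print(entry)
```

An empty result, or a list with no native library directory in it, would be consistent with the cause described here.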
Solution
To work around this issue, execute the job using another framework by adding this Custom Hadoop Property to the job:
- das.execution-framework=Hadoop
To resolve this issue, update the Tez launch environment parameters (tez.am.launch.env and tez.task.launch.env) so that they point to paths that include the native Hadoop code (including the hadoop-common-*.jar file).
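Before changing the parameters, it can help to confirm which directories on a Hadoop node actually hold the relevant files. The Python sketch below is one way to do that, under two assumptions: the native library is named libhadoop.so (its usual name on Linux), and the script runs on a node with the candidate directories present. The function name and the directory list are illustrative only.

```python
import glob
import os
from typing import Dict, List


def find_hadoop_artifacts(directories: List[str]) -> Dict[str, List[str]]:
    """Map each directory to the native library and hadoop-common jar files in it."""
    found = {}
    for directory in directories:
        matches = []
        # Assumption: the native Hadoop library is named libhadoop.so on Linux.
        matches += glob.glob(os.path.join(directory, "libhadoop.so*"))
        matches += glob.glob(os.path.join(directory, "hadoop-common-*.jar"))
        found[directory] = sorted(os.path.basename(m) for m in matches)
    return found


if __name__ == "__main__":
    # Hypothetical candidate directories; adjust to the cluster layout.
    candidates = ["/usr/hadoop/lib", "/usr/hadoop/lib/native"]
    for directory, files in find_hadoop_artifacts(candidates).items():
        print(directory, files if files else "(nothing found)")
```

Directories that report matches are the ones worth including in LD_LIBRARY_PATH.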
Here is an example of setting these parameters in an environment where the HADOOP_COMMON_HOME environment variable is set to /usr/hadoop:
tez.am.launch.env=LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_COMMON_HOME/lib/native/:$HADOOP_COMMON_HOME/lib:$HADOOP_COMMON_HOME/lib/native/Linux-amd64-64
tez.task.launch.env=LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_COMMON_HOME/lib/native/:$HADOOP_COMMON_HOME/lib:$HADOOP_COMMON_HOME/lib/native/Linux-amd64-64
Note that if the $HADOOP_COMMON_HOME variable is unset, the paths can be hard-coded instead. The following example assumes that the correct paths to the Hadoop libraries on the Hadoop nodes are /usr/hadoop/lib, /usr/hadoop/lib/native, and /usr/hadoop/lib/native/Linux-amd64-64:
tez.am.launch.env=LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/hadoop/lib/native/:/usr/hadoop/lib:/usr/hadoop/lib/native/Linux-amd64-64
tez.task.launch.env=LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/hadoop/lib/native/:/usr/hadoop/lib:/usr/hadoop/lib/native/Linux-amd64-64
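Because both parameters take the same value and the value is easy to mistype, it may be convenient to generate the two property lines from a single Hadoop home path. The Python sketch below does this for the three subdirectories used in the examples above. The function name is hypothetical, and the subdirectory layout is taken from this article rather than checked against any particular distribution.

```python
def launch_env_properties(hadoop_home: str) -> str:
    """Build the two Tez launch-env property lines for a given Hadoop home."""
    home = hadoop_home.rstrip("/")
    # Subdirectories as used in the examples in this article.
    paths = [
        home + "/lib/native/",
        home + "/lib",
        home + "/lib/native/Linux-amd64-64",
    ]
    value = "LD_LIBRARY_PATH=" + ":".join(["$LD_LIBRARY_PATH"] + paths)
    keys = ("tez.am.launch.env", "tez.task.launch.env")
    return "\n".join(key + "=" + value for key in keys)


if __name__ == "__main__":
    # Pass "$HADOOP_COMMON_HOME" instead to produce the variable-based form.
    print(launch_env_properties("/usr/hadoop"))
```

Calling the function with /usr/hadoop yields the hard-coded form, and calling it with the literal string $HADOOP_COMMON_HOME yields the variable-based form.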