Error Executing TEZ Session - SequenceFile doesn't work with GzipCodec without native-hadoop code – Datameer

Problem

When executing a job using the TEZ execution framework, the following error is observed in the job log:

java.lang.IllegalArgumentException: SequenceFile doesn't work with GzipCodec without native-hadoop code!
	at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1087)
	at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.<init>(SequenceFile.java:1441)
	at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:274)
	at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:398)
	at datameer.dap.common.job.mr.input.DasFileOutputFormat.createSeqFileWriter(DasFileOutputFormat.java:86)
	at datameer.dap.common.job.mr.input.DasFileOutputFormat.createFileWriter(DasFileOutputFormat.java:69)
	at datameer.dap.common.graphv2.hadoop.TaskSideRecordWriter.writeAll(TaskSideRecordWriter.java:63)
	at datameer.dap.common.graphv2.ProcessingContext.writeTo(ProcessingContext.java:124)
	at datameer.plugin.tez.input.TezSplitGenerator.consumeSplitMetaInformation(TezSplitGenerator.java:125)
	at datameer.plugin.tez.input.TezSplitGenerator.initialize(TezSplitGenerator.java:88)
	at datameer.plugin.tez.input.TezSplitGenerator.initializeEvents(TezSplitGenerator.java:63)
	at datameer.plugin.tez.input.AbstractDatameerInputInitializer.initialize(AbstractDatameerInputInitializer.java:31)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:214)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:208)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:208)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:195)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)

The job fails only when it runs on the TEZ execution framework. The same job completes successfully on other execution frameworks, such as Hadoop.

Cause

This is a configuration issue. The environment that TEZ passes to its application master and task containers does not include the location of the native Hadoop code, so GzipCodec cannot load it. These are the parameters to investigate:

tez.am.launch.env
tez.task.launch.env
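
As a rough way to check one of these values on a cluster node, the sketch below extracts the LD_LIBRARY_PATH entries from a launch env string and looks for the native library in each directory. It makes several assumptions that are not stated in this article: that the value is a comma-separated list of NAME=VALUE pairs, that the native library file is named libhadoop.so, and that simple $VAR expansion is enough. The function names are hypothetical and are not part of TEZ or Datameer.

```python
from pathlib import Path
from string import Template


def library_dirs(launch_env, env):
    """Return the LD_LIBRARY_PATH directories named in a launch env value.

    launch_env: a value such as "LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/some/dir"
                (assumed to be comma-separated NAME=VALUE pairs).
    env:        mapping used to expand $VAR references, e.g. dict(os.environ).
    """
    for pair in launch_env.split(","):
        name, _, value = pair.partition("=")
        if name.strip() == "LD_LIBRARY_PATH":
            # Expand known variables; unknown ones are left as "$NAME".
            expanded = Template(value).safe_substitute(env)
            # Drop empty entries and entries that could not be expanded.
            return [d for d in expanded.split(":") if d and not d.startswith("$")]
    return []


def has_native_hadoop(launch_env, env, library="libhadoop.so"):
    """True if any listed directory contains the (assumed) native library file."""
    return any((Path(d) / library).is_file() for d in library_dirs(launch_env, env))
```

For example, calling has_native_hadoop(value, dict(os.environ)) on a Hadoop node, with value set to the current tez.am.launch.env setting, returns False when none of the listed directories holds the library. A False result is only a hint, since the check runs in the local shell environment rather than inside a TEZ container.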

Solution

To work around this issue, execute the job using another framework by adding this Custom Hadoop Property to the job:

das.execution-framework=Hadoop

To resolve this issue, update the two TEZ launch environment parameters so that LD_LIBRARY_PATH includes the directories that contain the native Hadoop code (including the directory that holds the hadoop-common-*.jar file).

Here is an example of setting these parameters in an environment where the HADOOP_COMMON_HOME environment variable is set to /usr/hadoop:

tez.am.launch.env=LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_COMMON_HOME/lib/native/:$HADOOP_COMMON_HOME/lib:$HADOOP_COMMON_HOME/lib/native/Linux-amd64-64
tez.task.launch.env=LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_COMMON_HOME/lib/native/:$HADOOP_COMMON_HOME/lib:$HADOOP_COMMON_HOME/lib/native/Linux-amd64-64

If the $HADOOP_COMMON_HOME variable is not set on the Hadoop nodes, the paths can be hard-coded instead. The following example assumes that the Hadoop libraries on the nodes are located in /usr/hadoop/lib, /usr/hadoop/lib/native and /usr/hadoop/lib/native/Linux-amd64-64:

tez.am.launch.env=LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/hadoop/lib/native/:/usr/hadoop/lib:/usr/hadoop/lib/native/Linux-amd64-64
tez.task.launch.env=LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/hadoop/lib/native/:/usr/hadoop/lib:/usr/hadoop/lib/native/Linux-amd64-64
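
Because both parameters take the same value and differ between the two examples only in the base path, the lines can be generated rather than typed by hand. The following is a minimal sketch, not a Datameer or TEZ utility: the function name is hypothetical, and the three subdirectories are simply the ones used in the examples above, which may differ on other distributions.

```python
def tez_launch_env_properties(hadoop_home="$HADOOP_COMMON_HOME"):
    """Build the two launch env property lines for a given Hadoop base path.

    hadoop_home may be a literal path such as "/usr/hadoop" or an
    unexpanded variable reference such as "$HADOOP_COMMON_HOME".
    """
    # Subdirectories taken from the examples in this article (an assumption
    # about the node layout, not a universal rule).
    subdirs = ("lib/native/", "lib", "lib/native/Linux-amd64-64")
    paths = ":".join(f"{hadoop_home}/{sub}" for sub in subdirs)
    value = f"LD_LIBRARY_PATH=$LD_LIBRARY_PATH:{paths}"
    return [f"{key}={value}" for key in ("tez.am.launch.env", "tez.task.launch.env")]


if __name__ == "__main__":
    # Print the hard-coded variant for a base path of /usr/hadoop.
    for line in tez_launch_env_properties("/usr/hadoop"):
        print(line)
```

Called with no argument, the function produces the $HADOOP_COMMON_HOME variant, which leaves the variable to be resolved on the Hadoop nodes.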
