Problem
When exporting to Hive, a large share of the records is consistently dropped. Exporting the same records directly to HDFS works without issue. Records are dropped only when exporting to a Hive table, whether new or existing.
Error Message
There is no explicit error message; instead, the job completes with warnings. The log snippet below shows the output from a Hive export job that is dropping records.
INFO [<timestamp>] [MrPlanRunnerV2] (JobExecutionTraceService.java:106) - Copying job execution trace log from /app/datameer/Datameer-<version>-<dist>/build/cache/dfscache/local-job-execution-traces/<jobID> to maprfs:/datalake/corporate/dataliberation/datameer/exportjobs/<configID>/<jobID>/job-execution-trace.log
INFO [<timestamp>] [JobScheduler worker1-thread-253] (DapJobCounter.java:176) - Job SUCCESS with '1' mr-jobs and following counters:
INFO [<timestamp>] [JobScheduler worker1-thread-253] (DapJobCounter.java:179) - EXPORT_RECORDS: 1908
INFO [<timestamp>] [JobScheduler worker1-thread-253] (DapJobCounter.java:179) - EXPORT_DROPPED_RECORDS: 3617
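Because the job reports SUCCESS, dropped records are easy to miss unless the counters are checked. The following is a minimal sketch of how the two counters could be pulled out of a job log and turned into a dropped fraction. The function names are hypothetical, and it assumes that EXPORT_RECORDS counts only the records that were written, so that the total is the sum of both counters.

```python
import re
from typing import Dict, Iterable

# Matches the two counter names exactly as they appear in the job log above.
COUNTER_PATTERN = re.compile(r"\b(EXPORT_RECORDS|EXPORT_DROPPED_RECORDS):\s*(\d+)")


def read_export_counters(log_lines: Iterable[str]) -> Dict[str, int]:
    """Collect the export counters found in the given log lines."""
    counters: Dict[str, int] = {}
    for line in log_lines:
        match = COUNTER_PATTERN.search(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def dropped_fraction(counters: Dict[str, int]) -> float:
    """Fraction of records dropped.

    Assumption: total records = exported + dropped.
    """
    exported = counters.get("EXPORT_RECORDS", 0)
    dropped = counters.get("EXPORT_DROPPED_RECORDS", 0)
    total = exported + dropped
    return dropped / total if total else 0.0
```

Under that assumption, the counters in the log above (1908 exported, 3617 dropped) correspond to roughly 65% of the records being dropped.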
Troubleshooting
Further investigation of the task logs revealed the following nondescript error:
!message:NullPointerException:
!Error Repeated:>100 times
!stack:java.lang.NullPointerException
at datameer.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:210)
at datameer.com.google.common.collect.Lists.newArrayList(Lists.java:142)
at datameer.das.plugin.hive.RecordObjectInspector.coerceColumnForHive(RecordObjectInspector.java:118)
at datameer.das.plugin.hive.RecordObjectInspector.getStructFieldsDataAsList(RecordObjectInspector.java:108)
at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:396)
at datameer.das.plugin.hive.HiveFileOutputAdapter.write(HiveFileOutputAdapter.java:80)
at datameer.das.plugin.hive.HiveOutputAdapter.write(HiveOutputAdapter.java:74)
at datameer.dap.common.job.dapexport.ExportJob$ExportRecordProcessor$1.computeValue(ExportJob.java:115)
at datameer.dap.common.job.dapexport.ExportJob$ExportRecordProcessor$1.computeValue(ExportJob.java:108)
at datameer.dap.sdk.sequence.Sequence$Singleton.moveToNext(Sequence.java:246)
at datameer.dap.sdk.sequence.Sequence$14.computeNext(Sequence.java:647)
at datameer.dap.sdk.sequence.Sequence$Simple.moveToNext(Sequence.java:157)
...
In this instance, the cause was identified by exporting subsets of the columns to Hive until the offending fields were isolated. Through this process of elimination, which followed a binary search pattern, two columns were found to contain null values.
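The elimination process can be sketched as a recursive bisection over the column list. This is an illustration, not a Datameer tool: `export_drops_records` is a hypothetical callback standing in for "run a test export with only these columns and check whether EXPORT_DROPPED_RECORDS is greater than zero", and the sketch assumes a subset fails exactly when it contains at least one offending column.

```python
from typing import Callable, List, Sequence


def find_offending_columns(
    columns: Sequence[str],
    export_drops_records: Callable[[Sequence[str]], bool],
) -> List[str]:
    """Return every column whose presence causes the export to drop records.

    export_drops_records is a hypothetical probe: it should run a test
    export restricted to the given columns and return True if any
    records were dropped.
    """
    columns = list(columns)
    # A clean subset contains no offending column, so discard it whole.
    if not columns or not export_drops_records(columns):
        return []
    # A failing subset of size one is itself an offending column.
    if len(columns) == 1:
        return columns
    # Otherwise split in half and examine both halves, since more than
    # one column may be at fault.
    mid = len(columns) // 2
    return (
        find_offending_columns(columns[:mid], export_drops_records)
        + find_offending_columns(columns[mid:], export_drops_records)
    )
```

Both halves are always examined, so the search does not stop at the first culprit. That matters here because two columns were at fault. When only a few columns are bad, this takes far fewer test exports than checking each column one at a time.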
Solution
The formulas attached to those columns were updated to include the DENULLIFY wrapper, which resolved the NullPointerException listed above. All records were then exported to Hive as expected.