Problem
A Datameer job executed with the SMALL_JOB execution framework ends with the status COMPLETED_WITH_WARNINGS and produces no output:
WORKBOOK_PRODUCED_RECORD_COUNT: 0
WORKBOOK_DROPPED_RECORDS: 0
WORKBOOK_CONSUMED_RECORD_COUNT: 15363960
WORKBOOK_CONSUMED_BYTES: 194466201
WORKBOOK_PRODUCED_BYTES: 0
The error-00*.log files contain messages such as:
!stack:java.lang.RuntimeException: /user/datameer/workbooks/111/12345/test_workbook/data/map-part-00000.parquet for client 192.10.0.21 already exists
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2782)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2674)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2559)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:592)
This might also impact any child artifacts, as they will fail while reading the corrupted input file. For example:
Error: Failure while running task:java.lang.RuntimeException: hdfs://nameservice1/etl/datameer/workbooks/257/727570/account_to_client_now_tv_only/data/map-part-00000.parquet is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [-12, -101, 6, 0
or
Error: Failure while running task:java.lang.IllegalArgumentException: Length must be greater than 0
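As the first error message shows, a valid Parquet file ends with the 4-byte magic number [80, 65, 82, 49], i.e. the ASCII string "PAR1". To confirm that an output file is corrupted, you can fetch a local copy (e.g. with `hdfs dfs -get`) and check its magic bytes. The following is a minimal sketch; the function name is illustrative and not part of Datameer or Hadoop:

```python
def is_parquet_file(path):
    """Return True if the file carries the Parquet magic
    number "PAR1" at both its head and its tail."""
    MAGIC = b"PAR1"
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, 2)  # seek to 4 bytes before end of file
        tail = f.read(4)
    return head == MAGIC and tail == MAGIC
```

A file produced by the failed job will typically fail this check (the error above found the bytes [-12, -101, 6, 0] at the tail instead of "PAR1").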
Cause
The issue is a race condition in the job's parallel execution: the SmallJob runner executes its internal tasks concurrently, and two tasks can attempt to create the same output file.
The Datameer engineering team is working to improve this behaviour.
Solution
You can work around the issue using one of the approaches below.
- Disable concurrent job execution for the SmallJob runner via the property
das.job.concurrent-mr-jobs.new-graph-api=0
- Set the execution engine for the impacted workbook to Tez via the property
das.execution-framework=Tez
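Both workarounds are plain Java-style property assignments; a minimal sketch of the two options is below. Where they are applied (per-workbook custom properties versus a global configuration file) depends on your Datameer setup and version:

```
# Option 1: run the SmallJob runner's internal jobs sequentially
das.job.concurrent-mr-jobs.new-graph-api=0

# Option 2: switch the affected workbook to the Tez execution framework
das.execution-framework=Tez
```

Use one option or the other; there is no need to set both.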