Problem
- Datameer is running on EMR with S3 as storage.
- A Workbook fails with the following exception.
java.io.FileNotFoundException: No such file or directory: s3://<bucket name>/<Datameer private folder>/temp/job-xxxx/.staging-xxxxx-xxxxx-xxxxxxxxx/.tez/application_xxxxxxxxx_xxx/tez-dag.pb
at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2310)
at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2204)
at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2143)
at org.apache.hadoop.fs.FileSystem.resolvePath(FileSystem.java:888)
at org.apache.tez.client.TezClient.submitDAGSession(TezClient.java:676)
at org.apache.tez.client.TezClient.submitDAG(TezClient.java:588)
Cause
- A Tez bug (TEZ-4448) manifests only when S3 is used as storage for the job's metadata.
- Datameer transforms a Workbook's logic into a set of Tez DAGs. The first DAG is written into Datameer's temp folder in S3 at s3://.../temp/job-xxxxxx/.staging-xxxxxx/.tez/application_xxxxxxx_xxxxxxxxxxx/
- If the DAG is large enough, the Tez code attempts to read it before the write operation is finalized. This is where the FileNotFoundException comes from.
Solution
- The FileNotFoundException in this case is usually accompanied by a log message indicating that the DAG plan is too large to be sent via IPC: Send dag plan using YARN local resources since it's too large, dag plan size=xxxxxxxx, max dag plan size through IPC=xxxxxxxx, max IPC message size= 67108864
- By default, ipc.maximum.data.length=67108864 (64 MB). The maximum DAG plan size that can be transmitted through IPC is slightly smaller than this value.
- To work around the issue, increase ipc.maximum.data.length. We recommend starting with the reported dag plan size + 20%.
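As a sketch, the property can be raised in Hadoop's core-site.xml on the EMR cluster (or through an equivalent EMR configuration classification). The value below is a hypothetical example sized for a dag plan of roughly 80 MB; substitute the dag plan size reported in your own logs plus about 20%:

```xml
<!-- core-site.xml: raise the maximum IPC message size.
     100663296 bytes (96 MB) is a hypothetical value chosen as an
     example; use your reported dag plan size + ~20%. -->
<property>
  <name>ipc.maximum.data.length</name>
  <value>100663296</value>
</property>
```

A restart of the affected Hadoop services is generally required for the new value to take effect.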