- Datameer is running on EMR with S3 as storage.
- A Workbook fails with the following exception.
java.io.FileNotFoundException: No such file or directory: s3://<bucket name>/<Datameer private folder>/temp/job-xxxx/.staging-xxxxx-xxxxx-xxxxxxxxx/.tez/application_xxxxxxxxx_xxx/tez-dag.pb
- A Tez bug (TEZ-4448) reveals itself only when S3 is being used as storage for the job's metadata.
- Datameer transforms a Workbook's logic into a set of DAGs. The first Tez DAG is written into Datameer's temp folder in S3.
- If the DAG is large enough, Tez code will attempt to read it before the write operation is finalized. This is where the FileNotFoundException comes from.
- In this case, the exception is usually accompanied by a note that the DAG is too large to be sent via IPC.
Send dag plan using YARN local resources since it's too large, dag plan size=xxxxxxxx, max dag plan size through IPC=xxxxxxxx, max IPC message size= 67108864
- By default, ipc.maximum.data.length=67108864 (64 MB). The maximum DAG plan size that can be transmitted through IPC is slightly smaller than this value.
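- To work around the issue, one could increase the maximum IPC message size described above, so that the DAG plan fits through IPC and is no longer read from S3.
- We would recommend starting from dag plan size + 20%.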
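
The sizing rule above can be sketched as a small Python helper. It assumes the log line has the same layout as the sample quoted earlier. The function names and the sample sizes are hypothetical and are not part of Datameer or Tez.

```python
import re

# Matches the size fields of the Tez log line quoted above.
# Assumption: the field layout is the same as in that sample line.
LOG_PATTERN = re.compile(
    r"dag plan size=(\d+), max dag plan size through IPC=(\d+), "
    r"max IPC message size=\s*(\d+)"
)


def parse_dag_plan_sizes(log_line):
    """Return (dag_plan_size, max_plan_size_via_ipc, max_ipc_message_size), or None."""
    match = LOG_PATTERN.search(log_line)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


def recommended_ipc_limit(dag_plan_size, headroom_percent=20):
    """Starting value for the IPC limit: DAG plan size plus headroom, rounded up."""
    # Integer arithmetic avoids floating-point rounding surprises.
    return (dag_plan_size * (100 + headroom_percent) + 99) // 100


if __name__ == "__main__":
    # Hypothetical sizes, for illustration only.
    sample = (
        "Send dag plan using YARN local resources since it's too large, "
        "dag plan size=80000000, max dag plan size through IPC=62914560, "
        "max IPC message size= 67108864"
    )
    sizes = parse_dag_plan_sizes(sample)
    if sizes is not None:
        print(recommended_ipc_limit(sizes[0]))  # prints 96000000
```

The result is only a starting point. If the job log still reports that the DAG plan is too large, the value may need to be raised further.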