Problem
Troubleshooting and optimizing the workbook execution.
Learn
To compare execution frameworks, a workbook is run under MapReduce instead of Tez, which is the default.
After forcing the other framework by adding
das.execution-framework=MapReduce
to the workbook's properties, the workbook execution fails immediately with the error shown below.
Error Message
ERROR [<timestamp>] [ConcurrentJobExecutor-0] (ClusterSession.java:198) - Failed to run cluster job 'Workbook job (<jobID>): <Workbook>#<Worksheet>(Filter by =!ISNULL(#<column>) && CONTAINS(#...' [2 sec]
java.lang.RuntimeException: File does not exist: hdfs://nameservice1:8020/<distribution-specific-path>/hadoop/mapreduce.tar.gz
at datameer.dap.sdk.util.ExceptionUtil.convertToRuntimeException(ExceptionUtil.java:49)
...
Background
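An out-of-the-box installation of Datameer ships with some predefined properties, for example a value for
mapreduce.application.framework.path
This Hadoop property names the location of the MapReduce framework archive that is distributed with each job. The shipped value is a reasonable default, but it does not match every cluster layout.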
Solution
The default setting
mapreduce.application.framework.path=/<distribution-specific-path>/mapreduce/mapreduce.tar.gz#yarn
may not work on every cluster, because the location of the archive depends on how the cluster was installed and configured.
To enable MapReduce jobs, set the property to the path where the archive actually resides on the cluster:
mapreduce.application.framework.path=/<cluster-specific-path>/mapreduce/mapreduce.tar.gz#yarn
Once the property points to the correct location of the MapReduce framework archive, MapReduce jobs can run.
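Before changing the property, it can help to confirm that the archive really exists at the candidate location. The following is a minimal sketch, not part of Datameer: the helper names split_framework_path and archive_exists are hypothetical, and it assumes the standard hdfs command-line client is on the PATH of the machine where it runs. The part of the value after # is the alias under which the archive is made available to the job, so it has to be stripped before the path is checked.

```python
import shutil
import subprocess


def split_framework_path(value):
    """Split a 'path#alias' property value into (path, alias).

    The alias is None when the value carries no '#' fragment.
    """
    path, sep, alias = value.partition("#")
    return path, (alias if sep else None)


def archive_exists(hdfs_path):
    """Check the path with `hdfs dfs -test -e`.

    Returns True or False, or None when the hdfs client is not installed
    (assumption: the client is configured for the target cluster).
    """
    if shutil.which("hdfs") is None:
        return None
    result = subprocess.run(["hdfs", "dfs", "-test", "-e", hdfs_path])
    return result.returncode == 0


# Placeholder value; substitute the path used on your own cluster.
value = "/<cluster-specific-path>/mapreduce/mapreduce.tar.gz#yarn"
path, alias = split_framework_path(value)
print("path to check:", path)
print("alias:", alias)
```

Calling archive_exists(path) with a real cluster path then indicates whether the value is safe to use. A result of False corresponds to the "File does not exist" error shown above.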