Problem
A Datameer job fails with the following error message:
WARN [2015-12-21 17:11:43.717] [ConcurrentJobExecutor-2] (DefaultMrJobClient.java:211) - Task Id : attempt_1450719794020_0272_r_000123_2, Status : FAILED
WARN [2015-12-21 17:11:43.718] [ConcurrentJobExecutor-2] (DefaultMrJobClient.java:216) - attempt_1450719794020_0272_r_000123_2: Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in MapOutputCopier task_1450719794020_0272_r_000123.8
at org.apache.hadoop.mapreduce.task.reduce.DirectShuffle.run(DirectShuffle.java:128)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.io.IOException: 463 failures downloading attempt_1450719794020_0272_m_003576_0
at org.apache.hadoop.mapreduce.task.reduce.DirectShuffleSchedulerImpl.copyFailed(DirectShuffleSchedulerImpl.java:301)
at org.apache.hadoop.mapreduce.task.reduce.DirectShuffleFetcher.copyOutput(DirectShuffleFetcher.java:252)
at org.apache.hadoop.mapreduce.task.reduce.DirectShuffleFetcher.run(DirectShuffleFetcher.java:183)
Troubleshooting
Identify the application ID and the failing task attempt ID from the error above, then retrieve the corresponding task logs from Hadoop. Within those logs, you might find the following error text:
MAX_FAILED_UNIQUE_FETCHES
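The application ID can be read directly from the failing attempt ID: attempt_1450719794020_0272_r_000123_2 belongs to application_1450719794020_0272. Below is a minimal sketch of pulling the aggregated application logs and searching them for the error text, assuming YARN log aggregation is enabled and the standard yarn command-line tool is available; the local file name app_0272.log is chosen here only for illustration.

# Derive the application ID from the failing attempt ID:
# attempt_1450719794020_0272_r_000123_2 -> application_1450719794020_0272
yarn logs -applicationId application_1450719794020_0272 > app_0272.log

# Search the aggregated logs for the shuffle fetch error
grep -n "MAX_FAILED_UNIQUE_FETCHES" app_0272.log

If log aggregation is not enabled, the same per-attempt logs can be browsed through the JobHistory Server or ResourceManager web UI instead.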
Cause
When a Hadoop MapReduce application runs, there are two main phases: map and reduce. During the map phase, each map task writes its intermediate output to a temporary directory on the node where it ran. Once the reduce phase starts, reduce tasks fetch (shuffle) those intermediate files from the map nodes and merge them to produce the final output.
The errors above are displayed when a reduce task repeatedly fails to download the intermediate files created by a map task; after too many failed fetch attempts against the same map output (463 in the trace above), the reduce attempt is killed with a shuffle error.
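The "Caused by" line in the trace names the map attempt whose output could not be fetched (attempt_1450719794020_0272_m_003576_0). A hedged sketch of locating the node that produced that output, reusing the aggregated log dump (app_0272.log) from the step above:

# Print occurrences of the failing map attempt ID with a little surrounding context.
# Each section of the aggregated log is headed by a line of the form
# "Container: container_... on <host>:<port>", which identifies the node that ran it.
grep -n -B2 "attempt_1450719794020_0272_m_003576_0" app_0272.log

The node shown for that map attempt is the one the reduce tasks cannot pull intermediate data from.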
Solution
This error is almost always caused by network communication issues within the cluster that prevent reduce tasks from reaching the nodes holding the map output. Engage the support team for whichever Hadoop distribution is experiencing this error.
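Before engaging the vendor, a few basic checks can help confirm the network hypothesis. The sketch below uses placeholder hostnames (mapper-node is the node identified from the logs above) and assumes the standard YARN ShuffleHandler; the DirectShuffle classes in this particular trace point to a vendor-specific shuffle implementation, so the port check may not apply to that cluster.

# From the node running the failing reduce attempt, verify basic reachability
# of the node that holds the map output (hostname is a placeholder):
ping -c 3 mapper-node

# If the cluster uses the standard YARN ShuffleHandler, check that its port
# (mapreduce.shuffle.port, default 13562) is reachable:
nc -zv mapper-node 13562

# Review the shuffle client settings that govern how long a reducer keeps
# trying before reporting a fetch failure (values are only inspected here):
grep -A1 -E "mapreduce.reduce.shuffle.(connect.timeout|read.timeout|parallelcopies)" \
    $HADOOP_CONF_DIR/mapred-site.xml

These settings are only reviewed here, not changed; any tuning should be done together with the distribution's support team, as recommended above.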