Problem
A Datameer job fails with the following error message:
WARN [2015-12-21 17:11:43.717] [ConcurrentJobExecutor-2] (DefaultMrJobClient.java:211) - Task Id : attempt_1450719794020_0272_r_000123_2, Status : FAILED
WARN [2015-12-21 17:11:43.718] [ConcurrentJobExecutor-2] (DefaultMrJobClient.java:216) - attempt_1450719794020_0272_r_000123_2: Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in MapOutputCopier task_1450719794020_0272_r_000123.8
at org.apache.hadoop.mapreduce.task.reduce.DirectShuffle.run(DirectShuffle.java:128)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.io.IOException: 463 failures downloading attempt_1450719794020_0272_m_003576_0
at org.apache.hadoop.mapreduce.task.reduce.DirectShuffleSchedulerImpl.copyFailed(DirectShuffleSchedulerImpl.java:301)
at org.apache.hadoop.mapreduce.task.reduce.DirectShuffleFetcher.copyOutput(DirectShuffleFetcher.java:252)
at org.apache.hadoop.mapreduce.task.reduce.DirectShuffleFetcher.run(DirectShuffleFetcher.java:183)
Troubleshooting
Identify the application ID and the failing task attempt ID from the error above, then retrieve the corresponding task logs from Hadoop. Within those logs, you might find the following error text:
MAX_FAILED_UNIQUE_FETCHES
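The application ID can be read directly from the failing attempt ID: attempt_1450719794020_0272_r_000123_2 belongs to application_1450719794020_0272. Below is a minimal sketch of pulling the aggregated application logs and searching them for the error text, assuming YARN log aggregation is enabled and the standard yarn command-line tool is available; the local file name app_0272.log is chosen here only for illustration.

# Derive the application ID from the failing attempt ID:
# attempt_1450719794020_0272_r_000123_2 -> application_1450719794020_0272
yarn logs -applicationId application_1450719794020_0272 > app_0272.log

# Search the aggregated logs for the shuffle fetch error
grep -n "MAX_FAILED_UNIQUE_FETCHES" app_0272.log

If log aggregation is not enabled, the same per-attempt logs can be browsed through the JobHistory Server or ResourceManager web UI instead.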
Cause
When a Hadoop MapReduce application runs, there are two main phases: map and reduce. During the map phase, each map task writes its intermediate output to a temporary directory on the node where it ran. Once the reduce phase starts, reduce tasks fetch (shuffle) those intermediate files from the map nodes and merge them to produce the final output.
The errors above are displayed when a reduce task repeatedly fails to download the intermediate files created by a map task; after too many failed fetch attempts against the same map output (463 in the trace above), the reduce attempt is killed with a shuffle error.
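The "Caused by" line in the trace names the map attempt whose output could not be fetched (attempt_1450719794020_0272_m_003576_0). A hedged sketch of locating the node that produced that output, reusing the aggregated log dump (app_0272.log) from the step above:

# Print occurrences of the failing map attempt ID with a little surrounding context.
# Each section of the aggregated log is headed by a line of the form
# "Container: container_... on <host>:<port>", which identifies the node that ran it.
grep -n -B2 "attempt_1450719794020_0272_m_003576_0" app_0272.log

The node shown for that map attempt is the one the reduce tasks cannot pull intermediate data from.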
Solution
This error is almost always caused by network communication issues within the cluster that prevent reduce tasks from reaching the nodes holding the map output. Engage the support team for whichever Hadoop distribution is experiencing this error.
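Before engaging the vendor, a few basic checks can help confirm the network hypothesis. The sketch below uses placeholder hostnames (mapper-node is the node identified from the logs above) and assumes the standard YARN ShuffleHandler; the DirectShuffle classes in this particular trace point to a vendor-specific shuffle implementation, so the port check may not apply to that cluster.

# From the node running the failing reduce attempt, verify basic reachability
# of the node that holds the map output (hostname is a placeholder):
ping -c 3 mapper-node

# If the cluster uses the standard YARN ShuffleHandler, check that its port
# (mapreduce.shuffle.port, default 13562) is reachable:
nc -zv mapper-node 13562

# Review the shuffle client settings that govern how long a reducer keeps
# trying before reporting a fetch failure (values are only inspected here):
grep -A1 -E "mapreduce.reduce.shuffle.(connect.timeout|read.timeout|parallelcopies)" \
    $HADOOP_CONF_DIR/mapred-site.xml

These settings are only reviewed here, not changed; any tuning should be done together with the distribution's support team, as recommended above.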