I receive the following error:
Task attempt_201304161625_0028_m_000000_0 failed to report status for 600 seconds. Killing!
for my map tasks. Several similar questions have already been asked. However, I do not want to increase the default timeout after which Hadoop kills a task that doesn't report progress, i.e.,
Configuration conf=new Configuration();
long milliSeconds = 1000*60*60;
conf.setLong("mapred.task.timeout", milliSeconds);
Instead, I want to periodically report progress using either context.progress(), context.setStatus("Some Message"), context.getCounter(SOME_ENUM.PROGRESS).increment(1), or something similar. However, this still causes the job to be killed. Here are the snippets of code where I am attempting to report progress. The mapper:
protected void map(Key key, Value value, Context context) throws IOException, InterruptedException {
    //do some things
    Optimiser optimiser = new Optimiser();
    optimiser.optimiseFurther(<some parameters>, context);
    //more things
    context.write(newKey, newValue);
}
The optimiseFurther method within the Optimiser class:
public void optimiseFurther(<Some parameters>, TaskAttemptContext context) {
    int count = 0;
    while(something is true) {
        //optimise
        //try to report progress
        context.setStatus("Progressing:" + count);
        System.out.println("Optimise Progress:" + context.getStatus());
        context.progress();
        count++;
    }
}
The output from a mapper shows the status is being updated:
Optimise Progress:Progressing:0
Optimise Progress:Progressing:1
Optimise Progress:Progressing:2
...
However, the job is still being killed after the default amount of time. Am I using the context in the wrong way? Is there anything else I need to do in the job setup in order to report the progress successfully?
This problem is caused by a bug in Hadoop 0.20 whereby calls to context.setStatus() and context.progress() are not reported to the underlying framework (calls that set counters do not work either). A patch is available, so updating to a newer version of Hadoop should fix this.
What may be happening is that you have to call those progress methods on the Reporter itself, which is held inside the Context, and that calling them on the context directly has no effect.
From Cloudera
Report progress
If your task reports no progress for 10 minutes (see the mapred.task.timeout property) then it will be killed by Hadoop. Most tasks don’t encounter this situation since they report progress implicitly by reading input and writing output. However, some jobs which don’t process records in this way may fall foul of this behavior and have their tasks killed. Simulations are a good example, since they do a lot of CPU-intensive processing in each map and typically only write the result at the end of the computation. They should be written in such a way as to report progress on a regular basis (more frequently than every 10 minutes). This may be achieved in a number of ways:
Call setStatus() on Reporter to set a human-readable description of the task’s progress
Call incrCounter() on Reporter to increment a user counter
Call progress() on Reporter to tell Hadoop that your task is still there (and making progress)
Cloudera Tips
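To illustrate the "report on a regular basis" advice above, here is a minimal sketch of a helper that forwards to a progress callback at most once per interval, so a tight loop does not call the framework on every iteration. ThrottledProgress, tick and ProgressDemo are hypothetical names, not part of Hadoop. The callback is a plain Runnable standing in for whichever reporting call works on your version (for example context.progress(), wrapped in a Runnable); the sketch does not work around the Hadoop 0.20 bug described above.

```java
/**
 * Hypothetical helper (not a Hadoop class): runs a progress callback
 * at most once every intervalMillis.
 */
final class ThrottledProgress {
    // Assumption: in a real mapper this would wrap context.progress()
    // or reporter.progress(); here it is any Runnable.
    private final Runnable callback;
    private final long intervalMillis;
    private long lastReport;
    private boolean reported = false;

    ThrottledProgress(Runnable callback, long intervalMillis) {
        this.callback = callback;
        this.intervalMillis = intervalMillis;
    }

    /** Call inside the long-running loop; returns true if a report was sent. */
    boolean tick(long nowMillis) {
        // Always report on the first call, then only once the interval has elapsed.
        if (!reported || nowMillis - lastReport >= intervalMillis) {
            callback.run();
            lastReport = nowMillis;
            reported = true;
            return true;
        }
        return false;
    }

    /** Convenience overload that uses the wall clock. */
    boolean tick() {
        return tick(System.currentTimeMillis());
    }
}

class ProgressDemo {
    public static void main(String[] args) {
        final int[] reports = {0};
        // Report at most once per simulated second.
        ThrottledProgress progress = new ThrottledProgress(() -> reports[0]++, 1000L);
        // Ten loop iterations, 300 ms of simulated work each.
        for (int i = 0; i < 10; i++) {
            progress.tick(i * 300L);
        }
        System.out.println(reports[0]); // prints 3 (reports at 0, 1200 and 2400 ms)
    }
}
```

In the optimiseFurther loop from the question, the call to context.progress() would be replaced by a call to tick() on such a helper, with an interval well below the 10-minute timeout.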
public Context(Configuration conf, TaskAttemptID taskid,
RecordReader<KEYIN,VALUEIN> reader,
RecordWriter<KEYOUT,VALUEOUT> writer,
OutputCommitter committer,
StatusReporter reporter,
InputSplit split)