 

Hadoop streaming jobs SUCCEEDED but killed by ApplicationMaster

Tags: python, hadoop


I just finished setting up a small Hadoop cluster (3 Ubuntu machines running Apache Hadoop 2.2.0) and am now trying to run Python streaming jobs.

Running a test job, I encounter the following problem:
Almost all map tasks are marked as successful, but with a note saying "Container killed".

In the web interface, the log for the map tasks says:
Progress 100.00
State SUCCEEDED

but under Note, almost every attempt (~200) says
Container killed by the ApplicationMaster.
or
Container killed by the ApplicationMaster. Container killed on request. Exit code is 143

In the log file associated with each attempt, I can see an entry saying Task 'attempt_xxxxxxxxx_0' done.

I also get 3 attempts with the same log, except that those 3 have
State KILLED
and are listed under killed jobs.

stderr output is empty for all jobs/attempts.

Looking at the ApplicationMaster log and following one of the successful (but killed) attempts, I find the following entries:

  • Transitioned from NEW to UNASSIGNED
  • Transitioned from UNASSIGNED to ASSIGNED
  • several progress updates, including: 1.0
  • Done acknowledgement
  • RUNNING to SUCCESS_CONTAINER_CLEANUP
  • CONTAINER_REMOTE_CLEANUP
  • KILLING attempt_xxxx
  • Transitioned from SUCCESS_CONTAINER_CLEANUP to SUCCEEDED
  • Task Transitioned from RUNNING to SUCCEEDED

All the attempts are numbered xxxx_0, so I assume they are not being killed as a result of speculative execution.

Should I be worried about this? And what causes the containers to be killed? Any suggestions would be greatly appreciated!

GebitsGerbils asked Jun 02 '14 11:06



1 Answer

Yes, I agree with @joshua. It seems to be a bug related to a task/container not dying gracefully after successfully finishing the map/reduce task. After the grace period, the ApplicationMaster has to kill it instead.
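As a side note on the "Exit code is 143" message quoted in the question: by the usual shell convention, an exit code above 128 means the process was terminated by signal number (code - 128), so 143 corresponds to signal 15, SIGTERM. That fits a container being asked to stop rather than crashing on its own. A minimal sketch for decoding such codes follows; the helper name describe_exit_code is my own, not part of Hadoop:

```python
import signal


def describe_exit_code(code):
    """Return the signal name if `code` follows the 128 + N convention, else None.

    Hypothetical helper for illustration; not a Hadoop or YARN API.
    """
    if code <= 128:
        return None
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        # No known signal with that number on this platform.
        return None


if __name__ == "__main__":
    print(describe_exit_code(143))
```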

I am running Hadoop 2.5.0-cdh5.3.0 (as reported by 'yarn version').

I picked one of the tasks and grep'ed for its history in the log generated for my MR application:

$ yarn logs -applicationId application_1422894000163_0003 | grep attempt_1422894000163_0003_r_000008_0

You will see that "attempt_1422894000163_0003_r_000008_0" goes through "TaskAttempt Transitioned from NEW to UNASSIGNED", and so on through RUNNING, to "RUNNING to SUCCESS_CONTAINER_CLEANUP".

During the "SUCCESS_CONTAINER_CLEANUP" step, you will see messages about this container being killed. Once the container has been killed, the attempt moves on to "TaskAttempt Transitioned from SUCCESS_CONTAINER_CLEANUP to SUCCEEDED", so the kill happens only after the task has already finished successfully.
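If you want to check this for many attempts rather than grep'ing one at a time, the same filtering can be sketched in Python. This is only a rough sketch: it assumes each relevant log line contains the attempt ID and the phrase "Transitioned from X to Y" as quoted above. The function names and the sample lines are made up for illustration; they are not real log output.

```python
import re

# Matches the "Transitioned from X to Y" phrase quoted above.
TRANSITION = re.compile(r"Transitioned from (\w+) to (\w+)")


def attempt_transitions(lines, attempt_id):
    """Collect (from_state, to_state) pairs for one attempt, in log order."""
    steps = []
    for line in lines:
        if attempt_id not in line:
            continue
        match = TRANSITION.search(line)
        if match:
            steps.append((match.group(1), match.group(2)))
    return steps


def killed_after_success(steps):
    """True if the attempt went through SUCCESS_CONTAINER_CLEANUP and ended SUCCEEDED."""
    targets = [to_state for _, to_state in steps]
    return "SUCCESS_CONTAINER_CLEANUP" in targets and targets[-1:] == ["SUCCEEDED"]


# Hypothetical input lines, built only from the phrases quoted in this answer.
SAMPLE_LINES = [
    "attempt_1422894000163_0003_r_000008_0 TaskAttempt Transitioned from NEW to UNASSIGNED",
    "attempt_1422894000163_0003_r_000009_0 TaskAttempt Transitioned from NEW to UNASSIGNED",
    "attempt_1422894000163_0003_r_000008_0 TaskAttempt Transitioned from RUNNING to SUCCESS_CONTAINER_CLEANUP",
    "attempt_1422894000163_0003_r_000008_0 TaskAttempt Transitioned from SUCCESS_CONTAINER_CLEANUP to SUCCEEDED",
]

if __name__ == "__main__":
    steps = attempt_transitions(SAMPLE_LINES, "attempt_1422894000163_0003_r_000008_0")
    print(steps)
    print(killed_after_success(steps))
```

To run it against a real application, you could save the output of the 'yarn logs' command above to a file and pass the open file object as the lines argument.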

rr9031 answered Oct 22 '22 00:10
