How to find time spent by mappers and reducers in Hadoop?

Question

How can I find, from within the code (not in the web interface), the time spent by each mapper and reducer, as well as the time spent on shuffling (sorting), in Hadoop? What about the total time spent by all mappers (or reducers)?

Charles Menguy · Accepted Answer

There is an API for the JobTracker, as described here, that gives you a lot of information about the cluster itself as well as details for all jobs.

In particular, if you know the job ID and want metrics for each individual map and reduce task, you can call getMapTaskReports (or getReduceTaskReports), which returns an array of TaskReport instances, detailed here. Each TaskReport gives you access to methods such as getStartTime and getFinishTime. For example:

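import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.TaskReport;

JobClient jobClient = new JobClient(new JobConf()); // talks to the JobTracker set in the configuration
JobID jobId = JobID.forName("your_job_id");         // the report methods take a JobID, not a String
// getStartTime() and getFinishTime() are epoch milliseconds
TaskReport[] maps = jobClient.getMapTaskReports(jobId);
for (TaskReport rpt : maps) {
  long duration = rpt.getFinishTime() - rpt.getStartTime();
  System.out.println("Mapper duration (ms): " + duration);
}
TaskReport[] reduces = jobClient.getReduceTaskReports(jobId);
for (TaskReport rpt : reduces) {
  long duration = rpt.getFinishTime() - rpt.getStartTime();
  System.out.println("Reducer duration (ms): " + duration);
}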

To get the total time spent by all mappers or reducers in your job, you can simply sum these durations in your code.
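A minimal sketch of that summation, written without Hadoop dependencies so it compiles on its own. `TaskTiming` is a hypothetical stand-in for the start/finish pair a `TaskReport` exposes; with a real job you would build one per report. It also computes the wall-clock span of the phase, since "total time" can mean either the sum of task durations or how long the phase took end to end when tasks run in parallel.

```java
import java.util.List;

public class TaskTimeTotals {

    // Hypothetical stand-in for TaskReport.getStartTime()/getFinishTime() (epoch milliseconds).
    static final class TaskTiming {
        final long startTime;
        final long finishTime;

        TaskTiming(long startTime, long finishTime) {
            this.startTime = startTime;
            this.finishTime = finishTime;
        }

        long duration() {
            return finishTime - startTime;
        }
    }

    // Sum of the individual task durations, in milliseconds.
    static long totalDuration(List<TaskTiming> tasks) {
        long total = 0;
        for (TaskTiming t : tasks) {
            total += t.duration();
        }
        return total;
    }

    // Earliest start to latest finish, in milliseconds; 0 for an empty list.
    // Smaller than the sum above whenever tasks overlap.
    static long wallClockSpan(List<TaskTiming> tasks) {
        if (tasks.isEmpty()) {
            return 0;
        }
        long earliestStart = Long.MAX_VALUE;
        long latestFinish = Long.MIN_VALUE;
        for (TaskTiming t : tasks) {
            earliestStart = Math.min(earliestStart, t.startTime);
            latestFinish = Math.max(latestFinish, t.finishTime);
        }
        return latestFinish - earliestStart;
    }

    public static void main(String[] args) {
        List<TaskTiming> maps = List.of(
            new TaskTiming(0, 4_000),
            new TaskTiming(1_000, 6_000),
            new TaskTiming(2_000, 5_000));
        System.out.println("Total map time (ms): " + totalDuration(maps));      // 12000
        System.out.println("Map phase wall clock (ms): " + wallClockSpan(maps)); // 6000
    }
}
```

The same two methods apply unchanged to the reduce reports.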

As for the shuffle, the JobTracker generally counts it as 33% of each reduce task's progress. That does not necessarily mean it takes 33% of the time, but I don't think there is an automated way to get the shuffle time per task, so you could go with this simple 33% heuristic.
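A rough sketch of that heuristic, assuming the classic MapReduce accounting in which reduce progress (the value `TaskReport.getProgress()` reports, from 0 to 1) is split into three equal thirds: shuffle (copy), sort, and reduce. The class and method names are hypothetical. The time estimate is only the 33% rule of thumb applied to a reducer's start and finish times, not a measurement.

```java
public class ReducePhase {

    // Assumption: each of the three reduce phases accounts for one third of progress.
    static String phaseOf(float progress) {
        if (progress < 0f || progress > 1f) {
            throw new IllegalArgumentException("progress must be in [0, 1]: " + progress);
        }
        if (progress < 1f / 3f) {
            return "shuffle";
        } else if (progress < 2f / 3f) {
            return "sort";
        }
        return "reduce";
    }

    // The 33% heuristic: treat one third of a reduce task's duration as shuffle time.
    // The real shuffle share can be much larger or smaller than this.
    static long estimatedShuffleMillis(long reduceStartTime, long reduceFinishTime) {
        return (reduceFinishTime - reduceStartTime) / 3;
    }

    public static void main(String[] args) {
        System.out.println(phaseOf(0.2f));                    // shuffle
        System.out.println(phaseOf(0.5f));                    // sort
        System.out.println(phaseOf(0.9f));                    // reduce
        System.out.println(estimatedShuffleMillis(0, 9_000)); // 3000
    }
}
```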

Keep in mind, though, that reducer times measured through the JobTracker API as shown above may be somewhat biased. When a reduce task starts, it first does the shuffle (up to 33% of its progress, as explained above), then waits until all map tasks have finished, and only then starts the actual reduce. A reduce measurement is therefore the sum of these three periods (shuffle + wait + reduce).
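Since the actual reduce cannot start before the last map finishes, one way to reduce that bias is to split each reducer's duration at the finish time of the last map task. The sketch below does this; `TaskTiming` is again a hypothetical stand-in for a `TaskReport`'s start/finish times. It is an approximation: the part before the boundary covers shuffle plus waiting, while the part after still includes copying the last map outputs and merging, not just the reduce function.

```java
import java.util.List;

public class ReducerTimeSplit {

    // Hypothetical stand-in for TaskReport.getStartTime()/getFinishTime() (epoch milliseconds).
    static final class TaskTiming {
        final long startTime;
        final long finishTime;

        TaskTiming(long startTime, long finishTime) {
            this.startTime = startTime;
            this.finishTime = finishTime;
        }
    }

    // Finish time of the last map task.
    static long lastMapFinish(List<TaskTiming> maps) {
        if (maps.isEmpty()) {
            throw new IllegalArgumentException("no map tasks");
        }
        long latest = Long.MIN_VALUE;
        for (TaskTiming m : maps) {
            latest = Math.max(latest, m.finishTime);
        }
        return latest;
    }

    // Returns {before, after}: the reducer's time before and after the last map finished.
    // The boundary is clamped to the reducer's own lifetime, so a reducer that started
    // after all maps were done gets before = 0.
    static long[] splitAtMapEnd(TaskTiming reducer, long mapEnd) {
        long boundary = Math.max(reducer.startTime, Math.min(mapEnd, reducer.finishTime));
        return new long[] {boundary - reducer.startTime, reducer.finishTime - boundary};
    }

    public static void main(String[] args) {
        List<TaskTiming> maps = List.of(
            new TaskTiming(0, 4_000),
            new TaskTiming(1_000, 6_000),
            new TaskTiming(2_000, 5_000));
        long mapEnd = lastMapFinish(maps);
        long[] split = splitAtMapEnd(new TaskTiming(2_000, 10_000), mapEnd);
        System.out.println("shuffle + wait (ms): " + split[0]); // 4000
        System.out.println("after last map (ms): " + split[1]); // 4000
    }
}
```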

Tags: java, hadoop, mapreduce