
How to interpret MapReduce Performance Counters

To be more specific:

  1. In the task counters, the CPU time spent comes from /proc/&lt;pid&gt;/stat's utime + stime, so time such as IOWait is not counted. Is that right?
  2. The elapsed time for the whole task is a lot longer than the CPU time spent counter. Does that mean the node is very busy and the container is either not getting CPU or waiting on IO for a very long time?
  3. How can I tell whether a task is CPU-bound or IO-bound just from the counters?
asked Jun 29 '15 by user1192878


People also ask

What are counters in MapReduce?

MapReduce job counters measure job-level statistics, not values that change while a task is running. For example, TOTAL_LAUNCHED_MAPS counts the number of map tasks that were launched over the course of a job (including tasks that failed).
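For example, a minimal sketch of reading that counter once a job has completed, assuming the standard org.apache.hadoop.mapreduce API (JobCounterCheck is a hypothetical helper name):

    import java.io.IOException;
    import org.apache.hadoop.mapreduce.Counters;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.JobCounter;

    public class JobCounterCheck {
        // Prints a job-level statistic once `job` has completed.
        static void printLaunchedMaps(Job job) throws IOException {
            Counters counters = job.getCounters();
            long launchedMaps =
                counters.findCounter(JobCounter.TOTAL_LAUNCHED_MAPS).getValue();
            System.out.println("Map tasks launched (incl. failed): " + launchedMaps);
        }
    }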

How can I improve my MapReduce performance?

Use compression when writing intermediate data to disk. Tune the number of map and reduce tasks per the usual guidelines. Incorporate a combiner wherever it is appropriate. Use the most appropriate data types for your output (do not use LongWritable when the output values fit in the Integer range).
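As a hedged illustration, a driver could wire those tips up roughly like this; the property names are the Hadoop 2.x ones, SnappyCodec assumes the native Snappy library is installed on the cluster, and TokenizerMapper/IntSumReducer are the usual word-count classes standing in for your own:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.SnappyCodec;
    import org.apache.hadoop.mapreduce.Job;

    public class TunedDriver {
        public static Job configure() throws Exception {
            Configuration conf = new Configuration();
            // Compress intermediate map output before it is spilled and shuffled.
            conf.setBoolean("mapreduce.map.output.compress", true);
            conf.setClass("mapreduce.map.output.compress.codec",
                          SnappyCodec.class, CompressionCodec.class);

            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(TunedDriver.class);
            job.setMapperClass(TokenizerMapper.class);   // placeholder mapper
            job.setCombinerClass(IntSumReducer.class);   // combiner cuts shuffle volume
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);           // narrowest types that fit:
            job.setOutputValueClass(IntWritable.class);  // IntWritable over LongWritable
            return job;
        }
    }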

What is the purpose of counters in Hadoop?

Counters in Hadoop are used to keep track of occurrences of events. Whenever a job is executed, the Hadoop framework initializes counters to track job statistics such as the number of bytes read, the number of records read, the number of records written, and so on.
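For example, a custom counter is just an enum that a task increments through its context; this hypothetical mapper tracks malformed input lines:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class LogMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        // Counter group/name are derived from the enum's class and constant names.
        public enum Quality { MALFORMED_RECORDS }

        private static final IntWritable ONE = new IntWritable(1);
        private final Text host = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            if (fields.length < 3) {
                // The framework aggregates this across all map tasks in the job.
                context.getCounter(Quality.MALFORMED_RECORDS).increment(1);
                return;
            }
            host.set(fields[0]);
            context.write(host, ONE);
        }
    }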


1 Answer

The CPU_MILLISECONDS counter tells you the total time spent by all tasks on the CPU.

For REDUCE_SHUFFLE_BYTES, the higher the number, the higher the network utilization. (Many more counters like this are available.)
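On question 3, one rough heuristic (my own assumption, not an official rule) is to compare CPU_MILLISECONDS with the wall-clock task time from JobCounter.MILLIS_MAPS/MILLIS_REDUCES (available on Hadoop 2.x+); a low CPU fraction alongside large IO or shuffle counters points at IO-bound tasks:

    import java.io.IOException;
    import org.apache.hadoop.mapreduce.Counters;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.JobCounter;
    import org.apache.hadoop.mapreduce.TaskCounter;

    public class CpuVsIoCheck {
        // Assumes `job` has completed, e.g. after job.waitForCompletion(true).
        static void report(Job job) throws IOException {
            Counters c = job.getCounters();
            long cpuMs = c.findCounter(TaskCounter.CPU_MILLISECONDS).getValue();
            long shuffleBytes =
                c.findCounter(TaskCounter.REDUCE_SHUFFLE_BYTES).getValue();
            // Wall-clock milliseconds summed over all map and reduce tasks.
            long wallMs = c.findCounter(JobCounter.MILLIS_MAPS).getValue()
                        + c.findCounter(JobCounter.MILLIS_REDUCES).getValue();
            double cpuFraction = wallMs > 0 ? (double) cpuMs / wallMs : 0.0;
            System.out.printf("CPU fraction %.2f, shuffle bytes %d%n",
                              cpuFraction, shuffleBytes);
            // A fraction near 1.0 suggests CPU-bound tasks; a low fraction together
            // with large IO/shuffle counters suggests waiting on disk or network.
        }
    }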

There are 4 categories of counters in Hadoop: file system, job, framework, and custom.

You can use the built-in counters to validate that (see the sketch after this list):

1. The correct number of bytes was read and written
2. The correct number of tasks was launched and ran successfully
3. The amount of CPU and memory consumed is appropriate for your job and cluster nodes
4. The correct number of records was read and written
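A minimal sketch of walking every group and counter of a completed job (the four categories above appear as separate groups), assuming the standard mapreduce API:

    import java.io.IOException;
    import org.apache.hadoop.mapreduce.Counter;
    import org.apache.hadoop.mapreduce.CounterGroup;
    import org.apache.hadoop.mapreduce.Counters;
    import org.apache.hadoop.mapreduce.Job;

    public class CounterDump {
        // Walks every counter group of a completed job and prints its counters.
        static void dump(Job job) throws IOException {
            Counters counters = job.getCounters();
            for (CounterGroup group : counters) {
                System.out.println(group.getDisplayName());
                for (Counter counter : group) {
                    System.out.printf("  %s = %d%n",
                            counter.getDisplayName(), counter.getValue());
                }
            }
        }
    }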

More info is available at https://www.mapr.com/blog/managing-monitoring-and-testing-mapreduce-jobs-how-work-counters#.VZy9IF_vPZ4 (credits: mapr.com)

answered Oct 08 '22 by vijay kumar