To be more specific:
MapReduce job counters measure job-level statistics, not values that change while a task is running. For example, TOTAL_LAUNCHED_MAPS counts the number of map tasks that were launched over the course of a job (including tasks that failed).
Use compression when writing intermediate data to disk. Tune the number of map and reduce tasks as per the tips above. Incorporate a combiner wherever it is appropriate. Use the most appropriate data types for the output (for example, do not use LongWritable when the output values fall within the Integer range).
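For the combiner tip above, here is a minimal sketch of a word-count combiner written as a Hadoop Streaming script (an assumption on my part; the original does not say whether the job is Java or Streaming). It applies the reducer's summing logic locally to each mapper's output so that less intermediate data is shuffled over the network:

```python
from itertools import groupby


def combine(lines):
    """Sum counts for consecutive identical keys in sorted mapper output.

    Expects tab-separated "word<TAB>count" lines, the usual word-count
    mapper output. Running this on each mapper's output shrinks the
    volume of intermediate data shuffled to the reducers.
    """
    parsed = (line.rstrip("\n").split("\t") for line in lines)
    result = []
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        result.append("%s\t%d" % (word, total))
    return result
```

As a Streaming combiner the script would read `sys.stdin`, print each combined line, and be wired in with the `-combiner` option on the streaming command line.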
Counters in Hadoop are used to keep track of occurrences of events. Whenever a job executes, the Hadoop framework initializes counters to track job statistics such as the number of bytes read, the number of records read, the number of records written, and so on.
The 'CPU_MILLISECONDS' counter reports the total time spent by all tasks on the CPU.
For 'REDUCE_SHUFFLE_BYTES', the higher the number, the higher the network utilization. (Many more counters like these are available.)
There are 4 categories of counters in Hadoop: file system, job, framework, and custom.
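As a sketch of the custom category: a Hadoop Streaming task can increment a user-defined counter by writing a specially formatted line to stderr. The group and counter names below ("MyApp", "EMPTY_LINES") are made up for illustration:

```python
import sys


def increment_counter(group, counter, amount=1):
    """Increment a custom Hadoop counter from a Streaming task.

    Hadoop Streaming scans a task's stderr for lines of the form
    "reporter:counter:<group>,<counter>,<amount>" and updates the
    named counter accordingly.
    """
    sys.stderr.write("reporter:counter:%s,%s,%d\n" % (group, counter, amount))


def mapper(lines):
    """Word-count mapper that also tracks empty input lines."""
    for line in lines:
        line = line.strip()
        if not line:
            # "MyApp"/"EMPTY_LINES" are illustrative names, not built-ins.
            increment_counter("MyApp", "EMPTY_LINES")
            continue
        for word in line.split():
            yield "%s\t1" % word
```

The resulting counter shows up in the job's counter listing alongside the built-in ones, under the custom group name.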
You can use the built-in counters to validate that:
1. The correct number of bytes was read and written
2. The correct number of tasks was launched and ran successfully
3. The amount of CPU and memory consumed is appropriate for your job and cluster nodes
4. The correct number of records was read and written
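To script checks like these, a built-in counter value can be pulled from a finished job with the `mapred job -counter` CLI. A small wrapper, assuming `mapred` is on the PATH and the job history is still available (the test below stubs out the CLI call):

```python
import subprocess


def get_counter(job_id, group, counter):
    """Return one counter value for a completed job via the Hadoop CLI.

    Runs `mapred job -counter <job-id> <group-name> <counter-name>`,
    which prints the counter's value to stdout.
    """
    output = subprocess.check_output(
        ["mapred", "job", "-counter", job_id, group, counter])
    return int(output.decode("utf-8").strip())
```

For example, CPU time would be fetched with group "org.apache.hadoop.mapreduce.TaskCounter" and counter "CPU_MILLISECONDS".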
More info is available at https://www.mapr.com/blog/managing-monitoring-and-testing-mapreduce-jobs-how-work-counters#.VZy9IF_vPZ4 (credits: mapr.com)