
"Map output materialized bytes" vs "map output bytes"

In the Hadoop job counters, what is the difference between "Map output materialized bytes" and "Map output bytes"? I don't see the former when I disable map output compression, so I guess it is the real (compressed) output size, while the latter is the uncompressed size?

kee asked Nov 13 '12 17:11


1 Answer

I think you are right. From http://hadoop.apache.org/docs/r1.0.4/releasenotes.html:

MAPREDUCE-2365. New counters for FileInputFormat (BYTES_READ) and FileOutputFormat (BYTES_WRITTEN). New counter MAP_OUTPUT_MATERIALIZED_BYTES for compressed MapOutputSize. (Siddharth Seth)

(Changes Since Hadoop 0.20.2)

...................................................................................................................................................

Here is a quote from Tom White's "Hadoop: The Definitive Guide", 3rd edition (table 8-2, page 261):

"Map output materialized bytes" - The number of bytes of map output actually written to disk. If map output compression is enabled, this is reflected in the counter value.

"Map output bytes" - The number of bytes of uncompressed output produced by all the maps in the job. Incremented every time the collect() method is called on the map's OutputCollector.

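To make the distinction concrete, here is a toy sketch in plain Java. It is not Hadoop code and does not use the Hadoop API: the class and method names are made up for illustration, and gzip merely stands in for whatever map output codec a job might configure. It counts the serialized record bytes (the analogue of "Map output bytes") and the bytes that would actually land on disk (the analogue of "Map output materialized bytes").

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration only; none of these names exist in Hadoop.
public class MapOutputCounterSketch {

    // Analogue of "Map output bytes": size of the records as emitted,
    // before any compression is applied.
    public static long outputBytes(String[] records) {
        long total = 0;
        for (String r : records) {
            total += r.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    // Analogue of "Map output materialized bytes": size of what is
    // written out, after the optional compression step.
    public static long materializedBytes(String[] records, boolean compress) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            if (compress) {
                // gzip is an assumption here, standing in for the job's codec.
                try (GZIPOutputStream gz = new GZIPOutputStream(sink)) {
                    for (String r : records) {
                        gz.write(r.getBytes(StandardCharsets.UTF_8));
                    }
                }
            } else {
                for (String r : records) {
                    sink.write(r.getBytes(StandardCharsets.UTF_8));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.size();
    }

    public static void main(String[] args) {
        // 1000 identical key/value lines, so the data compresses very well.
        String[] records = new String[1000];
        Arrays.fill(records, "hadoop\t1\n");

        System.out.println("output bytes:              " + outputBytes(records));
        System.out.println("materialized (no codec):   " + materializedBytes(records, false));
        System.out.println("materialized (compressed): " + materializedBytes(records, true));
    }
}
```

In this toy the two numbers are identical when compression is off. In a real job they may still differ somewhat, since the on-disk format can add its own framing, which the sketch ignores.
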
Yevgen Yampolskiy answered Nov 18 '22 07:11