In the Hadoop job counters, what is the difference between "Map output materialized bytes" and "Map output bytes"? I don't see the former when I disable map output compression, so I guess it is the actual (compressed) output size, while the latter is the uncompressed size?
I think you are right. From http://hadoop.apache.org/docs/r1.0.4/releasenotes.html:
MAPREDUCE-2365. New counters for FileInputFormat (BYTES_READ) and FileOutputFormat (BYTES_WRITTEN). New counter MAP_OUTPUT_MATERIALIZED_BYTES for compressed MapOutputSize. (Siddharth Seth)
(Changes Since Hadoop 0.20.2)
Here is a quote from Tom White's "Hadoop: The Definitive Guide", 3rd edition (table 8-2, page 261):
"Map output materialized bytes" - The number of bytes of map output actually written to disk. If map output compression is enabled, this is reflected in the counter value.
"Map output bytes" - The number of bytes of uncompressed output produced by all the maps in the job. Incremented every time the collect() method is called on the map's OutputCollector.
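To make the distinction concrete, here is a toy sketch in plain Python (not Hadoop code). It assumes records are serialized as tab-separated lines and uses zlib as a stand-in for whatever codec the job is configured with; the function name `map_output_sizes` is made up for illustration. Real Hadoop spill files may also carry framing overhead that this toy ignores, so treat the numbers as illustrative only.

```python
import zlib


def map_output_sizes(records, compress=True):
    """Return (output_bytes, materialized_bytes) for (key, value) string pairs.

    output_bytes mimics "Map output bytes": the uncompressed size summed
    over every emitted record.
    materialized_bytes mimics "Map output materialized bytes": the size of
    what would actually be written out, after optional compression.
    """
    # Assumption: each emitted record is serialized as "key<TAB>value<NEWLINE>".
    serialized = [f"{k}\t{v}\n".encode("utf-8") for k, v in records]

    # Counted per record, before any compression is applied.
    output_bytes = sum(len(r) for r in serialized)

    # Counted on the final payload, after compression if it is enabled.
    payload = b"".join(serialized)
    on_disk = zlib.compress(payload) if compress else payload
    return output_bytes, len(on_disk)


records = [("word", "1")] * 10000
raw, materialized = map_output_sizes(records, compress=True)
print("map output bytes:", raw)
print("materialized bytes:", materialized)
```

With compression on, the second number comes out much smaller than the first for repetitive data like this. With `compress=False` the two are equal in this toy.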