Why does Hadoop need to introduce these new classes? They just seem to complicate the interface.
Java's built-in serialization is too bulky and slow for Hadoop's needs. That's why the Hadoop community put `Writable` in place. `WritableComparable` is a combination of the `Writable` and `Comparable` interfaces. `int` is a primitive type, so it cannot be used as a key or value; keys and values must be object types that implement these interfaces.
`IntWritable` is the Hadoop flavour of `Integer`, optimized to provide serialization in Hadoop. Java serialization is too heavyweight for Hadoop, hence the box classes in Hadoop implement serialization through the interface called `Writable`. `Writable` can serialize an object in a very lightweight way.
The `Writable` interface: a serializable object which implements a simple, efficient serialization protocol, based on `DataInput` and `DataOutput`. Any key or value type in the Hadoop MapReduce framework implements this interface.
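To make the protocol concrete, here is a minimal sketch of the pattern. The `Writable` interface and `IntBox` class below are local stand-ins written for illustration (the real interface lives in `org.apache.hadoop.io` and the real box class is `IntWritable`); `DataInput` and `DataOutput` are the actual `java.io` types the protocol is built on:

```java
import java.io.*;

// A minimal local stand-in for Hadoop's Writable interface (assumed shape;
// the real one is org.apache.hadoop.io.Writable).
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// An IntWritable-like box class: serializes a single int as 4 raw bytes,
// with none of the class-descriptor overhead of java.io.Serializable.
class IntBox implements Writable {
    private int value;
    IntBox() {}                       // no-arg constructor, needed for deserialization
    IntBox(int value) { this.value = value; }
    int get() { return value; }
    public void write(DataOutput out) throws IOException { out.writeInt(value); }
    public void readFields(DataInput in) throws IOException { value = in.readInt(); }
}

public class WritableDemo {
    public static void main(String[] args) throws IOException {
        // Serialize: the object writes only its payload bytes.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new IntBox(42).write(new DataOutputStream(buf));
        System.out.println("bytes on the wire: " + buf.size()); // 4

        // Deserialize: an empty box repopulates itself from the stream.
        IntBox copy = new IntBox();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        System.out.println("round-tripped value: " + copy.get()); // 42
    }
}
```

Note the design: instead of the JVM writing class metadata into the stream (as `Serializable` does), the reader is expected to already know the type and simply calls `readFields` on a fresh instance, so the wire format is just the payload.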
`Text` is a replacement for the `UTF8` class, which was deprecated because it didn't support strings whose encoding was over 32,767 bytes, and because it used Java's modified UTF-8. `Text` uses standard UTF-8 instead, which makes it potentially easier to interoperate with other tools that understand UTF-8.
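The difference between the two encodings is easy to observe with only the standard library: `DataOutputStream.writeUTF` uses Java's modified UTF-8, where a supplementary character is encoded as a 6-byte surrogate pair (plus a 2-byte length prefix), while standard UTF-8 encodes it in 4 bytes:

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

public class Utf8Demo {
    public static void main(String[] args) throws IOException {
        // One supplementary code point (U+1F600), two Java chars.
        String s = "\uD83D\uDE00";

        // Standard UTF-8, as used by Hadoop's Text class: 4 bytes.
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length); // 4

        // Modified UTF-8, as written by DataOutputStream.writeUTF:
        // each surrogate is encoded as a 3-byte sequence (6 bytes total),
        // preceded by a 2-byte length prefix, giving 8 bytes.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new DataOutputStream(buf).writeUTF(s);
        System.out.println(buf.size()); // 8
    }
}
```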
These classes exist in order to handle objects in the Hadoop way. For example, Hadoop uses `Text` instead of Java's `String`. The `Text` class in Hadoop is similar to a Java `String`, but `Text` implements interfaces like `Comparable`, `Writable` and `WritableComparable`.

These interfaces are all necessary for MapReduce: the `Comparable` interface is used for comparing keys when the reducer sorts them, and `Writable` can write the result to the local disk. Hadoop does not use Java's `Serializable` because it is too heavyweight; `Writable` can serialize a Hadoop object in a very lightweight way.
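The two roles described above, serialization and key ordering, can be sketched together. The `WritableComparable` interface and `TextKey` class below are illustrative stand-ins written for this answer (the real ones are `org.apache.hadoop.io.WritableComparable` and `org.apache.hadoop.io.Text`), and an ordinary `Collections.sort` stands in for the framework's shuffle/sort phase:

```java
import java.io.*;
import java.util.*;

// A minimal local stand-in for Hadoop's WritableComparable (assumed shape).
interface WritableComparable<T> extends Comparable<T> {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// A Text-like key: write/readFields make it serializable, and compareTo
// lets the framework order keys before they reach the reducer.
class TextKey implements WritableComparable<TextKey> {
    private String value = "";
    TextKey() {}
    TextKey(String value) { this.value = value; }
    String get() { return value; }
    public void write(DataOutput out) throws IOException { out.writeUTF(value); }
    public void readFields(DataInput in) throws IOException { value = in.readUTF(); }
    public int compareTo(TextKey other) { return value.compareTo(other.value); }
    public String toString() { return value; }
}

public class SortDemo {
    public static void main(String[] args) {
        // Simulate the sort the framework performs on map output keys.
        List<TextKey> keys = new ArrayList<>(Arrays.asList(
                new TextKey("banana"), new TextKey("apple"), new TextKey("cherry")));
        Collections.sort(keys);
        System.out.println(keys); // [apple, banana, cherry]
    }
}
```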