Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does Hadoop need classes like Text or IntWritable instead of String or Integer?

Tags:

hadoop

Why does Hadoop need to introduce these new classes? They just seem to complicate the interface

like image 262
Casebash Avatar asked Oct 18 '13 03:10

Casebash


People also ask

Why does Hadoop need IntWritable instead of int?

Java's serialization is too bulky and slow on the system. That's why Hadoop community had put Writable in place. "WritableComparable" is a combination of the above two interfaces. "int" is a primitive type so it cannot be used as key-value.

What is the need of IntWritable?

IntWritable is the Hadoop flavour of Integer, which is optimized to provide serialization in Hadoop. Java Serialization is too big or too heavy for Hadoop, hence the box classes in Hadoop implements serialization through the Interface called Writable. Writable can serialize the object in a very light way.

What is text writable in Hadoop?

Interface WritableA serializable object which implements a simple, efficient, serialization protocol, based on DataInput and DataOutput . Any key or value type in the Hadoop Map-Reduce framework implements this interface.

What is text in MapReduce?

Text is a replacement for the UTF8 class, which was deprecated because it didn't support strings whose encoding was over 32,767 bytes, and because it used Java's modified UTF-8. Furthermore, Text uses standard UTF-8, which makes it potentially easier to inter operate with other tools that understand UTF-8.


1 Answers

In order to handle the Objects in Hadoop way. For example, hadoop uses Text instead of java's String. The Text class in hadoop is similar to a java String, however, Text implements interfaces like Comparable, Writable and WritableComparable.

These interfaces are all necessary for MapReduce; the Comparable interface is used for comparing when the reducer sorts the keys, and Writable can write the result to the local disk. It does not use the java Serializable because java Serializable is too big or too heavy for hadoop, Writable can serializable the hadoop Object in a very light way.

like image 180
Winston Avatar answered Oct 22 '22 01:10

Winston