I'm sorry if this is a foolish question, but I couldn't find an answer with a Google search.
How can I understand the `LongWritable` type? What is it? Can anybody link to a schema or another helpful page?
Hadoop needs to be able to serialise data in and out of Java types via `DataInput` and `DataOutput` objects (usually IO streams). The Writable classes do this by implementing two methods: `write(DataOutput)` and `readFields(DataInput)`.
Specifically, `LongWritable` is a `Writable` class that wraps a Java `long`.
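For illustration, here is a minimal snippet showing what "wraps a long" means in practice (the class name `LongWritableBasics` is made up for this example):

```java
import org.apache.hadoop.io.LongWritable;

public class LongWritableBasics {
    public static void main(String[] args) {
        LongWritable w = new LongWritable(42L);  // wraps a primitive long
        long value = w.get();                    // unwrap back to a plain long
        w.set(value + 1);                        // Writables are mutable and reusable
        System.out.println(w);                   // prints 43
    }
}
```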
Most of the time (especially when just starting out) you can mentally replace `LongWritable` -> `Long`, i.e. it's just a number. If you get to defining your own data types, you will start to become very familiar with implementing the Writable interface, which looks something like this:
```java
public interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}
```
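To show how that interface is typically implemented, here is a minimal sketch of a hypothetical custom Writable (the class name `LongPairWritable` is made up for illustration); the only real requirements are a no-arg constructor and matching field order in `write` and `readFields`:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

// Hypothetical example: a pair of longs serialized as a custom Writable.
public class LongPairWritable implements Writable {
    private long first;
    private long second;

    public LongPairWritable() {}   // no-arg constructor, needed for deserialization

    public LongPairWritable(long first, long second) {
        this.first = first;
        this.second = second;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // Serialize the fields in a fixed order.
        out.writeLong(first);
        out.writeLong(second);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Deserialize in exactly the same order they were written.
        first = in.readLong();
        second = in.readLong();
    }
}
```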
From the Apache documentation page, `Writable` is described as:
"A serializable object which implements a simple, efficient, serialization protocol, based on DataInput and DataOutput."
and `LongWritable` is described as "A WritableComparable for longs."
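To make that contract concrete, here is a small sketch (class name is illustrative) that serializes a `LongWritable` to bytes with `write()` and reads it back with `readFields()`:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;

public class LongWritableRoundTrip {
    public static void main(String[] args) throws IOException {
        LongWritable original = new LongWritable(163L);

        // Serialize: write() emits the 8 bytes of the wrapped long.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        original.write(new DataOutputStream(bytes));

        // Deserialize into a fresh instance via readFields().
        LongWritable copy = new LongWritable();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));

        System.out.println(copy.get());                // 163
        System.out.println(original.compareTo(copy));  // 0 -> WritableComparable ordering
    }
}
```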
Need for Writables:
In Hadoop, interprocess communication is built on remote procedure calls (RPC). The RPC protocol uses serialization to render a message into a binary stream at the sender, and the stream is deserialized back into the original message at the receiver.
Java serialization has several disadvantages with respect to performance and efficiency: it is comparatively slow, it tends to significantly expand the size of the serialized object, and it creates a lot of garbage.
Refer to these two posts:
dzone article
https://softwareengineering.stackexchange.com/questions/191269/java-serialization-advantages-and-disadvantages-use-or-avoid
For Hadoop to be effective, the serialization/deserialization process has to be optimized, because a huge number of remote calls happen between the nodes in the cluster. So the serialization format should be fast, compact, extensible and interoperable. For this reason, the Hadoop framework has come up with its own IO classes to replace the Java primitive data types, e.g. `IntWritable` for `int`, `LongWritable` for `long`, `Text` for `String`, etc.
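For context, here is a minimal sketch of where these types typically appear: a mapper written against the `org.apache.hadoop.mapreduce` API (the class name `TokenMapper` is made up for illustration). With the default `TextInputFormat`, the input key is the byte offset of each line (a `LongWritable`) and the value is the line itself (a `Text`):

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Input key: byte offset of the line (LongWritable); input value: the line (Text).
// Output: each token paired with a count of 1, WordCount-style.
public class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        for (String token : line.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }
}
```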
You can get more details in "Hadoop: The Definitive Guide", fourth edition.