I'm sorry if this is a foolish question, but I couldn't find an answer with a Google search.
How can I understand the `LongWritable` type? What is it? Can anybody link to a schema or another helpful page?
Hadoop needs to be able to serialise data in and out of Java types via `DataInput` and `DataOutput` objects (usually IO streams). The Writable classes do this by implementing two methods: `write(DataOutput)` and `readFields(DataInput)`.
Specifically, `LongWritable` is a `Writable` class that wraps a Java `long`.
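For illustration, here is a minimal snippet showing what "wraps a long" means in practice (the class name `LongWritableBasics` is made up for this example):

```java
import org.apache.hadoop.io.LongWritable;

public class LongWritableBasics {
    public static void main(String[] args) {
        LongWritable w = new LongWritable(42L);  // wraps a primitive long
        long value = w.get();                    // unwrap back to a plain long
        w.set(value + 1);                        // Writables are mutable and reusable
        System.out.println(w);                   // prints 43
    }
}
```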
Most of the time (especially when just starting out) you can mentally replace `LongWritable` -> `Long`, i.e. it's just a number. If you get to defining your own data types, you will start to become very familiar with implementing the Writable interface, which looks something like this:
```java
public interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}
```
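To show how that interface is typically implemented, here is a minimal sketch of a hypothetical custom Writable (the class name `LongPairWritable` is made up for illustration); the only real requirements are a no-arg constructor and matching field order in `write` and `readFields`:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

// Hypothetical example: a pair of longs serialized as a custom Writable.
public class LongPairWritable implements Writable {
    private long first;
    private long second;

    public LongPairWritable() {}   // no-arg constructor, needed for deserialization

    public LongPairWritable(long first, long second) {
        this.first = first;
        this.second = second;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // Serialize the fields in a fixed order.
        out.writeLong(first);
        out.writeLong(second);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Deserialize in exactly the same order they were written.
        first = in.readLong();
        second = in.readLong();
    }
}
```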
From the Apache documentation page, `Writable` is described as:
"A serializable object which implements a simple, efficient, serialization protocol, based on DataInput and DataOutput."
and `LongWritable` is described as "A WritableComparable for longs."
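To make that contract concrete, here is a small sketch (class name is illustrative) that serializes a `LongWritable` to bytes with `write()` and reads it back with `readFields()`:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;

public class LongWritableRoundTrip {
    public static void main(String[] args) throws IOException {
        LongWritable original = new LongWritable(163L);

        // Serialize: write() emits the 8 bytes of the wrapped long.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        original.write(new DataOutputStream(bytes));

        // Deserialize into a fresh instance via readFields().
        LongWritable copy = new LongWritable();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));

        System.out.println(copy.get());                // 163
        System.out.println(original.compareTo(copy));  // 0 -> WritableComparable ordering
    }
}
```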
Need for Writables:
In Hadoop, interprocess communication is built on remote procedure calls (RPC). The RPC protocol uses serialization to render a message into a binary stream at the sender, and the stream is deserialized back into the original message at the receiver.
Java serialization has several disadvantages with respect to performance and efficiency: it is comparatively slow, it tends to significantly expand the size of the serialized object, and it creates a lot of garbage.
Refer to these two posts:
dzone article
https://softwareengineering.stackexchange.com/questions/191269/java-serialization-advantages-and-disadvantages-use-or-avoid
For Hadoop to be effective, the serialization/deserialization process has to be optimized, because a huge number of remote calls happen between the nodes in the cluster. So the serialization format should be fast, compact, extensible and interoperable. For this reason, the Hadoop framework has come up with its own IO classes to replace the Java primitive data types, e.g. `IntWritable` for `int`, `LongWritable` for `long`, `Text` for `String`, etc.
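For context, here is a minimal sketch of where these types typically appear: a mapper written against the `org.apache.hadoop.mapreduce` API (the class name `TokenMapper` is made up for illustration). With the default `TextInputFormat`, the input key is the byte offset of each line (a `LongWritable`) and the value is the line itself (a `Text`):

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Input key: byte offset of the line (LongWritable); input value: the line (Text).
// Output: each token paired with a count of 1, WordCount-style.
public class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        for (String token : line.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }
}
```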
You can get more details in "Hadoop: The Definitive Guide", fourth edition.