What are the advantages of using NullWritable for null keys/values over using null texts (i.e. new Text(null))? I found the following in the «Hadoop: The Definitive Guide» book:
NullWritable is a special type of Writable, as it has a zero-length serialization. No bytes are written to, or read from, the stream. It is used as a placeholder; for example, in MapReduce, a key or a value can be declared as a NullWritable when you don’t need to use that position; it effectively stores a constant empty value. NullWritable can also be useful as a key in SequenceFile when you want to store a list of values, as opposed to key-value pairs. It is an immutable singleton: the instance can be retrieved by calling NullWritable.get().
I do not clearly understand how the output is written out using NullWritable. Will there be a single constant value at the beginning of the output file indicating that the keys or values of this file are null, so that the MapReduce framework can skip reading the null keys/values (whichever is null)? Also, how are null texts actually serialized?
Thanks,
Venkat
NullWritable is a special type of Writable, as it has a zero-length serialization. No bytes are written to, or read from, the stream.
Writable data types are Hadoop's serialization format: they define how data is written to local disk and sent between nodes. Just as Java has data types to store variables (int, float, long, double, etc.), Hadoop has its own equivalent Writable data types (IntWritable, FloatWritable, LongWritable, DoubleWritable, and so on).
Hadoop needs to be able to serialise data into and out of Java types via DataInput and DataOutput objects (usually I/O streams). The Writable classes do this by implementing two methods, write(DataOutput) and readFields(DataInput). For example, LongWritable is a Writable class that wraps a Java long.
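To make the write/readFields contract concrete, here is a minimal sketch that runs without Hadoop on the classpath. SimpleWritable, LongBox and WritableSketch are hypothetical names standing in for Writable and LongWritable; the real classes live in org.apache.hadoop.io and have more methods. The sketch only shows that a LongWritable-style value is written as a fixed 8 bytes and read back into an existing instance.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

class WritableSketch {
    // Serialize any SimpleWritable into a byte array.
    static byte[] serialize(SimpleWritable w) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        w.write(new DataOutputStream(bytes));
        return bytes.toByteArray();
    }

    // Write a value, then read it back into a separately created instance.
    static long roundTrip(long v) throws IOException {
        byte[] data = serialize(new LongBox(v));
        LongBox copy = new LongBox();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(data)));
        return copy.get();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(serialize(new LongBox(42L)).length); // prints 8
        System.out.println(roundTrip(42L));                     // prints 42
    }
}

// Hypothetical stand-in for org.apache.hadoop.io.Writable.
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical LongWritable-like class that wraps a Java long.
class LongBox implements SimpleWritable {
    private long value;

    LongBox() {}                                // empty instance for a reader to fill in
    LongBox(long value) { this.value = value; }

    long get() { return value; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(value);  // always exactly 8 bytes
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        value = in.readLong(); // overwrite this instance's state in place
    }
}
```

Note that nothing in the bytes says "this is a long"; the reader has to know the type before calling readFields.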
Context object: allows the Mapper/Reducer to interact with the rest of the Hadoop system. It includes configuration data for the job as well as interfaces that allow it to emit output. Applications can also use the Context to report progress and to set application-level status messages.
The key/value types must be given at runtime, so anything writing or reading NullWritables will know ahead of time that it will be dealing with that type; there is no marker or anything in the file. And technically the NullWritables are "read"; it's just that "reading" a NullWritable is actually a no-op. You can see for yourself that nothing at all is written or read:
import java.io.*;
import java.util.Arrays;
import org.apache.hadoop.io.NullWritable;

NullWritable nw = NullWritable.get();
ByteArrayOutputStream out = new ByteArrayOutputStream();
nw.write(new DataOutputStream(out));
System.out.println(Arrays.toString(out.toByteArray())); // prints "[]"

ByteArrayInputStream in = new ByteArrayInputStream(new byte[0]);
nw.readFields(new DataInputStream(in)); // works just fine
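Because both sides agree on the types in advance, a zero-length key simply vanishes from the byte stream. The following sketch, again without Hadoop, assumes a bare layout of records written back to back; real container formats such as SequenceFile add their own headers and length fields, so this is not their on-disk format. NoKey is a hypothetical NullWritable-like singleton, and writePairs/writeValuesOnly are illustrative helpers.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

class ZeroLengthKeySketch {
    // Hypothetical NullWritable-like key: an immutable singleton with nothing to store.
    static final class NoKey {
        private static final NoKey INSTANCE = new NoKey();

        private NoKey() {}

        static NoKey get() { return INSTANCE; }

        // Zero-length serialization: nothing is written.
        void write(DataOutputStream out) {}
    }

    // Writes (key, value) records back to back, with no type marker anywhere.
    static byte[] writePairs(long[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (long v : values) {
            NoKey.get().write(out); // key contributes 0 bytes
            out.writeLong(v);       // value contributes 8 bytes
        }
        return bytes.toByteArray();
    }

    // Writes only the values, as a plain list.
    static byte[] writeValuesOnly(long[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (long v : values) {
            out.writeLong(v);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        long[] values = {1L, 2L, 3L};
        byte[] pairs = writePairs(values);
        System.out.println(pairs.length);                                  // prints 24
        System.out.println(Arrays.equals(pairs, writeValuesOnly(values))); // prints true
    }
}
```

The pairs and the plain list come out byte-for-byte identical, which is why a reader can only make sense of the data if it already knows the key type is zero-length.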
And as for your question about new Text(null), again, you can try it out:
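import java.io.*;
import java.util.Arrays;
import org.apache.hadoop.io.Text;

// throws NullPointerException right here: Text encodes the String to UTF-8 as soon as it is set
Text text = new Text((String) null);
ByteArrayOutputStream out = new ByteArrayOutputStream();
text.write(new DataOutputStream(out)); // never reached
System.out.println(Arrays.toString(out.toByteArray()));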
Text will not work at all with a null String.
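The closest working stand-in for a "null" text is an empty one, new Text(""), and it still costs bytes: Text.write writes a variable-length int holding the byte count, followed by the UTF-8 bytes. The sketch below imitates that layout under a stated simplification: it only handles encodings up to 127 bytes, where Hadoop's variable-length int is a single byte equal to the length. encodeTextLike is a hypothetical helper, not Hadoop API.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Objects;

class TextSizeSketch {
    // Hypothetical Text-like encoder: one length byte, then the UTF-8 payload.
    // Assumption: only encodings of at most 127 bytes, so the length fits in one byte.
    static byte[] encodeTextLike(String s) throws IOException {
        Objects.requireNonNull(s, "a null String has nothing to encode");
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        if (utf8.length > 127) {
            throw new IllegalArgumentException("sketch only handles short strings");
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeByte(utf8.length); // length prefix
        out.write(utf8);            // payload
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(Arrays.toString(encodeTextLike("")));    // prints [0]
        System.out.println(Arrays.toString(encodeTextLike("abc"))); // prints [3, 97, 98, 99]
        try {
            encodeTextLike(null);
        } catch (NullPointerException e) {
            System.out.println("null rejected");
        }
    }
}
```

So an empty text costs one byte per key or value, a NullWritable costs none, and a null String cannot be encoded at all.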