I was working with ArrayWritable, and at some point I needed to check how Hadoop serializes the ArrayWritable. This is what I got by setting job.setNumReduceTasks(0):
0 IntArrayWritable@10f11b8
3 IntArrayWritable@544ec1
6 IntArrayWritable@fe748f
8 IntArrayWritable@1968e23
11 IntArrayWritable@14da8f4
14 IntArrayWritable@18f6235
and this is the test mapper that I was using:
public static class MyMapper extends Mapper<LongWritable, Text, LongWritable, IntArrayWritable> {
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        int red = Integer.parseInt(value.toString());
        IntWritable[] a = new IntWritable[100];
        for (int i = 0; i < a.length; i++) {
            a[i] = new IntWritable(red + i);
        }
        IntArrayWritable aw = new IntArrayWritable();
        aw.set(a);
        context.write(key, aw);
    }
}
IntArrayWritable is taken from the example given in the javadoc for ArrayWritable:
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.IntWritable;

public class IntArrayWritable extends ArrayWritable {
    public IntArrayWritable() {
        super(IntWritable.class);
    }
}
I checked the Hadoop source code and this makes no sense to me. ArrayWritable should not serialize the class name, and there is no way that an array of 100 IntWritable can be serialized in 6 or 7 hexadecimal digits. The application actually seems to work just fine, and the reducer deserializes the right values.
What is happening? What am I missing?
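You have to override the default toString() method. TextOutputFormat calls it to produce the human-readable output, so what you are seeing is the result of the default Object.toString() (class name, '@', and the hash code in hexadecimal), not the serialized bytes.
Try out the following code and see the result: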
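import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.IntWritable;

public class IntArrayWritable extends ArrayWritable {
    public IntArrayWritable() {
        super(IntWritable.class);
    }

    // Called by TextOutputFormat when it writes the value as text.
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        for (String s : super.toStrings()) {
            sb.append(s).append(" ");
        }
        return sb.toString();
    }
}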
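To make the difference between the text output and the binary form concrete, here is a minimal sketch in plain Java with no Hadoop dependency. The classes PlainArray and ReadableArray and the method toBytes are hypothetical stand-ins, not Hadoop classes. The byte layout (an int count followed by 4 bytes per value) is my assumption of what ArrayWritable and IntWritable write, based on reading the Hadoop source, so treat it as an illustration rather than the exact wire format.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class ToStringVsBytes {

    // Hypothetical stand-in for a value class that does NOT override toString().
    static class PlainArray {
        final int[] values;

        PlainArray(int[] values) {
            this.values = values;
        }
    }

    // Same data, but with a readable toString(), like the overridden IntArrayWritable.
    static class ReadableArray extends PlainArray {
        ReadableArray(int[] values) {
            super(values);
        }

        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder();
            for (int v : values) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(v);
            }
            return sb.toString();
        }
    }

    // Assumed binary layout: element count as an int, then each value as a 4-byte int.
    static byte[] toBytes(int[] values) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeInt(values.length);
            for (int v : values) {
                out.writeInt(v);
            }
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] values = new int[100];
        for (int i = 0; i < values.length; i++) {
            values[i] = i;
        }
        // Default Object.toString(): class name, '@', hash code in hex.
        System.out.println(new PlainArray(values));
        // Overridden toString(): the 100 values separated by spaces.
        System.out.println(new ReadableArray(values));
        // Size of the binary form under the assumed layout: 4 + 100 * 4 bytes.
        System.out.println(toBytes(values).length); // prints 404
    }
}
```

The first line printed has the same shape as IntArrayWritable@10f11b8 in the question: it is just the object's identity string, which is why the reducer still received the correct values even though the map-only text output looked wrong.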