In my MapReduce program, the key is a tuple (A, B), where A and B are both sets of integers. How can I implement this as a custom data type?
public static class MapClass extends Mapper<Object,Text,Tuple,Tuple>....
public class Tuple implements WritableComparable<Tuple>{
@Override
public void readFields(DataInput arg0) throws IOException {
// TODO Auto-generated method stub
}
@Override
public void write(DataOutput arg0) throws IOException {
// TODO Auto-generated method stub
}
@Override
public int compareTo(Tuple o) {
// TODO Auto-generated method stub
return 0;
}
}
You're almost there. Just add fields for A and B, then complete the serialization methods and compareTo:
public class Tuple implements WritableComparable<Tuple>{
public Set<Integer> a = new TreeSet<Integer>();
public Set<Integer> b = new TreeSet<Integer>();
@Override
public void readFields(DataInput arg0) throws IOException {
a.clear();
b.clear();
int count = arg0.readInt();
while (count-- > 0) {
a.add(arg0.readInt());
}
count = arg0.readInt();
while (count-- > 0) {
b.add(arg0.readInt());
}
}
@Override
public void write(DataOutput arg0) throws IOException {
arg0.writeInt(a.size());
for (int v : a) {
arg0.writeInt(v);
}
arg0.writeInt(b.size());
for (int v : b) {
arg0.writeInt(v);
}
}
@Override
public int compareTo(Tuple o) {
// you'll need to implement how you want to compare the two sets between objects
return 0; // placeholder so the class compiles; replace with a real ordering
}
}
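For compareTo(), one reasonable choice is to compare the two sorted sets lexicographically, element by element, with the shorter set ordering first on a tie. The ordering that is right for you depends on your job's sort semantics, so treat the helper below as a sketch; the names `TupleOrdering` and `compareSets` are mine, not part of any Hadoop API:

```java
import java.util.Iterator;
import java.util.Set;
import java.util.TreeSet;

public class TupleOrdering {
    // One possible ordering for two sorted integer sets: compare them
    // element by element; if all shared positions are equal, the set
    // with fewer elements sorts first.
    public static int compareSets(Set<Integer> x, Set<Integer> y) {
        Iterator<Integer> ix = x.iterator();
        Iterator<Integer> iy = y.iterator();
        while (ix.hasNext() && iy.hasNext()) {
            int c = Integer.compare(ix.next(), iy.next());
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(x.size(), y.size());
    }
}
```

Inside Tuple.compareTo(Tuple o) you would then call the helper twice, comparing a first and falling back to b: `int c = compareSets(a, o.a); return c != 0 ? c : compareSets(b, o.b);`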
To implement a custom data type in Hadoop, you must implement the WritableComparable interface and provide implementations of its readFields() and write() methods. Beyond readFields() and write(), you must also override equals() and hashCode() from java.lang.Object.
When the custom type is used as a key, it must also provide a working compareTo(), since keys are sorted during the shuffle.
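To illustrate the equals()/hashCode() requirement, here is a minimal sketch that delegates to the two sets' own equals() and hashCode() (one reasonable choice, not the only one). The Hadoop interfaces are omitted so the snippet stands alone, and `TupleKey` is a name I made up to avoid restating the whole class:

```java
import java.util.Set;
import java.util.TreeSet;

public class TupleKey {
    public Set<Integer> a = new TreeSet<Integer>();
    public Set<Integer> b = new TreeSet<Integer>();

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof TupleKey)) return false;
        TupleKey other = (TupleKey) obj;
        // Two keys are equal exactly when both integer sets match
        return a.equals(other.a) && b.equals(other.b);
    }

    @Override
    public int hashCode() {
        // Must be consistent with equals(): combine the sets' hash codes
        return 31 * a.hashCode() + b.hashCode();
    }
}
```

Getting hashCode() consistent with equals() matters in practice: the default HashPartitioner routes records by the key's hashCode(), so equal keys with different hash codes would land on different reducers.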