Suppose I have an enumeration:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public enum SomeEnumType implements Writable {
    A(0), B(1);

    private int value;

    private SomeEnumType(int value) {
        this.value = value;
    }

    @Override
    public void write(final DataOutput dataOutput) throws IOException {
        dataOutput.writeInt(this.value);
    }

    @Override
    public void readFields(final DataInput dataInput) throws IOException {
        this.value = dataInput.readInt();
    }
}
I want to pass an instance of it as part of some other class instance.
Equality checks would not work, because they would not consider the enum's inner value field; moreover, all enum instances are fixed at compile time and cannot be created anywhere else, so readFields cannot produce a new instance.
Does this mean I cannot send enums over the wire in Hadoop, or is there a solution?
Writable is a core interface in Hadoop: it serializes data into a compact binary form, greatly reducing its size so it can be exchanged efficiently across the network. It defines separate methods for reading fields from a stream (e.g. data arriving over the network) and writing fields to a stream (e.g. to local disk).
Implementing Writable requires implementing two methods, readFields(DataInput in) and write(DataOutput out). Writables that are used as keys in MapReduce jobs must also implement Comparable (or simply implement WritableComparable, which combines the two); a minimal key type is sketched after the interface descriptions below.
Interface Writable: A serializable object which implements a simple, efficient serialization protocol, based on DataInput and DataOutput. Any key or value type in the Hadoop Map-Reduce framework implements this interface.
Interface WritableComparable: A Writable which is also Comparable. WritableComparables can be compared to each other, typically via Comparators. Any type which is to be used as a key in the Hadoop Map-Reduce framework should implement this interface. Note that hashCode() is frequently used in Hadoop to partition keys.
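For illustration, a key type satisfying both interfaces could look like the following. This is only a minimal sketch of the contract; IntKey is a hypothetical name, not part of the question's code:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

// Minimal sketch of a MapReduce key type (hypothetical class, for illustration).
public class IntKey implements WritableComparable<IntKey> {

    private int value;

    // Hadoop instantiates Writables reflectively, so a no-arg constructor is required.
    public IntKey() {
    }

    public IntKey(int value) {
        this.value = value;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(value);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        value = in.readInt();
    }

    // Keys are sorted via compareTo during the shuffle phase.
    @Override
    public int compareTo(IntKey other) {
        return Integer.compare(value, other.value);
    }

    // hashCode is used by the default HashPartitioner to assign keys to reducers.
    @Override
    public int hashCode() {
        return value;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof IntKey && ((IntKey) o).value == value;
    }
}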
My normal and preferred solution for enums in Hadoop is to serialize them through their ordinal value.
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class EnumWritable implements Writable {

    static enum EnumName {
        ENUM_1, ENUM_2, ENUM_3
    }

    private int enumOrdinal;

    // never forget your default constructor in Hadoop Writables
    public EnumWritable() {
    }

    public EnumWritable(Enum<?> arbitraryEnum) {
        this.enumOrdinal = arbitraryEnum.ordinal();
    }

    public int getEnumOrdinal() {
        return enumOrdinal;
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        enumOrdinal = in.readInt();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(enumOrdinal);
    }

    public static void main(String[] args) {
        // use it like this:
        EnumWritable enumWritable = new EnumWritable(EnumName.ENUM_1);
        // let Hadoop do the write and read stuff
        EnumName yourDeserializedEnum = EnumName.values()[enumWritable.getEnumOrdinal()];
    }
}
Obviously this has drawbacks: ordinals can change, so if you swap ENUM_2 and ENUM_3 in the declaration and then read a previously serialized file, you will get back the wrong enum constant.
So if you know the enum class beforehand, you can write the name of your enum instead and read it back like this:
enumInstance = EnumName.valueOf(in.readUTF());
This uses slightly more space, but it is safe against reordering of the enum constants. (Renaming a constant will still break previously serialized data, since valueOf throws an IllegalArgumentException for unknown names.)
The full example would look like this:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class EnumWritable implements Writable {

    static enum EnumName {
        ENUM_1, ENUM_2, ENUM_3
    }

    private EnumName enumInstance;

    // never forget your default constructor in Hadoop Writables
    public EnumWritable() {
    }

    public EnumWritable(EnumName e) {
        this.enumInstance = e;
    }

    public EnumName getEnum() {
        return enumInstance;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // serialize by name rather than by ordinal
        out.writeUTF(enumInstance.name());
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        enumInstance = EnumName.valueOf(in.readUTF());
    }

    public static void main(String[] args) {
        // use it like this:
        EnumWritable enumWritable = new EnumWritable(EnumName.ENUM_1);
        // let Hadoop do the write and read stuff
        EnumName yourDeserializedEnum = enumWritable.getEnum();
    }
}
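To embed the enum in a larger record, as the question asks, the containing class can delegate to the same name-based pattern in its own write and readFields. Here is a minimal sketch under the same assumptions; RecordWritable and its payload field are hypothetical names used only for illustration:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

// Hypothetical container class embedding an enum field, for illustration only.
public class RecordWritable implements Writable {

    enum SomeEnumType {
        A, B
    }

    private SomeEnumType type;
    private long payload; // some other field of the record

    public RecordWritable() {
    }

    public RecordWritable(SomeEnumType type, long payload) {
        this.type = type;
        this.payload = payload;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // write the enum by name, then the remaining fields in a fixed order
        out.writeUTF(type.name());
        out.writeLong(payload);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // read the fields back in exactly the same order they were written
        type = SomeEnumType.valueOf(in.readUTF());
        payload = in.readLong();
    }
}

If you would rather not hand-roll this, Hadoop's WritableUtils class also provides writeEnum and readEnum helpers that serialize an enum by its name in the same way.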