
Enum value implementing Writable interface of Hadoop

Tags:

java

enums

hadoop

Suppose I have an enumeration:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public enum SomeEnumType implements Writable {
  A(0), B(1);

  private int value;

  private SomeEnumType(int value) {
    this.value = value;
  }

  @Override
  public void write(final DataOutput dataOutput) throws IOException {
    dataOutput.writeInt(this.value);
  }

  @Override
  public void readFields(final DataInput dataInput) throws IOException {
    this.value = dataInput.readInt();
  }
}

I want to pass an instance of it as a part of some other class instance.

Comparing with equals would not work, because it does not consider the enum's inner value field; moreover, all enum instances are fixed at compile time and cannot be created anywhere else.

Does it mean I could not send enums over the wire in Hadoop or there's a solution?

Artem Oboturov asked Oct 09 '12

People also ask

What is writable interface in Hadoop?

Writable is a core interface in Hadoop that serializes data compactly, so that it can be exchanged efficiently across the network. It declares separate readFields and write methods for reading data from a stream and writing it back out, respectively.

How can you implement custom writable?

Implementing Writable requires implementing two methods, readFields(DataInput in) and write(DataOutput out). Writables that are used as keys in MapReduce jobs must also implement Comparable (or simply WritableComparable).
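The two-method contract above can be exercised without a cluster. The sketch below uses a local stand-in for Hadoop's Writable interface (same shape as org.apache.hadoop.io.Writable; the real one lives in hadoop-common), and the class and method names (PairWritable, roundTrip) are illustrative, not Hadoop API:

```java
import java.io.*;

// Stand-in mirroring Hadoop's org.apache.hadoop.io.Writable.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// A minimal custom Writable: an int id plus a UTF-8 name.
class PairWritable implements Writable {
    private int id;
    private String name;

    public PairWritable() { }  // no-arg constructor, required by Hadoop
    public PairWritable(int id, String name) { this.id = id; this.name = name; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(id);
        out.writeUTF(name);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        id = in.readInt();
        name = in.readUTF();
    }

    public int getId() { return id; }
    public String getName() { return name; }
}

public class WritableRoundTrip {
    // Serialize with write(), deserialize into a fresh instance with
    // readFields(), which is essentially what Hadoop does on the wire.
    public static PairWritable roundTrip(PairWritable original) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        original.write(new DataOutputStream(bytes));
        PairWritable copy = new PairWritable();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        return copy;
    }

    public static void main(String[] args) throws IOException {
        PairWritable copy = roundTrip(new PairWritable(42, "forty-two"));
        System.out.println(copy.getId() + " " + copy.getName()); // prints "42 forty-two"
    }
}
```

Note that readFields fills in a fresh, mutable instance — which is exactly why it cannot work directly on an enum constant.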

What is the writable interface explain in details?

Interface Writable: a serializable object which implements a simple, efficient serialization protocol, based on DataInput and DataOutput. Any key or value type in the Hadoop Map-Reduce framework implements this interface.

What is writable comparable and comparator in Hadoop?

A Writable which is also Comparable. WritableComparables can be compared to each other, typically via Comparators. Any type which is to be used as a key in the Hadoop Map-Reduce framework should implement this interface. Note that hashCode() is frequently used in Hadoop to partition keys.


1 Answer

My usual and preferred solution for enums in Hadoop is to serialize them through their ordinal value.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class EnumWritable implements Writable {

    static enum EnumName {
        ENUM_1, ENUM_2, ENUM_3
    }

    private int enumOrdinal;

    // never forget your default constructor in Hadoop Writables
    public EnumWritable() {
    }

    public EnumWritable(Enum<?> arbitraryEnum) {
        this.enumOrdinal = arbitraryEnum.ordinal();
    }

    public int getEnumOrdinal() {
        return enumOrdinal;
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        enumOrdinal = in.readInt();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(enumOrdinal);
    }

    public static void main(String[] args) {
        // use it like this:
        EnumWritable enumWritable = new EnumWritable(EnumName.ENUM_1);
        // let Hadoop do the write and read stuff
        EnumName yourDeserializedEnum = EnumName.values()[enumWritable.getEnumOrdinal()];
    }

}

Obviously this has a drawback: ordinals can change. If you swap ENUM_2 with ENUM_3 and then read a previously serialized file, deserialization will silently return the wrong enum constant.
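That hazard is easy to demonstrate with two enum types standing in for "before" and "after" a reordering (V1 and V2 are hypothetical names for the sake of the sketch):

```java
public class OrdinalHazard {
    // The enum as it looked when the data was written.
    enum V1 { ENUM_1, ENUM_2, ENUM_3 }
    // The "same" enum after ENUM_2 and ENUM_3 were swapped in a later release.
    enum V2 { ENUM_1, ENUM_3, ENUM_2 }

    public static void main(String[] args) {
        int stored = V1.ENUM_2.ordinal();   // old code wrote the ordinal 1
        V2 decoded = V2.values()[stored];   // new code reads it back by index
        System.out.println(decoded);        // prints "ENUM_3" — the wrong constant
    }
}
```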

So if you know the enum class beforehand, you can write the name of your enum and use it like this:

 enumInstance = EnumName.valueOf(in.readUTF());

This uses slightly more space, but it is safer: reordering the constants no longer breaks previously serialized data (though renaming a constant still would).

The full example would look like this:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class EnumWritable implements Writable {

    static enum EnumName {
        ENUM_1, ENUM_2, ENUM_3
    }

    private EnumName enumInstance;

    // never forget your default constructor in Hadoop Writables
    public EnumWritable() {
    }

    public EnumWritable(EnumName e) {
        this.enumInstance = e;
    }

    public EnumName getEnum() {
        return enumInstance;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(enumInstance.name());
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        enumInstance = EnumName.valueOf(in.readUTF());
    }

    public static void main(String[] args) {
        // use it like this:
        EnumWritable enumWritable = new EnumWritable(EnumName.ENUM_1);
        // let Hadoop do the write and read stuff
        EnumName yourDeserializedEnum = enumWritable.getEnum();

    }

}
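The name-based round trip above can also be checked with plain java.io, no cluster needed. This sketch reproduces just the writeUTF/valueOf mechanism from the answer (the Writable interface itself is omitted; serialize/deserialize are illustrative helper names). Hadoop's WritableUtils additionally ships writeEnum/readEnum helpers that serialize an enum by name in much the same way:

```java
import java.io.*;

public class EnumNameRoundTrip {
    enum EnumName { ENUM_1, ENUM_2, ENUM_3 }

    // Mirrors EnumWritable.write(): store the constant's name, not its ordinal.
    static byte[] serialize(EnumName e) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        new DataOutputStream(bytes).writeUTF(e.name());
        return bytes.toByteArray();
    }

    // Mirrors EnumWritable.readFields(): resolve the name back to a constant.
    static EnumName deserialize(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        return EnumName.valueOf(in.readUTF());
    }

    public static void main(String[] args) throws IOException {
        EnumName restored = deserialize(serialize(EnumName.ENUM_2));
        System.out.println(restored); // prints "ENUM_2"
    }
}
```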
Thomas Jungblut answered Sep 21 '22