 

Working with Protocol Buffers and internal data models

I have an existing internal data model for a Picture, as follows:

package test.model;
public class Picture {

  private int height, width;
  private Format format;

  public enum Format {
    JPEG, BMP, GIF
  }

  // Constructor, getters and setters, hashCode, equals, toString etc.
}

I now want to serialize it using protocol buffers. I've written a Picture.proto file that mirrors the fields of the Picture class and compiled the code under the test.model.protobuf package with a classname of PictureProtoBuf:

syntax = "proto2";

package test.model.protobuf;

option java_package = "test.model.protobuf";
option java_outer_classname = "PictureProtoBuf";

message Picture {
  enum Format {
    JPEG = 1;
    BMP = 2;
    GIF = 3;
  }
  required uint32 width = 1;
  required uint32 height = 2;
  required Format format = 3;
}

I'm now assuming that if I have a Picture that I want to serialize and send somewhere, I have to create a PictureProtoBuf object and map all the fields across, like so:

Picture p = new Picture(100, 200, Picture.Format.JPEG);
PictureProtoBuf.Picture.Builder output = PictureProtoBuf.Picture.newBuilder();
output.setHeight(p.getHeight());
output.setWidth(p.getWidth());

I'm coming unstuck when I have an enumeration in my data model. The ugly way that I'm using right now is:

output.setFormat(PictureProtoBuf.Picture.Format.valueOf(p.getFormat().name()));

However, this is prone to breakage: it relies on the enum constant names being identical in my internal data model and the protocol buffer data model. That isn't a great assumption, because enum value names in a .proto file must be unique within their enclosing scope, not just within the enum (protobuf follows C++ scoping rules, so two enums in the same message can't both have a JPEG value). I can see myself having to hand-craft switch statements on enumerations whenever the .name() of an internal constant doesn't match the protobuf-generated one.
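A minimal sketch of the explicit switch-based mapping mentioned above, using only the JDK. `ProtoFormat` is a hypothetical stand-in for the generated `PictureProtoBuf.Picture.Format` (its prefixed names are invented to show that the mapping does not depend on `name()`), and `FormatMapper` is an illustrative class name. Because every case is spelled out, renaming a constant on either side breaks compilation instead of failing at runtime in `valueOf()`, and a constant added later reaches the `default` branch and throws.

```java
// Sketch: explicit two-way enum mapping that does not rely on matching names.
final class FormatMapper {

  // Internal model enum, as in test.model.Picture.Format
  enum Format { JPEG, BMP, GIF }

  // Hypothetical stand-in for the generated PictureProtoBuf.Picture.Format
  enum ProtoFormat { FORMAT_JPEG, FORMAT_BMP, FORMAT_GIF }

  static ProtoFormat toProto(Format f) {
    switch (f) {
      case JPEG: return ProtoFormat.FORMAT_JPEG;
      case BMP:  return ProtoFormat.FORMAT_BMP;
      case GIF:  return ProtoFormat.FORMAT_GIF;
      // A constant added to Format later lands here instead of being mis-mapped
      default: throw new IllegalArgumentException("Unmapped format: " + f);
    }
  }

  static Format fromProto(ProtoFormat p) {
    switch (p) {
      case FORMAT_JPEG: return Format.JPEG;
      case FORMAT_BMP:  return Format.BMP;
      case FORMAT_GIF:  return Format.GIF;
      default: throw new IllegalArgumentException("Unmapped proto format: " + p);
    }
  }

  public static void main(String[] args) {
    // Round-trip every internal constant through the protobuf enum and back
    for (Format f : Format.values()) {
      System.out.println(f + " -> " + toProto(f) + " -> " + fromProto(toProto(f)));
    }
  }
}
```

With the real generated class, the `ProtoFormat` references would simply become `PictureProtoBuf.Picture.Format` constants.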

I guess my question is: am I going about this the right way? Am I supposed to scrap my internal data model (test.model.Picture) in favour of the protobuf-generated one (test.model.protobuf.PictureProtoBuf)? If so, how can I implement the niceties I've added to my internal data model (e.g. hashCode(), equals(Object), toString(), etc.)?
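For reference, here is one way the niceties mentioned above could look when written by hand in the internal model. This is a minimal sketch using `java.util.Objects`; the `(height, width, format)` constructor order is an assumption (the question elides the constructor), and the fields are made final with setters omitted for brevity.

```java
import java.util.Objects;

// Sketch of test.model.Picture with equals/hashCode/toString filled in.
final class Picture {

  public enum Format { JPEG, BMP, GIF }

  private final int height, width;
  private final Format format;

  // Assumed argument order (height, width, format); not shown in the question
  Picture(int height, int width, Format format) {
    this.height = height;
    this.width = width;
    this.format = Objects.requireNonNull(format, "format");
  }

  int getHeight() { return height; }
  int getWidth() { return width; }
  Format getFormat() { return format; }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof Picture)) return false;
    Picture other = (Picture) o;
    return height == other.height && width == other.width && format == other.format;
  }

  @Override
  public int hashCode() {
    // Consistent with equals: computed from the same three fields
    return Objects.hash(height, width, format);
  }

  @Override
  public String toString() {
    return "Picture{height=" + height + ", width=" + width + ", format=" + format + "}";
  }

  public static void main(String[] args) {
    Picture a = new Picture(100, 200, Format.JPEG);
    Picture b = new Picture(100, 200, Format.JPEG);
    System.out.println(a.equals(b) + " " + (a.hashCode() == b.hashCode())); // true true
    System.out.println(a); // Picture{height=100, width=200, format=JPEG}
  }
}
```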

Catchwa asked Feb 14 '12 06:02



3 Answers

Although the existing answers are good, I decided to go a bit further with Marc Gravell's suggestion to look into protostuff.

You can use the protostuff runtime module, whose RuntimeSchema builds a schema for your internal data model at runtime (via reflection), so no .proto file or generated classes are needed.

My code now reduces to:

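import io.protostuff.LinkedBuffer;
import io.protostuff.ProtobufIOUtil;
import io.protostuff.Schema;
import io.protostuff.runtime.RuntimeSchema;

// Do this once: RuntimeSchema builds (and caches) the schema for Picture
private static final Schema<Picture> SCHEMA = RuntimeSchema.getSchema(Picture.class);

// For each Picture you want to serialize, e.g. serialize(new Picture(100, 200, Picture.Format.JPEG))
static byte[] serialize(Picture p) {
  // LinkedBuffer is not thread-safe, so allocate one per call (or keep one per thread)
  LinkedBuffer buffer = LinkedBuffer.allocate(LinkedBuffer.DEFAULT_BUFFER_SIZE);
  try {
    return ProtobufIOUtil.toByteArray(p, SCHEMA, buffer);
  } finally {
    buffer.clear(); // reset the buffer even if serialization throws
  }
}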

This is a great improvement over the Google protobuf library (see my question) when you have lots and lots of attributes in your internal data model, because there is no field-by-field mapping to write or maintain. There is also no speed penalty that I can detect (with my use cases, anyway!)

Catchwa answered Oct 19 '22 11:10


If you have control over your internal data model, you could modify test.model.Picture so that its enum values know their protobuf equivalents, most simply by passing the corresponding protobuf constant to each enum constructor.

For example, using Guava's BiMap (a bidirectional map whose values are also unique), we get something like this:

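import com.google.common.collect.ImmutableBiMap;

enum ProtoEnum { // we don't control this
  ENUM1, ENUM2, ENUM3;
}

enum MyEnum {
  ONE(ProtoEnum.ENUM1), TWO(ProtoEnum.ENUM2), THREE(ProtoEnum.ENUM3);

  // Keyed by the protobuf constant; inverse() gives the MyEnum -> ProtoEnum direction
  static final ImmutableBiMap<ProtoEnum, MyEnum> CORRESPONDENCE;

  static {
    // The enum constants already exist when this block runs, so values() is safe
    ImmutableBiMap.Builder<ProtoEnum, MyEnum> builder = ImmutableBiMap.builder();
    for (MyEnum x : MyEnum.values()) {
      builder.put(x.corresponding, x);
    }
    // build() throws if two constants map to the same ProtoEnum
    CORRESPONDENCE = builder.build();
  }

  private final ProtoEnum corresponding;

  private MyEnum(ProtoEnum corresponding) {
    this.corresponding = corresponding;
  }

  ProtoEnum getCorresponding() {
    return corresponding;
  }
}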

and then if we want to look up the MyEnum corresponding to a ProtoEnum, we just do MyEnum.CORRESPONDENCE.get(protoEnum), and to go the other way, we just do MyEnum.CORRESPONDENCE.inverse().get(myEnum) or myEnum.getCorresponding().
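If pulling in Guava just for this is unwelcome, the same two-way lookup can be built with a plain JDK `EnumMap`. This is a swapped-in technique, not part of the answer above: a minimal sketch reusing its `ProtoEnum`/`MyEnum` names, where `BY_PROTO`, `fromProto` and `Demo` are hypothetical names for illustration. The forward direction is just the stored field; only the reverse direction needs a map.

```java
import java.util.EnumMap;
import java.util.Map;

enum ProtoEnum { // stand-in for the generated enum we don't control
  ENUM1, ENUM2, ENUM3
}

enum MyEnum {
  ONE(ProtoEnum.ENUM1), TWO(ProtoEnum.ENUM2), THREE(ProtoEnum.ENUM3);

  // Reverse index from the protobuf constant to ours
  private static final Map<ProtoEnum, MyEnum> BY_PROTO = new EnumMap<>(ProtoEnum.class);

  static {
    for (MyEnum x : values()) {
      // Fail fast at class initialization if two constants claim the same ProtoEnum
      if (BY_PROTO.put(x.corresponding, x) != null) {
        throw new IllegalStateException("Duplicate mapping for " + x.corresponding);
      }
    }
  }

  private final ProtoEnum corresponding;

  MyEnum(ProtoEnum corresponding) {
    this.corresponding = corresponding;
  }

  ProtoEnum getCorresponding() {
    return corresponding;
  }

  static MyEnum fromProto(ProtoEnum p) {
    return BY_PROTO.get(p); // null if p has no counterpart
  }
}

class Demo {
  public static void main(String[] args) {
    System.out.println(MyEnum.fromProto(ProtoEnum.ENUM2)); // TWO
    System.out.println(MyEnum.THREE.getCorresponding());   // ENUM3
  }
}
```

Unlike `ImmutableBiMap`, the `EnumMap` does not enforce uniqueness on its own, which is why the static block checks the return value of `put`.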

Louis Wasserman answered Oct 19 '22 10:10


One way is to only keep the generated enum:

package test.model;

import test.model.protobuf.PictureProtoBuf;

public class Picture {

  private int height, width;
  private PictureProtoBuf.Picture.Format format; // reuse the generated enum directly

  // Constructor, getters and setters, hashCode, equals, toString etc.
}

I've used this a few times; it may or may not make sense in your case. Using the protobuf-generated classes as your data model (or extending them to add functionality) is never recommended, though.

Dmitri answered Oct 19 '22 10:10