
How to model HashMap/Dictionary in the ProtoBuf efficiently

I have a protobuf file serialized by .NET code and I would like to consume it in Java. The .NET code uses a Dictionary data type, and the proto schema looks like this:

message Pair {
   optional string key = 1;
   optional string value = 2;
}

message Dictionary {
   repeated Pair pairs = 1;
}

This is the approach described in the Stack Overflow post "Dictionary in protocol buffers".

I can compile the proto file into Java classes with protoc, and I can deserialize the protobuf file into Java objects successfully. The only problem is that the field becomes a List of Pair objects in Java instead of a HashMap. I still have all the data, but I cannot access it as efficiently as I would like: given a key, I have to loop through the whole list to find its corresponding value. This does not seem optimal.
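
As a stopgap that leaves the schema untouched, the list can be indexed once after deserialization. The following is a minimal sketch, not the generated code itself: the nested Pair class is a hypothetical stand-in for the protoc-generated class (assumed to expose getKey() and getValue()), and PairIndex and toMap are names invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PairIndex {
    // Hypothetical stand-in for the protoc-generated Pair class, so that
    // this sketch compiles without protobuf on the classpath.
    static class Pair {
        private final String key;
        private final String value;

        Pair(String key, String value) {
            this.key = key;
            this.value = value;
        }

        String getKey() { return key; }
        String getValue() { return value; }
    }

    // Walk the list once and build a lookup table. Later lookups are then
    // hash lookups instead of linear scans. If a key appears more than once,
    // the last occurrence wins.
    static Map<String, String> toMap(List<Pair> pairs) {
        Map<String, String> index = new HashMap<>();
        for (Pair p : pairs) {
            index.put(p.getKey(), p.getValue());
        }
        return index;
    }

    public static void main(String[] args) {
        List<Pair> pairs = new ArrayList<>();
        pairs.add(new Pair("host", "localhost"));
        pairs.add(new Pair("port", "8080"));

        Map<String, String> index = toMap(pairs);
        System.out.println(index.get("port")); // prints 8080
    }
}
```

This costs one pass over the list and a second copy of the data in memory, so it only pays off when the same message is queried more than a handful of times.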

I am wondering if there is a better way to model a Dictionary/Map data type in protobuf.

Thanks

Update:

I tried Jon Skeet's suggestion of adding a map field to the addressbook example and still ran into an issue.

message Person {
  required string name = 1;
  required int32 id = 2;        // Unique ID number for this person.
  optional string email = 3;
  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }
  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }
  repeated PhoneNumber phone = 4;
  map<string, string> mapdata = 5;
}

protoc reports the following errors when compiling:

addressbook.proto:25:3: Expected "required", "optional", or "repeated".
addressbook.proto:25:6: Expected field name.

According to Google's protobuf documentation, proto2 does support the map type (https://developers.google.com/protocol-buffers/docs/proto#maps). To quote:

Maps cannot be repeated, optional, or required.

So I don't really know why protoc cannot compile it. There is another discussion, "have to create java pojo for the existing proto includes Map", where the answer suggests that map is only a proto3 feature. This contradicts Google's documentation.

Lan asked Aug 24 '15 16:08



1 Answer

Well, maps are already supported in "protobuf proper" as of v3.0. For example, your proto is effectively:

message Dictionary {
    map<string, string> pairs = 1;
}

The good news is that with the key and value fields you've defined, that's fully backward-compatible with your existing data :)
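
To illustrate why the two schemas are interchangeable on the wire, here is a rough sketch that hand-encodes a few entries the way `repeated Pair pairs = 1` lays them out (entry as field 1, key as field 1, value as field 2, all length-delimited) and then reads the same bytes back as map entries. It deliberately avoids the protobuf library so it is self-contained; WireCompatSketch and its methods are invented names, and the code only handles length-delimited fields with field numbers below 16. Real code would use the generated classes instead.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class WireCompatSketch {
    // Writes one length-delimited field: tag byte, varint length, payload.
    // Assumption: fieldNumber < 16, so the tag fits in a single byte.
    static void writeField(ByteArrayOutputStream out, int fieldNumber, byte[] payload) {
        out.write((fieldNumber << 3) | 2); // wire type 2 = length-delimited
        int len = payload.length;
        while (len >= 0x80) {              // varint: 7 bits per byte, low bits first
            out.write((len & 0x7F) | 0x80);
            len >>>= 7;
        }
        out.write(len);
        out.write(payload, 0, payload.length);
    }

    // Encodes key/value pairs the way "repeated Pair pairs = 1" is laid out.
    static byte[] encodeDictionary(String[][] pairs) {
        ByteArrayOutputStream dict = new ByteArrayOutputStream();
        for (String[] kv : pairs) {
            ByteArrayOutputStream entry = new ByteArrayOutputStream();
            writeField(entry, 1, kv[0].getBytes(StandardCharsets.UTF_8)); // key = 1
            writeField(entry, 2, kv[1].getBytes(StandardCharsets.UTF_8)); // value = 2
            writeField(dict, 1, entry.toByteArray());                     // pairs = 1
        }
        return dict.toByteArray();
    }

    static int readVarint(byte[] data, int[] pos) {
        int result = 0;
        int shift = 0;
        while (true) {
            int b = data[pos[0]++] & 0xFF;
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    // Reads the bytes as entries of a map<string, string> field numbered 1.
    static Map<String, String> decodeAsMap(byte[] data) {
        Map<String, String> result = new LinkedHashMap<>();
        int[] pos = {0};
        while (pos[0] < data.length) {
            int tag = data[pos[0]++] & 0xFF;
            int len = readVarint(data, pos);
            if (tag != 0x0A) {             // only field 1, length-delimited, is expected
                throw new IllegalArgumentException("unexpected tag " + tag);
            }
            int end = pos[0] + len;
            String key = "";
            String value = "";
            while (pos[0] < end) {
                int innerTag = data[pos[0]++] & 0xFF;
                int innerLen = readVarint(data, pos);
                String s = new String(data, pos[0], innerLen, StandardCharsets.UTF_8);
                pos[0] += innerLen;
                if (innerTag == 0x0A) {        // field 1 = key
                    key = s;
                } else if (innerTag == 0x12) { // field 2 = value
                    value = s;
                }
            }
            result.put(key, value);
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] bytes = encodeDictionary(new String[][] {
            {"host", "localhost"},
            {"port", "8080"}
        });
        System.out.println(decodeAsMap(bytes)); // prints {host=localhost, port=8080}
    }
}
```

The point of the sketch is only that nothing in the byte stream distinguishes a Pair message from a map entry, which is why data written with the old schema can be read with the map-based one.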

The bad news is that I don't know whether or not protobuf-net supports it. If you're not actually using the .proto file on the .NET side, and doing everything declaratively, you may just be able to modify your .proto file, regenerate the Java code, and go...

The remaining bad news is that maps were introduced in v3.0 which is still in alpha/beta at the time of this writing. Now, depending on when you need to ship, you may decide to bet on v3.0 being released by the time you need it - the benefits of having nice map syntax are pretty significant, in my view. Most of the changes being made at the moment are around the new proto3 features - whereas maps are allowed within proto2 syntax files too... it's just that you need the v3.0 compiler and runtime to use them.

Jon Skeet answered Oct 16 '22 09:10