
Protocol buffers detect type from raw message

Is it possible to detect the type of a raw protocol buffer message (in byte[])?

I have a situation where an endpoint can receive different messages and I need to be able to detect the type before I can deserialize it.

I am using protobuf-net.

Yavor Shahpasov asked Feb 02 '12 23:02


1 Answer

You can't detect the type in isolation, since the protobuf spec doesn't add any data to the stream for this; however, there are a number of ways of making this easy, depending on the context:

  • a union type (as mentioned by Jon) covers a range of scenarios
  • inheritance (protobuf-net specific) can be versatile - you can have a base-message type, and any number of concrete message types
  • you can use a prefix to indicate the incoming type
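To see why the type can't be recovered from the bytes alone, here is a minimal sketch in plain Python (not protobuf-net) that hand-encodes one varint field the way the protobuf wire format does. The message shapes `Foo { int32 id = 1; }` and `Bar { int32 count = 1; }` and the helper names are hypothetical, made up for illustration; the point is only that both produce the same bytes.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_int_field(field_number: int, value: int) -> bytes:
    """Encode one varint field: a tag (field number + wire type 0), then the value."""
    tag = (field_number << 3) | 0  # wire type 0 = varint
    return encode_varint(tag) + encode_varint(value)


# Hypothetical message Foo { int32 id = 1; } with id = 150
foo_bytes = encode_int_field(1, 150)
# Hypothetical message Bar { int32 count = 1; } with count = 150
bar_bytes = encode_int_field(1, 150)

print(foo_bytes.hex())         # 089601
print(foo_bytes == bar_bytes)  # True
```

The stream carries field numbers and wire types, but no message name, so a receiver looking at `08 96 01` has nothing to tell a Foo from a Bar.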

The last approach is particularly valuable in the case of raw TCP streams. On the wire it is identical to the union type, but with a different implementation. You decide in advance that 1=Foo, 2=Bar, etc. (exactly as you do for the union type approach). You can then use SerializeWithLengthPrefix to write (specifying the 1/2/etc. as the field number) and the non-generic TryDeserializeWithLengthPrefix to read (this is under Serializer.NonGeneric in the v1 API, or on the TypeModel in the v2 API). When reading, you provide a type-map that resolves the numbers back to types, and hence deserialize the correct type.

And to pre-empt the question "why is this useful with TCP streams?": in an ongoing TCP stream you need to use the WithLengthPrefix methods anyway, to avoid over-reading the stream, so you might as well get the type identifier for free!
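As a rough illustration of the prefix idea, here is a sketch in plain Python of the framing described above: each message is written as a field header carrying the agreed type number, then a varint length, then the payload. This is not protobuf-net code. The names `write_frame`, `read_frame` and `TYPE_MAP` are hypothetical, and the payloads are hand-written bytes standing in for serialized Foo/Bar messages.

```python
import io
from typing import Optional, Tuple

WIRE_TYPE_LENGTH_DELIMITED = 2


def write_varint(stream, value: int) -> None:
    """Write a non-negative integer as a base-128 varint."""
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            stream.write(bytes([low7 | 0x80]))  # more bytes follow
        else:
            stream.write(bytes([low7]))
            return


def read_varint(stream) -> Optional[int]:
    """Read one varint; return None if the stream ends before its first byte."""
    result = 0
    shift = 0
    while True:
        byte = stream.read(1)
        if not byte:
            if shift == 0:
                return None
            raise EOFError("stream ended inside a varint")
        result |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            return result
        shift += 7


def write_frame(stream, type_id: int, payload: bytes) -> None:
    """Write header (type number as field number), length, then payload."""
    write_varint(stream, (type_id << 3) | WIRE_TYPE_LENGTH_DELIMITED)
    write_varint(stream, len(payload))
    stream.write(payload)


def read_frame(stream) -> Optional[Tuple[int, bytes]]:
    """Read exactly one frame; return (type number, payload) or None at end."""
    tag = read_varint(stream)
    if tag is None:
        return None
    if tag & 0x07 != WIRE_TYPE_LENGTH_DELIMITED:
        raise ValueError("expected a length-delimited field")
    length = read_varint(stream)
    if length is None:
        raise EOFError("stream ended before the length prefix")
    payload = stream.read(length)  # a real socket would need a read loop here
    if len(payload) != length:
        raise EOFError("stream ended inside a payload")
    return tag >> 3, payload


# Hypothetical type-map agreed in advance: 1=Foo, 2=Bar
TYPE_MAP = {1: "Foo", 2: "Bar"}

stream = io.BytesIO()
write_frame(stream, 1, b"\x08\x96\x01")  # stand-in for a serialized Foo
write_frame(stream, 2, b"\x08\x01")      # stand-in for a serialized Bar
stream.seek(0)

received = []
while True:
    frame = read_frame(stream)
    if frame is None:
        break
    type_id, payload = frame
    received.append((TYPE_MAP[type_id], payload))

print(received)
```

Because the length is read before the payload, the reader never consumes bytes belonging to the next message, and the type number comes along in the header that the length prefix needs anyway.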

summary:

  • union type: easy to implement; the only downside is having to then check which of the properties is non-null
  • inheritance: easy to implement; can use polymorphism or discriminator to handle "what now?"
  • type prefix: a bit more fiddly to implement, but allows more flexibility, and has zero overhead on TCP streams

Marc Gravell answered Sep 21 '22 17:09