
Decoding protobuf without schema

Is it possible to decode protobuf-serialized files without the schema? Is there a tool, or anything else, that would decode the binary data into a readable format?

Ahmed Saleh asked Sep 17 '14 18:09





1 Answer

You can often deduce the schema. In fact, IIRC the "protoc" tool has an option (--decode_raw) that does precisely that, making informed guesses. However, it is only a guess: the wire format is ambiguous, in that several different types of data are stored using the same mechanism. For example, a length-prefixed chunk could be:

  • a sub-object (of any user type)
  • a packed array (of various primitive types)
  • a utf-8 string
  • a raw byte[]
  • and probably something else I'm forgetting
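
To make the ambiguity concrete, here is a minimal sketch of a schema-less decoder in Python. It assumes the standard wire format (a varint key holding the field number and wire type, then the value) and treats the deprecated group wire types as errors. The names read_varint, decode_raw and guess_length_delimited are made up for illustration; they are not part of any protobuf library.

```python
def read_varint(buf, pos):
    """Read a base-128 varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        if shift >= 70:
            raise ValueError("varint too long")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_raw(buf):
    """Split a buffer into (field_number, wire_type, raw_value) triples."""
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if field_number == 0:
            raise ValueError("field number 0 is not valid")
        if wire_type == 0:    # varint: int32/int64/uint*/sint*/bool/enum
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # 8 bytes: fixed64/sfixed64/double
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-prefixed chunk
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # 4 bytes: fixed32/sfixed32/float
            value, pos = buf[pos:pos + 4], pos + 4
        else:                 # groups (3, 4) are not handled in this sketch
            raise ValueError("unsupported wire type %d" % wire_type)
        if pos > len(buf):
            raise ValueError("truncated field")
        fields.append((field_number, wire_type, value))
    return fields


def guess_length_delimited(payload):
    """List every interpretation of a length-prefixed chunk that parses."""
    guesses = ["bytes"]  # raw bytes is always a valid reading
    try:
        payload.decode("utf-8")
        guesses.append("string")
    except UnicodeDecodeError:
        pass
    try:
        decode_raw(payload)
        guesses.append("message")
    except ValueError:
        pass
    try:
        pos = 0
        while pos < len(payload):
            _, pos = read_varint(payload, pos)
        guesses.append("packed varints")
    except ValueError:
        pass
    return guesses


# Field 3, length-prefixed, with a 3-byte payload.
for number, wire_type, value in decode_raw(bytes.fromhex("1a03089601")):
    if wire_type == 2:
        print(number, guess_length_delimited(value))
```

For that payload more than one reading survives, so without the schema the decoder can only list candidates, not pick one.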

Likewise, a 4-byte fixed-width chunk could be a fixed-width integer, or a float; the integer could be signed or unsigned.
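
The same point for fixed-width chunks, as a small sketch using Python's struct module. interpret_fixed32 is a made-up helper name, and the keys follow the protobuf type names fixed32, sfixed32 and float; the only assumption is that fixed-width values are little-endian on the wire, which is what the encoding uses.

```python
import struct


def interpret_fixed32(chunk):
    """Return every reading of a 4-byte little-endian chunk."""
    if len(chunk) != 4:
        raise ValueError("expected exactly 4 bytes")
    return {
        "fixed32": struct.unpack("<I", chunk)[0],   # unsigned integer
        "sfixed32": struct.unpack("<i", chunk)[0],  # signed integer
        "float": struct.unpack("<f", chunk)[0],     # IEEE 754 single
    }


# The same four bytes are a plausible float and a plausible integer.
print(interpret_fixed32(bytes.fromhex("0000803f")))
# Here the signed and unsigned readings differ as well.
print(interpret_fixed32(b"\xff\xff\xff\xff"))
```

All three readings are valid for any 4 bytes, so the best a tool can do is pick the one that looks most plausible.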

Marc Gravell answered Sep 21 '22 07:09
