
Write protobuf objects to JSON file

I have an old.JSON file like this:

[{
    "id": "333333",
    "creation_timestamp": 0,
    "type": "MEDICAL",
    "owner": "MED.com",
    "datafiles": ["stomach.data", "heart.data"]
}]

Then I create an object based on .proto file:

message Dataset {
  string id = 1;
  uint64 creation_timestamp = 2;
  string type = 3;
  string owner = 4;
  repeated string datafiles = 6;
}

Now I want to save this object back to another .JSON file. I did this:

import json
from google.protobuf.json_format import MessageToJson

with open("new.json", 'w') as jsfile:
    json.dump(MessageToJson(item), jsfile)

As a result I have:

"{\n  \"id\": \"333333\",\n  \"type\": \"MEDICAL\",\n  \"owner\": \"MED.com\",\n  \"datafiles\": [\n    \"stomach.data\",\n    \"heart.data\"\n  ]\n}"

How can I make this file look like the old.JSON file?

Kenenbek Arzymatov asked May 07 '17 17:05




1 Answer

The weird escaping comes from converting the text to json twice, thus forcing the second call to escape the json characters from the first call. Detailed explanation follows:

https://developers.google.com/protocol-buffers/docs/reference/python/google.protobuf.json_format-pysrc

"""Contains routines for printing protocol messages in JSON format.

Simple usage example:

  # Create a proto object and serialize it to a json format string.
  message = my_proto_pb2.MyMessage(foo='bar')
  json_string = json_format.MessageToJson(message)

  # Parse a json format string to proto object.
  message = json_format.Parse(json_string, my_proto_pb2.MyMessage())
"""

also

def MessageToJson(message, including_default_value_fields=False):
...
  Returns:
    A string containing the JSON formatted protocol buffer message.

It is pretty clear that this function will return exactly one object of type string. This string contains a lot of json structure, but it's still just a string, as far as python is concerned.

You then pass it to a function which takes a python object (not json), and serializes it to json.
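The double encoding can be reproduced with the standard library alone. The sketch below is only an illustration: `json_text` is a hand-written stand-in for the string that MessageToJson(item) would return (an assumption, so that the example runs without the protobuf package), and it is shortened to two fields.

```python
import json

# Hypothetical stand-in for the string returned by MessageToJson(item).
json_text = '{\n  "id": "333333",\n  "type": "MEDICAL"\n}'

# json.dumps sees a plain Python str, so it wraps it in quotes and escapes
# the inner quotes and newlines. This is the second, unwanted encoding.
double_encoded = json.dumps(json_text)
print(double_encoded)

# Decoding the double-encoded text once only gives the string back.
decoded_once = json.loads(double_encoded)
print(type(decoded_once).__name__)

# Decoding the original text gives the actual data structure.
data = json.loads(json_text)
print(data["id"])
```

The first print shows the same kind of quoted, backslash-escaped text as the output in the question, which suggests that the string was encoded as JSON a second time.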

https://docs.python.org/3/library/json.html

json.dump(obj, fp, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw)

Serialize obj as a JSON formatted stream to fp (a .write()-supporting file-like object) using this conversion table.

Okay, how exactly would you encode a string into json? Clearly it can't just use json specific characters, so those would have to be escaped. Maybe there's an online tool, like http://bernhardhaeussner.de/odd/json-escape/ or http://www.freeformatter.com/json-escape.html

You can go there, paste the starting JSON from the top of your question, tell it to generate the escaped JSON, and you get back ... almost exactly what you are getting at the bottom of your question. Cool, everything worked correctly!

(I say almost because one of those links adds some newlines on its own, for no apparent reason. If you encode it with the first link, then decode it with the second, it is exact.)

But that's not the answer you wanted, because you didn't want to double-jsonify the data structure. You just wanted to serialize it to json once, and write that to a file:

from google.protobuf.json_format import MessageToJson

with open("new.json", 'w') as jsfile:
    # MessageToJson already returns JSON text, so write it as-is
    actual_json_text = MessageToJson(item)
    jsfile.write(actual_json_text)

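Addendum: MessageToJson might need additional parameters to behave as expected:
including_default_value_fields=True (so that fields holding a default value, such as creation_timestamp = 0, are still written)
preserving_proto_field_name=True (so that field names stay as written in the .proto file, e.g. creation_timestamp instead of creationTimestamp)
(see comments and links below)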
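One remaining difference from old.JSON is that the old file holds a JSON array, while MessageToJson produces a single object. A minimal sketch of one way to get the array shape back, using only the standard library: parse the JSON text once, wrap the result in a list, and dump that. Here `json_text` is again a hand-written stand-in for MessageToJson(item), and the temporary directory is just an assumption to keep the example from touching real files. With protobuf installed, json_format.MessageToDict(item) should give the dict directly, without the json.loads step.

```python
import json
import os
import tempfile

# Hypothetical stand-in for the string returned by MessageToJson(item).
json_text = (
    '{\n  "id": "333333",\n  "type": "MEDICAL",\n  "owner": "MED.com",\n'
    '  "datafiles": ["stomach.data", "heart.data"]\n}'
)

# Parse the JSON text once to get a plain Python dict.
record = json.loads(json_text)

# Write a list with one element, so the file holds an array like old.JSON.
path = os.path.join(tempfile.mkdtemp(), "new.json")
with open(path, "w") as jsfile:
    json.dump([record], jsfile, indent=4)

# Read the file back to check its shape.
with open(path) as jsfile:
    loaded = json.load(jsfile)
print(loaded[0]["owner"])
```

Here json.dump is applied to a Python list of dicts, not to a string, so nothing gets escaped twice.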

Kenny Ostrom answered Sep 27 '22 00:09