
How can I JSON serialize an object from google's natural language API? (No __dict__ attribute)

I'm using the Google Natural Language API for a project tagging text with sentiment analysis. I want to store my NL results as JSON. If a direct HTTP request is made to Google then a JSON response is returned.

However, when using the provided Python libraries, a response object is returned instead, and that object is not directly JSON serializable.

Here is a sample of my code:

import os
import sys
import oauth2client.client
from google.cloud.gapic.language.v1beta2 import enums, language_service_client
from google.cloud.proto.language.v1beta2 import language_service_pb2

class LanguageReader:
    # class that parses, stores and reports language data from text

    def __init__(self, content=None):

        try:
            # attempts to authenticate credentials from env variable
            oauth2client.client.GoogleCredentials.get_application_default()
        except oauth2client.client.ApplicationDefaultCredentialsError:
            print("=== ERROR: Google credentials could not be authenticated! ===")
            # .get() avoids a KeyError when the variable is not set at all
            print("Current environment variable for this process is: {}".format(os.environ.get('GOOGLE_APPLICATION_CREDENTIALS')))
            print("Run:")
            print("   $ export GOOGLE_APPLICATION_CREDENTIALS=/YOUR_PATH_HERE/YOUR_JSON_KEY_HERE.json")
            print("to set the authentication credentials manually")
            sys.exit(1)

        self.language_client = language_service_client.LanguageServiceClient()
        self.document = language_service_pb2.Document()
        self.document.type = enums.Document.Type.PLAIN_TEXT
        self.encoding = enums.EncodingType.UTF32

        self.results = None

        if content is not None:
            self.read_content(content)

    def read_content(self, content):
        self.document.content = content
        # call the API once and store the response
        self.results = self.language_client.analyze_sentiment(self.document, self.encoding)

Now if you were to run:

sample_text="I love R&B music. Marvin Gaye is the best. 'What's Going On' is one of my favorite songs. It was so sad when Marvin Gaye died."
resp = LanguageReader(sample_text).results
print(resp)

You would get:

document_sentiment {
  magnitude: 2.40000009537
  score: 0.40000000596
}
language: "en"
sentences {
  text {
    content: "I love R&B music."
  }
  sentiment {
    magnitude: 0.800000011921
    score: 0.800000011921
  }
}
sentences {
  text {
    content: "Marvin Gaye is the best."
    begin_offset: 18
  }
  sentiment {
    magnitude: 0.800000011921
    score: 0.800000011921
  }
}
sentences {
  text {
    content: "\'What\'s Going On\' is one of my favorite songs."
    begin_offset: 43
  }
  sentiment {
    magnitude: 0.40000000596
    score: 0.40000000596
  }
}
sentences {
  text {
    content: "It was so sad when Marvin Gaye died."
    begin_offset: 90
  }
  sentiment {
    magnitude: 0.20000000298
    score: -0.20000000298
  }
}

This is not JSON. It's an instance of google.cloud.proto.language.v1beta2.language_service_pb2.AnalyzeSentimentResponse, and it has no __dict__ attribute, so it cannot be serialized with json.dumps(), not even by passing its __dict__.
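
To make the failure concrete without calling the API, here is a minimal sketch using only the standard library. SlottedResult and try_dump are hypothetical names invented for this illustration; the class merely mimics an object without a __dict__ and is not the real AnalyzeSentimentResponse.

```python
import json


class SlottedResult(object):
    # Hypothetical stand-in for the API response: declaring __slots__
    # means instances carry no per-instance __dict__.
    __slots__ = ("score", "magnitude")

    def __init__(self, score, magnitude):
        self.score = score
        self.magnitude = magnitude


def try_dump(obj, **kwargs):
    # Return the JSON string, or the name of the exception that was raised.
    try:
        return json.dumps(obj, **kwargs)
    except (TypeError, AttributeError) as exc:
        return type(exc).__name__


result = SlottedResult(0.4, 2.4)

# json.dumps() only knows dicts, lists, strings, numbers, booleans and None.
plain_attempt = try_dump(result)

# The usual workaround of falling back to __dict__ also fails here.
dict_attempt = try_dump(result, default=lambda o: o.__dict__)

print(plain_attempt)
print(dict_attempt)
```

Both attempts fail, which is why a converter that understands the message type is needed.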

How can I either specify that the response should be in JSON or serialize the object to JSON?

Zach Kagan asked Aug 14 '17 22:08



1 Answer

Edit: @Zach pointed out that the response is a protobuf message (Google's Protocol Buffers data interchange format). The preferred option seems to be the functions in google.protobuf.json_format:

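from google.protobuf.json_format import MessageToDict, MessageToJson

# inside LanguageReader, once self.results holds the response
self.dict = MessageToDict(self.results)  # plain Python dict
self.json = MessageToJson(self.results)  # JSON-formatted string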

From the docstring:

MessageToJson(message, including_default_value_fields=False, preserving_proto_field_name=False)
    Converts protobuf message to JSON format.

    Args:
      message: The protocol buffers message instance to serialize.
      including_default_value_fields: If True, singular primitive fields,
          repeated fields, and map fields will always be serialized.  If
          False, only serialize non-empty fields.  Singular message fields
          and oneof fields are not affected by this option.
      preserving_proto_field_name: If True, use the original proto field
          names as defined in the .proto file. If False, convert the field
          names to lowerCamelCase.

    Returns:
      A string containing the JSON formatted protocol buffer message.
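
As a rough, runnable sketch of how these two functions behave, the example below converts SourceContext, a small message type that ships with the protobuf package. It is used here only as a stand-in for AnalyzeSentimentResponse so that no credentials or API call are needed; response_to_json is a hypothetical helper name, not part of any library. The sketch sticks to preserving_proto_field_name and assumes the protobuf package is installed.

```python
import json

from google.protobuf.json_format import MessageToDict, MessageToJson
from google.protobuf.source_context_pb2 import SourceContext


def response_to_json(message):
    # Hypothetical helper: serialize any protobuf message to a JSON string,
    # keeping the snake_case field names from the .proto definition.
    return MessageToJson(message, preserving_proto_field_name=True)


# Stand-in message with a single snake_case string field, file_name.
message = SourceContext(file_name="language_service.proto")

# Default behaviour: field names are converted to lowerCamelCase.
as_dict = MessageToDict(message)

# With preserving_proto_field_name=True the original names are kept.
as_dict_snake = MessageToDict(message, preserving_proto_field_name=True)

# MessageToJson returns a string that the json module can parse back.
round_tripped = json.loads(MessageToJson(message))
snake_round_tripped = json.loads(response_to_json(message))

print(as_dict)
print(as_dict_snake)
```

The same calls should apply to self.results in the question, since MessageToDict and MessageToJson accept any protobuf message instance. If the keys need to match the field names shown in the printed response (document_sentiment, begin_offset), pass preserving_proto_field_name=True.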

brennan answered Nov 07 '22 05:11