Understanding subclassing of JSONEncoder

I am trying to subclass json.JSONEncoder such that named tuples (defined using the new Python 3.6+ syntax, but it probably still applies to the output of collections.namedtuple) are serialised to JSON objects, where the tuple fields correspond to object keys.

For example:

from typing import NamedTuple

class MyModel(NamedTuple):
    foo: int
    bar: str = "Hello, World!"

a = MyModel(123)           # Expected JSON: {"foo": 123, "bar": "Hello, World!"}
b = MyModel(456, "xyzzy")  # Expected JSON: {"foo": 456, "bar": "xyzzy"}

My understanding is that I subclass json.JSONEncoder and override its default method to provide serialisations for new types. The rest of the class will then do the right thing with respect to recursion, etc. I thus came up with the following:

import json
from datetime import datetime

class MyJSONEncoder(json.JSONEncoder):
    def default(self, o):
        to_encode = None

        if isinstance(o, tuple) and hasattr(o, "_asdict"):
            # Dictionary representation of a named tuple
            to_encode = o._asdict()

        if isinstance(o, datetime):
            # String representation of a datetime
            to_encode = o.strftime("%Y-%m-%dT%H:%M:%S")

        # Why not super().default(to_encode or o)??
        return to_encode or o

This works when it serialises a datetime value (i.e., when the class is passed as the cls parameter to json.dumps), which at least partially proves my hypothesis, but the check for named tuples is never hit and they fall back to the default tuple serialisation (i.e., a JSON array). I had presumed that I should call the superclass's default method on my transformed object, but, weirdly, doing so raises an exception when serialising a datetime: "TypeError: Object of type 'str' is not JSON serializable", which frankly makes no sense!

I get the same behaviour if I make the named tuple type check more specific (e.g., isinstance(o, MyModel)). I did find, however, that I can almost get the behaviour I'm looking for if I also override the encode method and move the named tuple check there:

class AlmostWorkingJSONEncoder(json.JSONEncoder):
    def default(self, o):
        to_encode = None

        if isinstance(o, datetime):
            # String representation of a datetime
            to_encode = o.strftime("%Y-%m-%dT%H:%M:%S")

        return to_encode or o

    def encode(self, o):
        to_encode = None

        if isinstance(o, tuple) and hasattr(o, "_asdict"):
            # Dictionary representation of a named tuple
            to_encode = o._asdict()

        # Here we *do* need to call the superclass' encode method??
        return super().encode(to_encode or o)

This works, but not recursively: it successfully serialises top-level named tuples into JSON objects, per my requirement, but any named tuples nested within them are still serialised with the default behaviour (a JSON array). The same happens if I put the named tuple type check in both the default and encode methods.
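
Here is a quick sketch of that recursion problem, using the MyModel and AlmostWorkingJSONEncoder definitions above (the Outer class exists only for this demonstration):

class Outer(NamedTuple):
    # Hypothetical container used only to demonstrate the nesting problem
    inner: MyModel

print(AlmostWorkingJSONEncoder().encode(Outer(MyModel(123))))
# {"inner": [123, "Hello, World!"]}  -- the nested named tuple is still an array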

The documentation implies that only the default method should be changed in subclasses. I presume, for example, that overriding encode in AlmostWorkingJSONEncoder will cause it to break when it's doing chunked encoding. However, no amount of hackery has so far yielded what I want (or expect to happen, given the scant documentation).
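
That presumption about chunked encoding can be checked with a short sketch (assuming the definitions above): json.dump streams its output through iterencode, which never calls encode, so the named-tuple handling in AlmostWorkingJSONEncoder is bypassed entirely.

import io
import json

buf = io.StringIO()
json.dump(MyModel(123), buf, cls=AlmostWorkingJSONEncoder)
print(buf.getvalue())
# [123, "Hello, World!"]  -- a JSON array, not an object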

Where is my misunderstanding?


EDIT Reading the code for json.JSONEncoder explains why the default method raises a type error when you pass it a string: it's not clear (at least to me) from the documentation, but the default method is meant to transform a value of an unsupported type into a serialisable type, which is then returned; if your overridden method cannot transform the value, you should call super().default(o) at the end, which raises the TypeError for you. So something like this:

class SubJSONEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, Foo):
            return SerialisableFoo(o)

        if isinstance(o, Bar):
            return SerialisableBar(o)

        # etc., etc.

        # No more serialisation options available, so raise a type error
        super().default(o)

I believe the problem I'm experiencing is that the default method is only called by the encoder when it cannot match any of the supported types. A named tuple is still a tuple, which is supported, so the encoder handles it as a tuple and never delegates to my overridden default method. In Python 2.7, the functions that did this matching were part of the JSONEncoder object, but in Python 3 they appear to have been moved out into the module namespace (and are thus not accessible from userland). I therefore believe it is not possible to subclass JSONEncoder to serialise named tuples in a generic way without doing a lot of rewriting and hard-coupling to your own implementation :(
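
This can be confirmed with a short probe (a sketch, assuming the MyModel class from above): the encoder never consults default for a named tuple, because the tuple handling inside the encoder internals matches first.

import json

class Probe(json.JSONEncoder):
    def default(self, o):
        # This only runs for types the encoder doesn't already support
        print("default() called with", type(o).__name__)
        return super().default(o)

print(json.dumps(MyModel(123), cls=Probe))
# Nothing is printed by default(); output: [123, "Hello, World!"]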

EDIT 2 I submitted this as a bug.

asked May 11 '17 by Xophmeister



1 Answer

Bad News

Hmm, I just looked at the source and there doesn't appear to be a public hook to control how instances of list or tuple get serialized.

Worse News

An unsafe approach is to monkey patch the _make_iterencode() private function.

Good News

Another approach is to preprocess the input, converting the named tuples into dicts:

from json import JSONEncoder
from typing import NamedTuple
from datetime import datetime

def preprocess(tree):
    if isinstance(tree, dict):
        return {k: preprocess(v) for k, v in tree.items()}
    if isinstance(tree, tuple) and hasattr(tree, '_asdict'):
        return preprocess(tree._asdict())
    if isinstance(tree, (list, tuple)):
        return list(map(preprocess, tree))
    return tree

class MD(JSONEncoder):

    def default(self, o):
        if isinstance(o, datetime):
            return o.strftime("%Y-%m-%dT%H:%M:%S")
        return super().default(o)

Applied to these models:

class MyModel(NamedTuple):
    foo: int
    bar: str = "Hello, World!"

class LayeredModel(NamedTuple):
    baz: MyModel
    fob: list

a = MyModel(123)          
b = MyModel(456, "xyzzy")
c = LayeredModel(a, [a, b])
outer = dict(a=a, b=b, c=c, d=datetime.now(), e=10)
print(MD().encode(preprocess(outer)))

Gives this output:

{"a": {"foo": 123, "bar": "Hello, World!"},
 "b": {"foo": 456, "bar": "xyzzy"},
 "c": {"baz": {"foo": 123, "bar": "Hello, World!"},
       "fob": [{"foo": 123, "bar": "Hello, World!"},
               {"foo": 456, "bar": "xyzzy"}]},
 "d": "2019-11-03T10:46:17",
 "e": 10}
answered Oct 12 '22 by Raymond Hettinger