Understanding subclassing of JSONEncoder

I am trying to subclass json.JSONEncoder such that named tuples (defined using the new Python 3.6+ syntax, but it probably still applies to the output of collections.namedtuple) are serialised to JSON objects, where the tuple fields correspond to object keys.

For example:

from typing import NamedTuple

class MyModel(NamedTuple):
    foo: int
    bar: str = "Hello, World!"

a = MyModel(123)           # Expected JSON: {"foo": 123, "bar": "Hello, World!"}
b = MyModel(456, "xyzzy")  # Expected JSON: {"foo": 456, "bar": "xyzzy"}

My understanding is that I subclass json.JSONEncoder and override its default method to provide serialisations for new types. The rest of the class will then do the right thing with respect to recursion, etc. I thus came up with the following:

import json
from datetime import datetime

class MyJSONEncoder(json.JSONEncoder):
    def default(self, o):
        to_encode = None

        if isinstance(o, tuple) and hasattr(o, "_asdict"):
            # Dictionary representation of a named tuple
            to_encode = o._asdict()

        if isinstance(o, datetime):
            # String representation of a datetime
            to_encode = o.strftime("%Y-%m-%dT%H:%M:%S")

        # Why not super().default(to_encode or o)??
        return to_encode or o

This works when it serialises a datetime value (i.e., when the class is passed as the cls parameter to json.dumps), which at least partially proves my hypothesis, but the check for named tuples is never hit and they fall back to the default tuple serialisation (i.e., a JSON array). I had presumed that I should call the superclass's default method on my transformed object, but, weirdly, doing so raises an exception when serialising a datetime: "TypeError: Object of type 'str' is not JSON serializable", which frankly makes no sense!

I get the same behaviour if I make the named tuple type check more specific (e.g., isinstance(o, MyModel)). I did find, however, that I can almost get the behaviour I'm looking for if I also override the encode method and move the named tuple check there:

class AlmostWorkingJSONEncoder(json.JSONEncoder):
    def default(self, o):
        to_encode = None

        if isinstance(o, datetime):
            # String representation of a datetime
            to_encode = o.strftime("%Y-%m-%dT%H:%M:%S")

        return to_encode or o

    def encode(self, o):
        to_encode = None

        if isinstance(o, tuple) and hasattr(o, "_asdict"):
            # Dictionary representation of a named tuple
            to_encode = o._asdict()

        # Here we *do* need to call the superclass' encode method??
        return super().encode(to_encode or o)

This works, but not recursively: it successfully serialises top-level named tuples into JSON objects, per my requirement, but any named tuples nested within them are still serialised with the default behaviour (a JSON array). The same happens if I put the named tuple type check in both the default and encode methods.
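
Here is a quick sketch of that recursion problem, using the MyModel and AlmostWorkingJSONEncoder definitions above (the Outer class exists only for this demonstration):

class Outer(NamedTuple):
    # Hypothetical container used only to demonstrate the nesting problem
    inner: MyModel

print(AlmostWorkingJSONEncoder().encode(Outer(MyModel(123))))
# {"inner": [123, "Hello, World!"]}  -- the nested named tuple is still an array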

The documentation implies that only the default method should be changed in subclasses. I presume, for example, that overriding encode in AlmostWorkingJSONEncoder will cause it to break when it's doing chunked encoding. However, no amount of hackery has so far yielded what I want (or expect to happen, given the scant documentation).
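
That presumption about chunked encoding can be checked with a short sketch (assuming the definitions above): json.dump streams its output through iterencode, which never calls encode, so the named-tuple handling in AlmostWorkingJSONEncoder is bypassed entirely.

import io
import json

buf = io.StringIO()
json.dump(MyModel(123), buf, cls=AlmostWorkingJSONEncoder)
print(buf.getvalue())
# [123, "Hello, World!"]  -- a JSON array, not an object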

Where is my misunderstanding?


EDIT Reading the code for json.JSONEncoder explains why the default method raises a type error when you pass it a string: it's not clear (at least to me) from the documentation, but the default method is meant to transform a value of an unsupported type into a serialisable type, which is then returned; if your overridden method cannot transform the value, you should call super().default(o) at the end, which raises the TypeError for you. So something like this:

class SubJSONEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, Foo):
            return SerialisableFoo(o)

        if isinstance(o, Bar):
            return SerialisableBar(o)

        # etc., etc.

        # No more serialisation options available, so raise a type error
        super().default(o)

I believe the problem I'm experiencing is that the default method is only called by the encoder when it cannot match any of the supported types. A named tuple is still a tuple, which is supported, so the encoder handles it as a tuple and never delegates to my overridden default method. In Python 2.7, the functions that did this matching were part of the JSONEncoder object, but in Python 3 they appear to have been moved out into the module namespace (and are thus not accessible from userland). I therefore believe it is not possible to subclass JSONEncoder to serialise named tuples in a generic way without doing a lot of rewriting and hard-coupling to your own implementation :(
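
This can be confirmed with a short probe (a sketch, assuming the MyModel class from above): the encoder never consults default for a named tuple, because the tuple handling inside the encoder internals matches first.

import json

class Probe(json.JSONEncoder):
    def default(self, o):
        # This only runs for types the encoder doesn't already support
        print("default() called with", type(o).__name__)
        return super().default(o)

print(json.dumps(MyModel(123), cls=Probe))
# Nothing is printed by default(); output: [123, "Hello, World!"]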

EDIT 2 I submitted this as a bug.

asked May 11 '17 by Xophmeister



1 Answer

Bad News

Hmm, I just looked at the source and there doesn't appear to be a public hook to control how instances of list or tuple get serialized.

Worse News

An unsafe approach is to monkey patch the _make_iterencode() private function.

Good News

Another approach is to preprocess the input, converting the named tuples into dicts:

from json import JSONEncoder
from typing import NamedTuple
from datetime import datetime

def preprocess(tree):
    if isinstance(tree, dict):
        return {k: preprocess(v) for k, v in tree.items()}
    if isinstance(tree, tuple) and hasattr(tree, '_asdict'):
        return preprocess(tree._asdict())
    if isinstance(tree, (list, tuple)):
        return list(map(preprocess, tree))
    return tree

class MD(JSONEncoder):

    def default(self, o):
        if isinstance(o, datetime):
            return o.strftime("%Y-%m-%dT%H:%M:%S")
        return super().default(o)

Applied to these models:

class MyModel(NamedTuple):
    foo: int
    bar: str = "Hello, World!"

class LayeredModel(NamedTuple):
    baz: MyModel
    fob: list

a = MyModel(123)          
b = MyModel(456, "xyzzy")
c = LayeredModel(a, [a, b])
outer = dict(a=a, b=b, c=c, d=datetime.now(), e=10)
print(MD().encode(preprocess(outer)))

Gives this output:

{"a": {"foo": 123, "bar": "Hello, World!"},
 "b": {"foo": 456, "bar": "xyzzy"},
 "c": {"baz": {"foo": 123, "bar": "Hello, World!"},
       "fob": [{"foo": 123, "bar": "Hello, World!"},
               {"foo": 456, "bar": "xyzzy"}]},
 "d": "2019-11-03T10:46:17",
 "e": 10}
answered Oct 12 '22 by Raymond Hettinger