 

Extended dict-like subclass to support casting and JSON dumping without extras

I need to create an instance t of a dict-like class T that supports "casting" to a real dict with dict(**t), without reverting to dict([(k, v) for k, v in t.items()]). It must also support dumping as JSON using the standard json library, without extending the normal JSON encoder (i.e. no function provided for the default parameter).

With t being a normal dict, both work:

import json

def dump(data):
    print(list(data.items()))
    try:
        print('cast:', dict(**data))
    except Exception as e:
        print('ERROR:', e)
    try:
        print('json:', json.dumps(data))
    except Exception as e:
        print('ERROR:', e)

t = dict(a=1, b=2)
dump(t)

printing:

[('a', 1), ('b', 2)]
cast: {'a': 1, 'b': 2}
json: {"a": 1, "b": 2}

However, I want t to be an instance of a class T that adds, e.g., a key default "on the fly" to its items, so inserting it up front is not possible (actually I want merged keys from one or more instances of T to show up; this is a simplification of that real, much more complex, class).

class T(dict):
    def __getitem__(self, key):
        if key == 'default':
            return 'DEFAULT'
        return dict.__getitem__(self, key)

    def items(self):
        for k in dict.keys(self):
            yield k, self[k]
        yield 'default', self['default']

    def keys(self):
        for k in dict.keys(self):
            yield k 
        yield 'default'

t = T(a=1, b=2)
dump(t)

this gives:

[('a', 1), ('b', 2), ('default', 'DEFAULT')]
cast: {'a': 1, 'b': 2}
json: {"a": 1, "b": 2, "default": "DEFAULT"}

The cast doesn't work properly because the key 'default' is missing, and I don't know which "magic" method to provide to make casting work.
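To illustrate where the 'default' key gets lost, here is a small sketch that repeats the dict-based T from above so the block runs on its own. On CPython, dict(t) and dict(**t) copy straight from the dict's internal storage when t is a dict subclass that does not override __iter__, so the overridden keys(), items() and __getitem__ are never consulted. A bare iterator of pairs, by contrast, has no keys() method and is consumed pair by pair. This describes CPython's implementation as observed, not a documented language guarantee.

```python
class T(dict):
    """The dict subclass from the question: adds a 'default' key on the fly."""

    def __getitem__(self, key):
        if key == 'default':
            return 'DEFAULT'
        return dict.__getitem__(self, key)

    def items(self):
        for k in dict.keys(self):
            yield k, self[k]
        yield 'default', self['default']

    def keys(self):
        for k in dict.keys(self):
            yield k
        yield 'default'


t = T(a=1, b=2)

# CPython copies from the underlying hash table here, bypassing the
# overridden keys()/items()/__getitem__, so 'default' never shows up.
print(dict(t))    # {'a': 1, 'b': 2}
print(dict(**t))  # {'a': 1, 'b': 2}

# A generator of pairs has no keys() method, so dict() consumes it
# pair by pair and the synthetic key survives.
print(dict(t.items()))  # {'a': 1, 'b': 2, 'default': 'DEFAULT'}
```

This is essentially the dict([(k, v) for k, v in t.items()]) workaround the question wants to avoid, just without the intermediate list.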

When I build T upon the functionality that collections.abc implements, and provide the required abstract methods in the subclass, casting works:

from collections.abc import MutableMapping

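class TIter:
    """Iterator over the keys of T followed by the synthetic 'default' key."""

    def __init__(self, t):
        self.keys = list(t.d.keys()) + ['default']
        self.index = 0

    def __iter__(self):
        # iterators must also be iterable: iter(it) returns it
        return self

    def __next__(self):
        if self.index == len(self.keys):
            raise StopIteration
        res = self.keys[self.index]
        self.index += 1
        return res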

class T(MutableMapping):
    def __init__(self, **kw):
        self.d = dict(**kw)

    def __delitem__(self, key):
        if key != 'default':
            del self.d[key]

    def __len__(self):
        return len(self.d) + 1

    def __setitem__(self, key, v):
        if key != 'default':
            self.d[key] = v

    def __getitem__(self, key):
        if key == 'default':
            return 'DEFAULT'
        # return None
        return self.d[key]

    def __iter__(self):
        return TIter(self)

t = T(a=1, b=2)
dump(t)

which gives:

[('a', 1), ('b', 2), ('default', 'DEFAULT')]
cast: {'a': 1, 'b': 2, 'default': 'DEFAULT'}
ERROR: Object of type 'T' is not JSON serializable

The JSON dumping fails because the dumper cannot handle MutableMapping subclasses: it explicitly tests at the C level using PyDict_Check.
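A compact sketch of the same MutableMapping-based T makes the asymmetry explicit (the generator-based __iter__ replacing TIter is a simplification for this sketch, not the question's code). T is a Mapping but not a dict, so ** unpacking falls back to keys() and __getitem__, while json.dumps rejects it. The pure-Python code path in the json.encoder module (used, for example, when indent is given) makes the equivalent isinstance(o, dict) check, so it behaves the same way.

```python
import json
from collections.abc import Mapping, MutableMapping


class T(MutableMapping):
    """Compact variant of the MutableMapping-based T from the question."""

    def __init__(self, **kw):
        self.d = dict(**kw)

    def __getitem__(self, key):
        if key == 'default':
            return 'DEFAULT'
        return self.d[key]

    def __setitem__(self, key, value):
        if key != 'default':
            self.d[key] = value

    def __delitem__(self, key):
        if key != 'default':
            del self.d[key]

    def __iter__(self):
        # a generator replaces the separate TIter class
        for k in self.d:
            yield k
        yield 'default'

    def __len__(self):
        return len(self.d) + 1


t = T(a=1, b=2)
print(isinstance(t, Mapping), isinstance(t, dict))  # True False
print(dict(**t))  # {'a': 1, 'b': 2, 'default': 'DEFAULT'}

try:
    json.dumps(t)
except TypeError as exc:
    # json only encodes real dicts (and dict subclasses) as JSON objects
    print('TypeError:', exc)
```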

When I tried to make T a subclass of both dict and MutableMapping, I did get the same result as when using only the dict subclass.

I can of course consider it a bug that the json dumper has not been updated to assume that (concrete subclasses of) collections.abc.Mapping are dumpable. But even if it is acknowledged as a bug and gets fixed in some future version of Python, I don't think such a fix will be applied to older versions of Python.

Q1: How can I make the T implementation that subclasses dict cast properly?
Q2: If Q1 doesn't have an answer, would it work to make a C-level class that returns the right value for PyDict_Check but doesn't do any of the actual implementation, and then make T a subclass of that as well as of MutableMapping? (I don't think adding such an incomplete C-level dict will work, but I haven't tried.) Would this fool json.dumps()?
Q3: Is this a completely wrong approach to getting both to work like in the first example?


The actual code, which is much more complex, is part of my ruamel.yaml library, which has to work on Python 2.7 and Python 3.4+.

As long as I can't solve this, I have to tell people who used to have functioning JSON dumpers (without extra arguments) to use:

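import json
import ruamel.yaml.comments

def json_default(obj):
    # return the plain container behind the round-trip types
    if isinstance(obj, ruamel.yaml.comments.CommentedMap):
        return obj._od
    if isinstance(obj, ruamel.yaml.comments.CommentedSeq):
        return obj._lst
    raise TypeError('Object of type %s is not JSON serializable' % type(obj).__name__)

print(json.dumps(d, default=json_default))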

or tell them to use a different loader than the default (round-trip) loader, e.g.:

from ruamel.yaml import YAML

yaml = YAML(typ='safe')
data = yaml.load(stream)

or implement some .to_json() method on the class T and make users of ruamel.yaml aware of that,

or go back to subclassing dict and tell people to do:

 dict([(k, v) for k, v in t.items()])

None of these options is really friendly, and together they suggest that it is impossible to make a non-trivial dict-like class that cooperates well with the standard library.

asked Sep 13 '18 at 12:09 by Anthon



1 Answer

Since the real problem here is the inability of json.dumps's default encoder to treat a MutableMapping (or ruamel.yaml.comments.CommentedMap in your real-world example) as a dict, then instead of telling people to set the default parameter of json.dumps to your json_default function as you mentioned, you can use functools.partial to make json_default the default value of the default parameter of json.dumps, so that people don't have to do anything differently when they use your package:

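import json
from functools import partial

# every json.dumps(...) call in the process now gets default=json_default,
# unless the caller passes a default of their own
json.dumps = partial(json.dumps, default=json_default)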
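A self-contained sketch of the partial approach follows. Because the ruamel.yaml classes are not needed to show the mechanism, it uses a hypothetical read-only mapping called Settings as a stand-in for T and a hypothetical generic hook, mapping_default, in place of json_default. It binds the partial to a new name instead of rebinding json.dumps. Keep in mind that rebinding json.dumps affects every caller in the process but not modules that already did `from json import dumps`, and that a caller passing default= explicitly replaces the one bound by partial.

```python
import json
from collections.abc import Mapping
from functools import partial


class Settings(Mapping):
    """Hypothetical stand-in for T: a read-only mapping that is not a dict."""

    def __init__(self, **kw):
        self._d = dict(kw)

    def __getitem__(self, key):
        if key == 'default':
            return 'DEFAULT'
        return self._d[key]

    def __iter__(self):
        yield from self._d
        yield 'default'

    def __len__(self):
        return len(self._d) + 1


def mapping_default(obj):
    # json calls this only for objects it cannot encode natively
    if isinstance(obj, Mapping):
        return dict(obj)
    raise TypeError('Object of type %s is not JSON serializable'
                    % type(obj).__name__)


dumps = partial(json.dumps, default=mapping_default)

data = Settings(a=1, nested=Settings(b=2))
print(dumps(data))
# {"a": 1, "nested": {"b": 2, "default": "DEFAULT"}, "default": "DEFAULT"}
```

Because json calls the default hook again for every value it cannot encode, nested mappings are converted as well.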

Or, if you need to allow people to specify their own default parameter or even their own json.JSONEncoder subclass, you can put a wrapper around json.dumps that wraps the default function given by the default parameter and/or the default method of the custom encoder given by the cls parameter, whichever is specified (when neither is given, it patches the default method of json.JSONEncoder itself):

import inspect
import json

try:
    from collections.abc import MutableMapping  # Python 3.3+
except ImportError:
    from collections import MutableMapping  # Python 2.7

class override_json_default(object):
    # keep track of the default methods that have already been wrapped
    # so we don't wrap them again
    _wrapped_defaults = set()

    def __call__(self, func):
        sig = inspect.signature(func)

        def override_default(default_func):
            # wrap a user-supplied default= function; check for mappings
            # first, since a conventional default raises TypeError on them
            def default_wrapper(o):
                if isinstance(o, MutableMapping):
                    return dict(o)
                return default_func(o)
            return default_wrapper

        def override_default_method(default_func):
            # wrap the default() method of a JSONEncoder (sub)class
            def default_wrapper(self, o):
                try:
                    return default_func(self, o)
                except TypeError:
                    if isinstance(o, MutableMapping):
                        return dict(o)
                    raise
            return default_wrapper

        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            default = bound.arguments.get('default')
            if default:
                bound.arguments['default'] = override_default(default)
            encoder = bound.arguments.get('cls')
            if not default and not encoder:
                bound.arguments['cls'] = encoder = json.JSONEncoder
            if encoder:
                default = getattr(encoder, 'default')
                # Python 2 returns an unbound method; unwrap to the function
                default = getattr(default, '__func__', default)
                if default not in self._wrapped_defaults:
                    default = override_default_method(default)
                    self._wrapped_defaults.add(default)
                setattr(encoder, 'default', default)
            return func(*bound.args, **bound.kwargs)

        return wrapper

json.dumps = override_json_default()(json.dumps)

so that the following test code, which calls json.dumps with a custom default function, with a custom encoder (both handling datetime objects), and with neither,

from datetime import datetime

def datetime_encoder(o):
    if isinstance(o, datetime):
        return o.isoformat()
    return o

class DateTimeEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, datetime):
            return o.isoformat()
        return super(DateTimeEncoder, self).default(o)

def dump(data):
    print(list(data.items()))
    try:
        print('cast:', dict(**data))
    except Exception as e:
        print('ERROR:', e)
    try:
        print('json with custom default:', json.dumps(data, default=datetime_encoder))
        print('json with custom encoder:', json.dumps(data, cls=DateTimeEncoder))
        del data['c']
        print('json without datetime:', json.dumps(data))
    except Exception as e:
        print('ERROR:', e)

t = T(a=1, b=2, c=datetime.now())
dump(t)

would all give the proper output:

[('a', 1), ('b', 2), ('c', datetime.datetime(2018, 9, 15, 23, 59, 25, 575642)), ('default', 'DEFAULT')]
cast: {'a': 1, 'b': 2, 'c': datetime.datetime(2018, 9, 15, 23, 59, 25, 575642), 'default': 'DEFAULT'}
json with custom default: {"a": 1, "b": 2, "c": "2018-09-15T23:59:25.575642", "default": "DEFAULT"}
json with custom encoder: {"a": 1, "b": 2, "c": "2018-09-15T23:59:25.575642", "default": "DEFAULT"}
json without datetime: {"a": 1, "b": 2, "default": "DEFAULT"}
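The wrapper depends on inspect.signature() being able to bind an arbitrary json.dumps call and then fill in the omitted defaults, so that cls and default always have an entry in bound.arguments. A small sketch of just that mechanism, assuming Python 3.5+ (for BoundArguments.apply_defaults):

```python
import inspect
import json

sig = inspect.signature(json.dumps)

# bind a call that passes neither cls nor default
bound = sig.bind({'a': 1}, sort_keys=True)
print('cls' in bound.arguments)  # False: only explicitly passed arguments are bound

bound.apply_defaults()
# now every parameter has an entry, so bound.arguments.get('default')
# no longer depends on what the caller happened to pass
print(bound.arguments['cls'], bound.arguments['default'])  # None None

# bound.args / bound.kwargs reproduce the call, including the filled-in defaults
print(json.dumps(*bound.args, **bound.kwargs))  # {"a": 1}
```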

As pointed out in the comments, the override_json_default code above uses inspect.signature, which is not available until Python 3.3. Even then, inspect.BoundArguments.apply_defaults is not available until Python 3.5, and the funcsigs package, a backport of Python 3.3's inspect.signature, does not have the apply_defaults method either. To make the code as backward-compatible as possible, you can copy the code of Python 3.5+'s inspect.BoundArguments.apply_defaults into your module and assign it as an attribute of inspect.BoundArguments, after importing funcsigs where necessary:

import inspect
from collections import OrderedDict

# Python 2.7: borrow signature() and friends from the funcsigs backport
if not hasattr(inspect, 'signature'):
    import funcsigs
    for attr in funcsigs.__all__:
        setattr(inspect, attr, getattr(funcsigs, attr))

# Python < 3.5 (and funcsigs): add BoundArguments.apply_defaults
if not hasattr(inspect.BoundArguments, 'apply_defaults'):
    def apply_defaults(self):
        arguments = self.arguments
        new_arguments = []
        for name, param in self._signature.parameters.items():
            try:
                new_arguments.append((name, arguments[name]))
            except KeyError:
                # use the public Parameter constants: on Python 3.3/3.4
                # signature() exists, so funcsigs is never imported
                if param.default is not inspect.Parameter.empty:
                    val = param.default
                elif param.kind is inspect.Parameter.VAR_POSITIONAL:
                    val = ()
                elif param.kind is inspect.Parameter.VAR_KEYWORD:
                    val = {}
                else:
                    continue
                new_arguments.append((name, val))
        self.arguments = OrderedDict(new_arguments)

    inspect.BoundArguments.apply_defaults = apply_defaults
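One way to sanity-check the backported logic on a modern Python is to run it as a standalone function (hypothetically named apply_defaults_backport here) and compare its result with the built-in BoundArguments.apply_defaults. This sketch assumes Python 3.5+, reaches the parameter constants through the public inspect.Parameter attributes, and uses a hypothetical example() function that covers every parameter kind.

```python
import inspect
from collections import OrderedDict


def apply_defaults_backport(bound):
    """Standalone copy of the backported apply_defaults logic."""
    arguments = bound.arguments
    new_arguments = []
    for name, param in bound.signature.parameters.items():
        try:
            new_arguments.append((name, arguments[name]))
        except KeyError:
            if param.default is not inspect.Parameter.empty:
                val = param.default
            elif param.kind is inspect.Parameter.VAR_POSITIONAL:
                val = ()
            elif param.kind is inspect.Parameter.VAR_KEYWORD:
                val = {}
            else:
                continue
            new_arguments.append((name, val))
    bound.arguments = OrderedDict(new_arguments)


def example(a, b=2, *args, c, d=4, **kwargs):
    """Hypothetical function with every kind of parameter."""


sig = inspect.signature(example)

ours = sig.bind(1, c=3)
apply_defaults_backport(ours)

ref = sig.bind(1, c=3)
ref.apply_defaults()  # the stdlib method, Python 3.5+

print(dict(ours.arguments) == dict(ref.arguments))  # True
print(dict(ours.arguments))
```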
answered Sep 19 '22 at 21:09 by blhsing