How to make json.dumps in Python ignore a non-serializable field

I am parsing some binary data with the Construct 2.9 library and want to serialize the result to JSON.

packet is an instance of the Construct class Container.

Apparently it contains a hidden _io field of type BytesIO; see the output of dict(packet) below:

{
'packet_length': 76, 'uart_sent_time': 1, 'frame_number': 42958, 
'subframe_number': 0, 'checksum': 33157, '_io': <_io.BytesIO object at 0x7f81c3153728>, 
'platform':661058, 'sync': 506660481457717506, 'frame_margin': 20642,
'num_tlvs': 1, 'track_process_time': 593, 'chirp_margin': 78,
'timestamp': 2586231182, 'version': 16908293
}

Now, calling json.dumps(packet) obviously leads to a TypeError:

...

File "/usr/lib/python3.5/json/__init__.py", line 237, in dumps
    **kw).encode(obj)
File "/usr/lib/python3.5/json/encoder.py", line 198, in encode
    chunks = self.iterencode(o, _one_shot=True)
File "/usr/lib/python3.5/json/encoder.py", line 256, in iterencode
    return _iterencode(o, 0)
File "/usr/lib/python3.5/json/encoder.py", line 179, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: <_io.BytesIO object at 0x7f81c3153728> is not JSON serializable

However, what confuses me is that running json.dumps(packet, skipkeys=True) results in exactly the same error, whereas I would expect it to skip the _io field. What is the problem here? Why doesn't skipkeys let me skip the _io field?

I got the code to work by overriding JSONEncoder and returning None for fields of BytesIO type, but that means my serialized string contains loads of "_io": null elements, which I would prefer not to have at all...
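
For reference, a minimal sketch of that workaround might look like the following. The class name SkipBytesIOEncoder and the small dict standing in for packet are assumptions for illustration, not the original code.

```python
import io
import json


class SkipBytesIOEncoder(json.JSONEncoder):
    """Hypothetical encoder: emit null for BytesIO values instead of failing."""

    def default(self, o):
        # default() is only called for objects json cannot encode natively.
        if isinstance(o, io.BytesIO):
            return None
        return super().default(o)


# Stand-in for the parsed Container (assumed data, not real Construct output).
packet = {"packet_length": 76, "_io": io.BytesIO()}
encoded = json.dumps(packet, cls=SkipBytesIOEncoder)
print(encoded)  # {"packet_length": 76, "_io": null}
```

The key survives with a null value, which is exactly the unwanted part: a default hook can replace a value but cannot drop the key.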

asked Aug 03 '18 13:08

mz8i

3 Answers

Keys with a leading underscore (_) are not really 'hidden'; to JSON they are just more strings. The Construct Container class is just a dictionary with ordering, and the _io key is nothing special to that class.

You have two options:

Implement a default hook that just returns a replacement value.
Filter out the key-value pairs that you know can't work before serialising.

There is perhaps a third, though a casual scan of the Construct project pages doesn't tell me whether it is available: have Construct output JSON, or at least a JSON-compatible dictionary, perhaps by using adapters.

The default hook can't prevent the _io key from being added to the output, but would let you at least avoid the error:

json.dumps(packet, default=lambda o: '<not serializable>')

Filtering can be done recursively; the @functools.singledispatch() decorator can help keep such code clean:

from functools import singledispatch

_cant_serialize = object()

@singledispatch
def json_serializable(obj, skip_underscore=False):
    """Filter a Python object to only include serializable object types

    In dictionaries, keys are converted to strings; if skip_underscore is true
    then keys starting with an underscore ("_") are skipped.

    """
    # default handler, called for anything without a specific
    # type registration.
    return _cant_serialize

@json_serializable.register(dict)
def _handle_dict(d, skip_underscore=False):
    converted = ((str(k), json_serializable(v, skip_underscore))
                 for k, v in d.items())
    if skip_underscore:
        converted = ((k, v) for k, v in converted if k[:1] != '_')
    return {k: v for k, v in converted if v is not _cant_serialize}

@json_serializable.register(list)
@json_serializable.register(tuple)
def _handle_sequence(seq, skip_underscore=False):
    converted = (json_serializable(v, skip_underscore) for v in seq)
    return [v for v in converted if v is not _cant_serialize]

@json_serializable.register(int)
@json_serializable.register(float)
@json_serializable.register(str)
@json_serializable.register(bool)  # redundant, supported as int subclass
@json_serializable.register(type(None))
def _handle_default_scalar_types(value, skip_underscore=False):
    return value

I gave the above implementation an additional skip_underscore argument too, to explicitly skip keys that start with a _ character. This helps skip any additional 'hidden' attributes the Construct library uses.

Since Container is a dict subclass, the above code will automatically handle instances such as packet.
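
If the container is flat, as in the question, a simpler non-recursive variant of the filtering option may be enough. The sketch below probes each top-level value with json.dumps and keeps only those that encode cleanly; the helper name drop_unserializable and the sample data are assumptions for illustration.

```python
import io
import json


def drop_unserializable(mapping):
    """Hypothetical helper: copy a mapping, omitting values json rejects."""
    result = {}
    for key, value in mapping.items():
        try:
            json.dumps(value)  # probe only; the encoded string is discarded
        except (TypeError, ValueError):
            continue  # leave this key out of the result entirely
        result[key] = value
    return result


# Stand-in for the parsed Container (assumed data, not real Construct output).
packet = {"packet_length": 76, "_io": io.BytesIO(), "num_tlvs": 1}
cleaned = json.dumps(drop_unserializable(packet))
print(cleaned)  # {"packet_length": 76, "num_tlvs": 1}
```

This encodes every value twice and drops a nested structure whole if anything inside it fails, so the recursive version above is the better fit for nested data.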

answered Nov 01 '22 12:11

Martijn Pieters

Ignoring a non-serializable field requires considerable extra logic, as the other answers correctly point out.

If you don't really need to exclude the field, then you can generate a default value instead:

import json

def safe_serialize(obj):
    # Fall back to a descriptive placeholder for anything json can't encode
    default = lambda o: f"<<non-serializable: {type(o).__qualname__}>>"
    return json.dumps(obj, default=default)

obj = {"a": 1, "b": bytes()}  # bytes is non-serializable by default
print(safe_serialize(obj))

That will produce this result:

{"a": 1, "b": "<<non-serializable: bytes>>"}

This code will print the type name, which might be useful if you want to implement your custom serializers later on.

answered Nov 01 '22 13:11

David Rissato Cruz

skipkeys doesn't do what you might think it does: it instructs json.JSONEncoder to skip keys that are not of a basic type, not values that can't be serialized. For example, if you had a dict {object(): "foobar"}, it would skip the object() key, whereas without skipkeys set to True it would raise a TypeError.
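
A short sketch of the difference, using made-up sample dicts (the variable names are assumptions for illustration):

```python
import io
import json

# A key of an unsupported type is silently dropped when skipkeys=True.
odd_key = {object(): "foobar", "ok": 1}
skipped = json.dumps(odd_key, skipkeys=True)
print(skipped)  # {"ok": 1}

# A non-serializable *value* still raises TypeError, with or without skipkeys.
odd_value = {"_io": io.BytesIO(), "ok": 1}
try:
    json.dumps(odd_value, skipkeys=True)
    value_error = None
except TypeError as exc:
    value_error = exc
print(type(value_error).__name__)  # TypeError
```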

You can override JSONEncoder.iterencode() (and its underbelly) and perform look-ahead filtering there, but you'll end up pretty much rewriting the json module, slowing it down in the process as you won't be able to benefit from the compiled parts. What I'd suggest instead is to pre-process your data via recursive filtering and skip the keys/types you don't want in your final JSON. Then the json module should be able to process it without any additional instructions. Something like:

from collections.abc import Mapping, Sequence  # the ABCs live in collections.abc

class SkipFilter(object):

    def __init__(self, types=None, keys=None, allow_empty=False):
        self.types = tuple(types or [])
        self.keys = set(keys or [])
        self.allow_empty = allow_empty  # if True include empty filtered structures

    def filter(self, data):
        if isinstance(data, str):
            return data  # str is a Sequence too; treat it as a leaf, don't iterate it
        if isinstance(data, Mapping):
            result = {}  # dict-like, use dict as a base
            for k, v in data.items():
                if k in self.keys or isinstance(v, self.types):  # skip key/type
                    continue
                try:
                    result[k] = self.filter(v)
                except ValueError:
                    pass
            if result or self.allow_empty:
                return result
        elif isinstance(data, Sequence):
            result = []  # a sequence, use list as a base
            for v in data:
                if isinstance(v, self.types):  # skip type
                    continue
                try:
                    result.append(self.filter(v))
                except ValueError:
                    pass
            if result or self.allow_empty:
                return result
        else:  # we don't know how to traverse this structure...
            return data  # return it as-is, hope for the best...
        raise ValueError  # empty filtered structure and allow_empty is False

Then create your filter:

import io

preprocessor = SkipFilter([io.BytesIO], ["_io"])  # double-whammy skip of io.BytesIO

In this case skipping just by type should suffice, but in case the _io key holds some other undesirable data this guarantees it won't be in the final result. Anyway, you can then just filter the data prior to passing it to the JSONEncoder:

import json

json_data = json.dumps(preprocessor.filter(packet))  # no _io keys or io.BytesIO data...

Of course, if your structure contains some other exotic data or data that is represented in JSON differently based on its type, this approach might mess it up as it turns all mappings into dict and all sequences into list. However, for general usage this should be more than enough.

answered Nov 01 '22 12:11

zwer

Tags:

python

json

python-3.x

construct