
How to unpack nested JSON into Python Dataclass

Dataclass example:

@dataclass
class StatusElement:
    status: str
    orderindex: int
    color: str
    type: str


@dataclass
class List:
    id: int 
    statuses: List[StatusElement]

JSON example:

json = {
  "id": "124",
  "statuses": [
    {
      "status": "to do",
      "orderindex": 0,
      "color": "#d3d3d3",
      "type": "open"
    }]
}

I can unpack the JSON by doing something like this:

object = List(**json)

But I'm not sure how I can also unpack each entry in statuses into a StatusElement object and append it to the statuses list of the List object. I'm sure I need to loop over it somehow, but I'm not sure how to combine that with unpacking.

Zach Johnson asked Dec 10 '25 16:12


2 Answers

Python's dataclasses is a great module, but one thing it unfortunately doesn't handle is parsing a JSON object into a nested dataclass structure.

A few workarounds exist for this:

  • You can roll your own JSON parsing helper method, for example a from_json which converts a JSON string to a List instance with a nested dataclass.
  • You can make use of existing JSON serialization libraries. For example, pydantic is a popular one that supports this use case.
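
As a rough sketch of the first option, here is what a hand-rolled helper might look like using only the standard library. The from_dict classmethod and the StatusList class name are my own choices for illustration (the class is renamed from List so it doesn't clash with typing.List). Plain dataclasses do no type conversion, so the "124" string is cast to int by hand.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class StatusElement:
    status: str
    orderindex: int
    color: str
    type: str


@dataclass
class StatusList:  # hypothetical name; stands in for the question's `List`
    id: int
    statuses: List[StatusElement]

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "StatusList":
        # Build each nested StatusElement first, then the outer object.
        statuses = [StatusElement(**item) for item in data["statuses"]]
        # Dataclasses do not convert types, so cast the id explicitly.
        return cls(id=int(data["id"]), statuses=statuses)


payload = {
    "id": "124",
    "statuses": [
        {"status": "to do", "orderindex": 0, "color": "#d3d3d3", "type": "open"}
    ],
}

result = StatusList.from_dict(payload)
print(result)
```

This works for one level of nesting but has to be repeated by hand for every nested class, which is the main reason to reach for a library instead.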

Here is an example using the dataclass-wizard library that works well enough for your use case. It's more lightweight than pydantic and coincidentally also a little faster. It also supports automatic case transforms and type conversions (for example, str to an annotated int).

Example below:

from dataclasses import dataclass
from typing import List as PyList

from dataclass_wizard import JSONWizard


@dataclass
class List(JSONWizard):
    id: int
    statuses: PyList['StatusElement']
    # on Python 3.9+ you can use the following syntax:
    #   statuses: list['StatusElement']


@dataclass
class StatusElement:
    status: str
    order_index: int
    color: str
    type: str


json = {
  "id": "124",
  "statuses": [
    {
      "status": "to do",
      "orderIndex": 0,
      "color": "#d3d3d3",
      "type": "open"
    }]
}


object = List.from_dict(json)

print(repr(object))
# List(id=124, statuses=[StatusElement(status='to do', order_index=0, color='#d3d3d3', type='open')])

Disclaimer: I am the creator (and maintainer) of this library.


As of the latest release of dataclass-wizard, you can skip the class inheritance. Usage is straightforward: the example below is the same as the one above, but with the JSONWizard usage removed completely. Just remember to import asdict from dataclass_wizard rather than from the dataclasses module, even though I guess the latter should coincidentally work.

Here's the modified version of the above without class inheritance:

from dataclasses import dataclass
from typing import List as PyList

from dataclass_wizard import fromdict, asdict


@dataclass
class List:
    id: int
    statuses: PyList['StatusElement']


@dataclass
class StatusElement:
    status: str
    order_index: int
    color: str
    type: str


json = {
  "id": "124",
  "statuses": [
    {
      "status": "to do",
      "orderIndex": 0,
      "color": "#d3d3d3",
      "type": "open"
    }]
}

# De-serialize the JSON dictionary into a `List` instance.
c = fromdict(List, json)

print(c)
# List(id=124, statuses=[StatusElement(status='to do', order_index=0, color='#d3d3d3', type='open')])

# Convert the instance back to a dictionary object that is JSON-serializable.
d = asdict(c)

print(d)
# {'id': 124, 'statuses': [{'status': 'to do', 'orderIndex': 0, 'color': '#d3d3d3', 'type': 'open'}]}

Also, here's a quick performance comparison with dacite. I wasn't aware of this library before, but it's also very easy to use (and there's also no need to inherit from any class). However, from my personal tests - Windows 10 Alienware PC using Python 3.9.1 - dataclass-wizard seemed to perform much better overall on the de-serialization process.

from dataclasses import dataclass
from timeit import timeit
from typing import List

from dacite import from_dict

from dataclass_wizard import JSONWizard, fromdict


data = {
    "id": 124,
    "statuses": [
        {
            "status": "to do",
            "orderindex": 0,
            "color": "#d3d3d3",
            "type": "open"
        }]
}


@dataclass
class StatusElement:
    status: str
    orderindex: int
    color: str
    type: str


@dataclass
class List:
    id: int
    statuses: List[StatusElement]


class ListWiz(List, JSONWizard):
    ...


n = 100_000

# 0.37
print('dataclass-wizard:            ', timeit('ListWiz.from_dict(data)', number=n, globals=globals()))

# 0.36
print('dataclass-wizard (fromdict): ', timeit('fromdict(List, data)', number=n, globals=globals()))

# 11.2
print('dacite:                      ', timeit('from_dict(List, data)', number=n, globals=globals()))


lst_wiz1 = ListWiz.from_dict(data)
lst_wiz2 = fromdict(List, data)
lst = from_dict(List, data)

# True
assert lst.__dict__ == lst_wiz1.__dict__ == lst_wiz2.__dict__
rv.kvetch answered Dec 13 '25 06:12


I've spent a few hours investigating options for this. There's no native Python functionality to do this, but there are a few third-party packages (writing in November 2022):

  • marshmallow_dataclass has this functionality (you need not be using marshmallow in any other capacity in your project). It gives good error messages and the package is actively maintained. I used this for a while before hitting what I believe is a bug parsing a large and complex JSON into deeply nested dataclasses, and then had to switch away.
  • dataclass-wizard is easy to use and specifically addresses this use case. It has excellent documentation. One significant disadvantage is that, when matching against a union of dataclasses, it won't automatically attempt to find the right fit for a given JSON (see https://dataclass-wizard.readthedocs.io/en/latest/common_use_cases/dataclasses_in_union_types.html). Instead it asks you to add a "tag key" to the input JSON, which is a robust solution but may not be possible if you have no control over the input JSON.
  • dataclass-json is similar to dataclass-wizard, and again doesn't attempt to match the correct dataclass within a union.
  • dacite is the option I have settled upon for the time being. It has similar functionality to marshmallow_dataclass, at least for JSON parsing. Its error messages are significantly less clear than those of marshmallow_dataclass, but this is slightly offset by the fact that it's easier to figure out what's wrong if you drop into pdb at the point where the error occurs: the internals are quite clear and you can experiment to see what's going wrong. According to others it is rather slow, but that's not a problem in my circumstances.
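
To make the union point concrete, here is a minimal standard-library sketch of what "finding the right fit" could look like when there is no tag key to go on. It is not how dacite or marshmallow_dataclass work internally. The class names and the first_matching helper are invented for illustration, and it matches purely on field names without checking value types.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Sequence


@dataclass
class TextStatus:  # hypothetical example class
    status: str
    color: str


@dataclass
class NumericStatus:  # hypothetical example class
    code: int


def first_matching(data: Dict[str, Any], candidates: Sequence[type]) -> Any:
    """Return an instance of the first candidate whose field names equal the keys of `data`."""
    keys = set(data)
    for cls in candidates:
        # Compare the dataclass field names with the keys of the input dict.
        if {f.name for f in fields(cls)} == keys:
            return cls(**data)
    raise ValueError(f"no candidate dataclass matches keys {sorted(keys)}")


candidates = [TextStatus, NumericStatus]
a = first_matching({"status": "to do", "color": "#d3d3d3"}, candidates)
b = first_matching({"code": 3}, candidates)
print(a)
print(b)
```

Matching on key names alone is ambiguous when two classes share the same fields, which is the case a tag key is designed to resolve.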
Chris J Harris answered Dec 13 '25 06:12


