I'm writing a library to access a REST API. It returns JSON with a user object; I convert it to a dict and then to a dataclass object. The problem is that not all fields are fixed, and I want to add additional fields (which are not specified in my dataclass) dynamically. I can simply assign values to my object, but they don't appear in the object's representation, and the dataclasses.asdict function doesn't add them to the resulting dict:
from dataclasses import asdict, dataclass
@dataclass
class X:
    i: int
x = X(i=42)
x.s = 'text'
x
# X(i=42)
x.s
# 'text'
asdict(x)
# {'i': 42}
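The reason is that asdict() (like the generated __repr__) only walks the fields the decorator registered on the class, i.e. what dataclasses.fields() returns, while an attribute assigned later lives only in the instance's __dict__. A quick check, mirroring the X class from the question:

```python
from dataclasses import asdict, dataclass, fields

@dataclass
class X:
    i: int

x = X(i=42)
x.s = 'text'  # plain instance attribute, not a registered dataclass field

field_names = [f.name for f in fields(x)]
print(field_names)  # ['i']
print(vars(x))      # {'i': 42, 's': 'text'}
print(asdict(x))    # {'i': 42}
```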
Dataclasses, like other classes, are pickled by reference to their name. Pickling a class that cannot be looked up by name is not supported.
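A small illustration, using a dataclass defined inside a function as the example of a class that cannot be looked up by name (the function and class names are made up for the example). Depending on the Python version, pickle reports this as an AttributeError or a PicklingError, so the sketch catches both:

```python
import pickle
from dataclasses import dataclass

def build_local_class():
    # The class's qualified name contains '<locals>', so pickle
    # has no importable name it can record for it.
    @dataclass
    class Local:
        i: int
    return Local

LocalX = build_local_class()

try:
    pickle.dumps(LocalX(i=42))
    pickled = True
except (pickle.PicklingError, AttributeError) as exc:
    pickled = False
    print(f"cannot pickle: {exc}")
```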
Dataclasses are ordinary Python classes, but they are well suited to storing data. The dataclasses module provides a decorator and functions for automatically adding generated special methods such as __init__() and __repr__() to user-defined classes.
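For instance, with only field annotations declared, the decorator generates __init__(), __repr__() and __eq__() (the User class is a made-up example):

```python
from dataclasses import dataclass

@dataclass
class User:
    id: int
    name: str

u = User(id=1, name='Ada')   # generated __init__
print(repr(u))               # User(id=1, name='Ada')  (generated __repr__)
print(u == User(1, 'Ada'))   # True  (generated __eq__ compares field values)
```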
You could use make_dataclass to create X on the fly:
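from dataclasses import asdict, make_dataclass
# build the class at runtime from (name, type) pairs
X = make_dataclass('X', [('i', int), ('s', str)])
x = X(i=42, s='text')
asdict(x)
# {'i': 42, 's': 'text'}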
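Applied to the JSON case in the question, the field list can be derived from the parsed payload itself. This is a rough sketch: the payload is made up, each field's type is simply the runtime type of its value, and every key is assumed to be a valid Python identifier (make_dataclass raises TypeError for names that are not):

```python
import json
from dataclasses import asdict, make_dataclass

# Hypothetical API response whose keys are not known in advance.
payload = json.loads('{"i": 42, "s": "text", "active": true}')

# One (name, type) pair per key, typed by the value's runtime type.
User = make_dataclass('User', [(key, type(value)) for key, value in payload.items()])
user = User(**payload)

print(user)          # User(i=42, s='text', active=True)
print(asdict(user))  # {'i': 42, 's': 'text', 'active': True}
```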
Or as a derived class:
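from dataclasses import asdict, dataclass, make_dataclass
@dataclass
class X:
    i: int
x = X(i=42)
# swap in a generated subclass of X that also declares an 's' field
x.__class__ = make_dataclass('Y', fields=[('s', str)], bases=(X,))
x.s = 'text'
asdict(x)
# {'i': 42, 's': 'text'}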
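The same trick can be wrapped in a small helper so that any number of extra attributes become real fields. with_extra_fields is a hypothetical name, and the field types are again taken from the values at runtime:

```python
from dataclasses import asdict, dataclass, make_dataclass

@dataclass
class X:
    i: int

def with_extra_fields(obj, **extra):
    """Hypothetical helper: give obj a generated subclass that declares
    one extra field per keyword argument, then set those values."""
    cls = type(obj)
    subclass = make_dataclass(
        cls.__name__,  # reuse the name so repr() still shows "X(...)"
        [(name, type(value)) for name, value in extra.items()],
        bases=(cls,),
    )
    obj.__class__ = subclass
    for name, value in extra.items():
        setattr(obj, name, value)
    return obj

x = with_extra_fields(X(i=42), s='text', n=7)
print(x)          # X(i=42, s='text', n=7)
print(asdict(x))  # {'i': 42, 's': 'text', 'n': 7}
```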
Update (6/22): As it's now mid-2022, I thought I'd refresh my answer with a brand-new approach I've been toying around with. I am pleased to announce a fast, modern library I released quite recently, called dotwiz.
The dotwiz library can be installed with pip:
pip install dotwiz
This is a tiny helper library I created that makes dict objects safe to access by dot notation, such as a.b.c instead of a['b']['c']. In my own tests and benchmarks, it's considerably faster than something like make_dataclass, mainly when creating objects (more info on this below).
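To illustrate the general idea of dot-notation access on a dict, here is a rough standard-library sketch. It is not how dotwiz is implemented, and DotDict is a hypothetical name:

```python
import json

class DotDict(dict):
    """Rough sketch: a dict whose keys can also be read and set as attributes."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Convert nested dicts so that chained access like a.b.c works.
        for key, value in list(self.items()):
            if isinstance(value, dict):
                self[key] = DotDict(value)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so keys that clash
        # with dict methods (e.g. 'items') still need bracket access.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self[name] = value

a = DotDict({'b': {'c': 1}})
print(a.b.c)          # 1, same as a['b']['c']
a.d = 'text'
print(json.dumps(a))  # {"b": {"c": 1}, "d": "text"}
```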
Additionally, you can subclass DotWiz or DotWizPlus, which enables type hinting and auto-completion in an IDE such as PyCharm. Here is a simple example:
from dotwiz import DotWiz
class MyTypedWiz(DotWiz):
    # add attribute names and annotations for better type hinting!
    i: int
    s: str
dw = MyTypedWiz(i=42, s='text')
print(dw)
# ✫(i=42, s='text')
print(dw.to_dict())
# {'i': 42, 's': 'text'}
If you still prefer to use dataclasses to model your data, I've included my original answer below, mostly unchanged from years past.
The following results were timed on a Mac Pro with the M1 chip, Python 3.10.4, and n=5000 iterations.
Creating or instantiating the object:
$ python -m timeit -n 5000 -s "from dotwiz import DotWiz" "DotWiz(i=42, s='text')"
5000 loops, best of 5: 425 nsec per loop
$ python -m timeit -n 5000 -s "from dataclasses import make_dataclass" "X = make_dataclass('X', [('i', int), ('s', str)]); X(i=42, s='text')"
5000 loops, best of 5: 97.8 usec per loop
These times are probably inflated, since the make_dataclass statement also generates a new class on every loop, but in this particular case DotWiz looks to be roughly 230x faster than make_dataclass (425 nsec vs. 97.8 usec). In practice, I would say it's about 100 times faster on average.
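To see how much of that gap comes from generating the class rather than creating the instance, the two costs can be timed separately with the standard timeit module. This sketch uses only the standard library (no dotwiz involved), and no timings are quoted because they depend on the machine:

```python
import timeit
from dataclasses import make_dataclass

N = 5000

# Generate a brand-new class and instantiate it on every loop
# (what the make_dataclass command above measures).
with_class_creation = timeit.timeit(
    "X = make_dataclass('X', [('i', int), ('s', str)]); X(i=42, s='text')",
    setup="from dataclasses import make_dataclass",
    number=N,
)

# Generate the class once, up front, and time only the instantiation.
X = make_dataclass('X', [('i', int), ('s', str)])
instantiate_only = timeit.timeit(lambda: X(i=42, s='text'), number=N)

print(f"create class + instance: {with_class_creation / N * 1e9:.0f} ns per loop")
print(f"instance only:           {instantiate_only / N * 1e9:.0f} ns per loop")
```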
Accessing a key by dot notation:
$ python -m timeit -n 5000 -s "from dotwiz import DotWiz" -s "dw = DotWiz(i=42, s='text')" "dw.s.lower()"
5000 loops, best of 5: 39.7 nsec per loop
$ python -m timeit -n 5000 -s "from dataclasses import make_dataclass" -s "X = make_dataclass('X', [('i', int), ('s', str)])" -s "x = X(i=42, s='text')" "x.s.lower()"
5000 loops, best of 5: 39.9 nsec per loop
The time to access an attribute or a key looks to be about the same in both cases.
Serializing the object to JSON:
$ python -m timeit -n 5000 -s "import json" -s "from dotwiz import DotWiz" -s "dw = DotWiz(i=42, s='text')" "json.dumps(dw)"
5000 loops, best of 5: 1.1 usec per loop
$ python -m timeit -n 5000 -s "import json" -s "from dotwiz import DotWiz" -s "dw = DotWiz(i=42, s='text')" "json.dumps(dw.to_dict())"
5000 loops, best of 5: 1.46 usec per loop
$ python -m timeit -n 5000 -s "import json" -s "from dataclasses import asdict, make_dataclass" -s "X = make_dataclass('X', [('i', int), ('s', str)])" -s "x = X(i=42, s='text')" "json.dumps(asdict(x))"
5000 loops, best of 5: 2.87 usec per loop
So it looks like serializing a DotWiz object is about 2.5x faster than serializing a dataclass instance (about 2x when calling to_dict() first).
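For context, the extra step on the dataclass side is the conversion itself: the json module cannot serialize a dataclass instance directly, so it has to go through asdict() first, either explicitly or via the default hook. A short standard-library sketch:

```python
import json
from dataclasses import asdict, make_dataclass

X = make_dataclass('X', [('i', int), ('s', str)])
x = X(i=42, s='text')

try:
    json.dumps(x)  # a dataclass instance is not JSON serializable by default
    direct_ok = True
except TypeError as exc:
    direct_ok = False
    print(f"TypeError: {exc}")

print(json.dumps(asdict(x)))          # {"i": 42, "s": "text"}
print(json.dumps(x, default=asdict))  # same output; json calls asdict(x) itself
```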
As mentioned, fields marked as optional should resolve the issue. If not, consider using properties in dataclasses. Yes, regular properties should work well enough, though you'll have to declare the field in __post_init__, and that's slightly inconvenient.
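A minimal standard-library sketch of the plain-property approach (the backing attribute name _s and the title-casing setter are only for illustration). Because s is a property here rather than a dataclass field, it does not appear in repr() or asdict(), which is one reason to reach for field properties instead:

```python
from dataclasses import asdict, dataclass
from typing import Optional

@dataclass
class X:
    i: int

    def __post_init__(self) -> None:
        # The backing attribute is created here, after the generated __init__,
        # because `s` is a plain property and not a dataclass field.
        self._s: Optional[str] = None

    @property
    def s(self) -> Optional[str]:
        return self._s

    @s.setter
    def s(self, value: Optional[str]) -> None:
        self._s = value.title() if value else None

x = X(i=42)
print(x.s)        # None: the getter works right after construction
x.s = 'hello'
print(x.s)        # Hello
print(asdict(x))  # {'i': 42}: the property is not a field
```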
If you want to set a default value for the property, so that accessing the getter immediately after creating the object works, and you also want to be able to set the value via the constructor, you can use a concept called field properties; libraries such as dataclass-wizard provide full support for this.
Example usage:
from dataclasses import asdict, dataclass
from typing import Optional
from dataclass_wizard import property_wizard
@dataclass
class X(metaclass=property_wizard):
    i: int
    s: Optional[str] = None
    @property
    def _s(self):
        """Returns a title-cased value, e.g. `stRiNg` -> `String`"""
        return self._s.title() if self._s else None
    @_s.setter
    def _s(self, s: str):
        """Reverses a string, e.g. `olleH` -> `Hello`"""
        self._s = s[::-1] if s else None
x = X(i=42)
x
# X(i=42, s=None)
assert x.s is None # True
x.s = '!emordnilap'
x
# X(i=42, s='Palindrome!')
x.s
# 'Palindrome!'
asdict(x)
# {'i': 42, 's': 'Palindrome!'}
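For comparison, a rough standard-library-only sketch of the same behavior. It relies on a common workaround (attaching the property after @dataclass has recorded the field) and is not how dataclass-wizard implements field properties:

```python
from dataclasses import asdict, dataclass
from typing import Optional

@dataclass
class X:
    i: int
    s: Optional[str] = None  # a real field, so repr() and asdict() include it

def _get_s(self) -> Optional[str]:
    # Title-cases the stored value, e.g. 'stRiNg' -> 'String'
    return self._s.title() if self._s else None

def _set_s(self, s: Optional[str]) -> None:
    # Reverses the incoming string, e.g. 'olleH' -> 'Hello'
    self._s = s[::-1] if s else None

# Replace the class attribute (the field's default) with a property only after
# @dataclass has recorded the field: the generated __init__ keeps its default
# of None, and its `self.s = s` assignment now goes through the setter.
X.s = property(_get_s, _set_s)

x = X(i=42)
print(x)          # X(i=42, s=None)
x.s = '!emordnilap'
print(x)          # X(i=42, s='Palindrome!')
print(asdict(x))  # {'i': 42, 's': 'Palindrome!'}
```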
Disclaimer: I am the creator (and maintainer) of this library.