Dynamically add fields to dataclass objects

I'm writing a library to access a REST API. It returns JSON containing a user object, which I convert to a dict and then to a dataclass object. The problem is that not all fields are fixed: I want to add additional fields (ones not specified in my dataclass) dynamically. I can simply assign values to my object, but they don't appear in the object's representation, and dataclasses.asdict doesn't add them to the resulting dict:

from dataclasses import asdict, dataclass

@dataclass
class X:
    i: int

x = X(i=42)
x.s = 'text'

x
# X(i=42)

x.s
# 'text'

asdict(x)
# {'i': 42}
rominf asked Sep 27 '18 10:09


2 Answers

You could use make_dataclass to create X on the fly:

from dataclasses import asdict, make_dataclass

X = make_dataclass('X', [('i', int), ('s', str)])
x = X(i=42, s='text')

asdict(x)
# {'i': 42, 's': 'text'}

Or as a derived class:

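from dataclasses import asdict, dataclass, make_dataclass

@dataclass
class X:
    i: int

x = X(i=42)
# Swap the instance's class for a derived dataclass that also declares `s`.
x.__class__ = make_dataclass('Y', fields=[('s', str)], bases=(X,))
x.s = 'text'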

asdict(x)
# {'i': 42, 's': 'text'}
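
For the REST case in the question, where the extra keys are only known once the JSON arrives, the same make_dataclass idea can be wrapped in a small helper. This is only a sketch: the helper name from_payload and the sample payload are hypothetical, each extra field is simply typed as type(value), and it assumes every key is a valid Python identifier (make_dataclass rejects anything else).

```python
from dataclasses import asdict, dataclass, fields, make_dataclass


@dataclass
class X:
    i: int


def from_payload(base, payload):
    """Hypothetical helper: build `base`, extended with keys it does not declare."""
    known = {f.name for f in fields(base)}
    # Every key the base class does not declare becomes a (name, type) field spec.
    extra = [(k, type(v)) for k, v in payload.items() if k not in known]
    # Derive a subclass only when there is something to add.
    cls = make_dataclass(base.__name__, extra, bases=(base,)) if extra else base
    return cls(**payload)


x = from_payload(X, {'i': 42, 's': 'text'})

x
# X(i=42, s='text')

asdict(x)
# {'i': 42, 's': 'text'}
```

The result is still an instance of X, so existing isinstance checks keep working. A new class is created for each payload that carries extra keys, which costs noticeably more than a plain instantiation.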
w-m answered Oct 18 '22 07:10


Update (6/22): As it's now mid-2022, I thought I'd refresh my answer with a brand new approach I've been toying around with. I am pleased to announce a fast, modern library I have quite recently released, called dotwiz.

The dotwiz library can be installed with pip:

pip install dotwiz

This is a tiny helper library that I've created, which makes dict objects safe to access by dot notation - such as a.b.c instead of a['b']['c']. From personal tests and benchmarks, it's actually a lot faster than something like make_dataclass - more info on this below.

Additionally, one can subclass DotWiz or DotWizPlus, which enables type hinting and auto-completion in an IDE such as PyCharm. Here is a simple example:

from dotwiz import DotWiz


class MyTypedWiz(DotWiz):
    # add attribute names and annotations for better type hinting!
    i: int
    s: str


dw = MyTypedWiz(i=42, s='text')
print(dw)
# ✫(i=42, s='text')

print(dw.to_dict())
# {'i': 42, 's': 'text'}

If you still prefer to use dataclasses to model your data, I've included my original answer below that is mostly unchanged from years past.

Benchmarks

The following results were timed on a Mac Pro with the M1 chip, Python 3.10.4, and with n=5000 iterations.

Creating or instantiating the object:

$ python -m timeit -n 5000 -s "from dotwiz import DotWiz" -c "DotWiz(i=42, s='text')"
5000 loops, best of 5: 425 nsec per loop

$ python -m timeit -n 5000 -s "from dataclasses import make_dataclass" -c "X = make_dataclass('X', [('i', int), ('s', str)]); X(i=42, s='text')"
5000 loops, best of 5: 97.8 usec per loop

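These times are probably inflated (note that the make_dataclass statement also creates the class on every iteration), but in this particular case DotWiz comes out about 230x faster than make_dataclass (97.8 usec vs. 425 nsec). In practice, I would say it's about 100 times faster on average.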
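
To see how much of the make_dataclass figure comes from creating the class rather than instantiating it, the two can be timed separately. This is a rough sketch using only the standard library; the iteration count N is an arbitrary choice, and absolute numbers will differ by machine and Python version.

```python
import timeit
from dataclasses import make_dataclass

N = 1000  # arbitrary iteration count for this sketch

# Class creation and instantiation together, as in the command above.
both = timeit.timeit(
    "X = make_dataclass('X', [('i', int), ('s', str)]); X(i=42, s='text')",
    globals={'make_dataclass': make_dataclass},
    number=N,
)

# Instantiation only: the class is created once, outside the timed statement.
X = make_dataclass('X', [('i', int), ('s', str)])
init_only = timeit.timeit("X(i=42, s='text')", globals={'X': X}, number=N)

print(f"create + instantiate: {both / N * 1e6:.2f} usec per loop")
print(f"instantiate only:     {init_only / N * 1e6:.2f} usec per loop")
```

If the class can be created once and reused, the second figure is the fairer one to compare against.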

Accessing a key by dot notation:

$ python -m timeit -n 5000 -s "from dotwiz import DotWiz" -s "dw = DotWiz(i=42, s='text')" -c "dw.s.lower()"         
5000 loops, best of 5: 39.7 nsec per loop

$ python -m timeit -n 5000 -s "from dataclasses import make_dataclass" -s "X = make_dataclass('X', [('i', int), ('s', str)])" -s "x = X(i=42, s='text')" -c "x.s.lower()"
5000 loops, best of 5: 39.9 nsec per loop

The times to access an attribute or a key look to be mostly the same.

Serializing the object to JSON:

$ python -m timeit -n 5000 -s "import json" -s "from dotwiz import DotWiz" -s "dw = DotWiz(i=42, s='text')" -c "json.dumps(dw)"
5000 loops, best of 5: 1.1 usec per loop

$ python -m timeit -n 5000 -s "import json" -s "from dotwiz import DotWiz" -s "dw = DotWiz(i=42, s='text')" -c "json.dumps(dw.to_dict())"
5000 loops, best of 5: 1.46 usec per loop

$ python -m timeit -n 5000 -s "import json" -s "from dataclasses import asdict, make_dataclass" -s "X = make_dataclass('X', [('i', int), ('s', str)])" -s "x = X(i=42, s='text')" -c "json.dumps(asdict(x))"
5000 loops, best of 5: 2.87 usec per loop

So, it actually looks like it's about 2.5x faster to serialize a DotWiz object, as compared to a dataclass instance.

Original Answer

As mentioned, marking the fields as optional should resolve the issue. If not, consider using properties in dataclasses. Regular properties work well enough, though you'll have to declare the field in __post_init__, which is slightly inconvenient.
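
The optional-field route mentioned above needs only the standard library. A minimal sketch, assuming the extra field names are known in advance (here just s, reusing the class from the question):

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class X:
    i: int
    # Declared up front with a default, so it may be absent from the payload.
    s: Optional[str] = None


x = X(i=42)

asdict(x)
# {'i': 42, 's': None}

x.s = 'text'

x
# X(i=42, s='text')

asdict(x)
# {'i': 42, 's': 'text'}
```

Because s is a declared field, it shows up in both the repr and asdict, unlike an attribute assigned ad hoc.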

If you want to set a default value for the property so that accessing the getter immediately after creating the object works, and you also want to be able to set a value via the constructor, you can use a concept called field properties; a couple of libraries, such as dataclass-wizard, provide full support for that.

Example usage:

from dataclasses import asdict, dataclass
from typing import Optional

from dataclass_wizard import property_wizard


@dataclass
class X(metaclass=property_wizard):
    i: int
    s: Optional[str] = None

    @property
    def _s(self):
        """Returns a title-cased value, i.e. `stRiNg` -> `String`"""
        return self._s.title() if self._s else None

    @_s.setter
    def _s(self, s: str):
        """Reverses a string, i.e. `olleH` -> `Hello` """
        self._s = s[::-1] if s else None


x = X(i=42)

x
# X(i=42, s=None)

assert x.s is None  # True

x.s = '!emordnilap'

x
# X(i=42, s='Palindrome!')

x.s
# 'Palindrome!'

asdict(x)
# {'i': 42, 's': 'Palindrome!'}

Disclaimer: I am the creator (and maintainer) of this library.

rv.kvetch answered Oct 18 '22 07:10