Data classes are Kotlin's replacement for Java POJOs, so it is natural to expect them to support inheritance. In practice, inheritance of data classes in Kotlin does not work well (a data class cannot be declared open), so extending a data class is generally discouraged.
The dataclasses module was introduced in Python 3.7 as a utility for creating structured classes specifically for storing data. These classes come with attributes and generated methods for working with the data and its representation.
A dataclass can also have regular instance methods and class methods. Dataclasses are part of the standard library from Python 3.7 onwards; on Python 3.6 they have to be installed as a separate backport library.
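A minimal sketch of a dataclass that also carries an ordinary instance method and a class method used as an alternative constructor; Point and its methods are hypothetical names chosen for illustration:

```python
from dataclasses import dataclass


@dataclass
class Point:
    """Data-holding class; @dataclass generates __init__, __repr__ and __eq__."""
    x: float
    y: float

    # A regular instance method works exactly as in a normal class.
    def distance_to_origin(self) -> float:
        return (self.x ** 2 + self.y ** 2) ** 0.5

    # A class method can act as an alternative constructor.
    @classmethod
    def origin(cls) -> "Point":
        return cls(0.0, 0.0)


p = Point(3.0, 4.0)
print(p)                                  # Point(x=3.0, y=4.0)
print(p.distance_to_origin())             # 5.0
print(Point.origin() == Point(0.0, 0.0))  # True
```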
Modifying fields after initialization with __post_init__: the __post_init__ method is called just after the generated __init__ has run. In other words, it is called after the object has received values for its fields, such as name, continent, population, and official_lang.
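A minimal sketch, assuming a hypothetical Country dataclass built from the four fields named above: __post_init__ normalises the text fields and validates the population once __init__ has assigned them. The cleanup rules and sample values are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Country:
    name: str
    continent: str
    population: int
    official_lang: str

    def __post_init__(self):
        # Runs after the generated __init__ has assigned all four fields,
        # so the values can be normalised or validated here.
        self.name = self.name.strip().title()
        self.continent = self.continent.strip().title()
        if self.population < 0:
            raise ValueError("population must be non-negative")


c = Country("  freedonia ", "europe", 1_000_000, "English")
print(c.name, c.continent)  # Freedonia Europe
```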
The way dataclasses combine attributes prevents you from using attributes with defaults in a base class and then attributes without a default (positional attributes) in a subclass.
That's because the attributes are combined by starting from the bottom of the MRO and building up an ordered list of the attributes in first-seen order; overrides are kept in their original location. So Parent starts out with ['name', 'age', 'ugly'], where ugly has a default, and then Child adds ['school'] to the end of that list (with ugly already in the list). This means you end up with ['name', 'age', 'ugly', 'school'], and because school doesn't have a default, this results in an invalid argument listing for __init__.
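The failure can be reproduced with the Parent and Child fields described above. This is a sketch; the exact wording of the error message differs between Python versions, so only the exception type and the field name are relied on:

```python
from dataclasses import dataclass


@dataclass
class Parent:
    name: str
    age: int
    ugly: bool = False


error_message = None
try:
    @dataclass
    class Child(Parent):
        # Appended after 'ugly', which already has a default.
        school: str
except TypeError as exc:
    error_message = str(exc)

# e.g. "non-default argument 'school' follows default argument"
print(error_message)
```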
This is documented in PEP 557 (Data Classes), under Inheritance:
When the Data Class is being created by the @dataclass decorator, it looks through all of the class's base classes in reverse MRO (that is, starting at object) and, for each Data Class that it finds, adds the fields from that base class to an ordered mapping of fields. After all of the base class fields are added, it adds its own fields to the ordered mapping. All of the generated methods will use this combined, calculated ordered mapping of fields. Because the fields are in insertion order, derived classes override base classes.
and under Specification:
TypeError will be raised if a field without a default value follows a field with a default value. This is true either when this occurs in a single class, or as a result of class inheritance.
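The same check applies inside a single class. In this sketch, Broken is a hypothetical class whose second field has no default while the first one does:

```python
from dataclasses import dataclass

error_message = None
try:
    @dataclass
    class Broken:
        ugly: bool = False
        school: str  # no default, but follows a field that has one
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```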
You do have a few options here to avoid this issue.
The first option is to use separate base classes to force fields with defaults into a later position in the MRO. At all costs, avoid setting fields directly on classes that are to be used as base classes, such as Parent.
The following class hierarchy works:
```python
from dataclasses import dataclass


# base classes with fields; fields without defaults separate from fields with.
@dataclass
class _ParentBase:
    name: str
    age: int


@dataclass
class _ParentDefaultsBase:
    ugly: bool = False


@dataclass
class _ChildBase(_ParentBase):
    school: str


@dataclass
class _ChildDefaultsBase(_ParentDefaultsBase):
    ugly: bool = True


# public classes, deriving from base-with, base-without field classes
# subclasses of public classes should put the public base class up front.
@dataclass
class Parent(_ParentDefaultsBase, _ParentBase):
    def print_name(self):
        print(self.name)

    def print_age(self):
        print(self.age)

    def print_id(self):
        print(f"The Name is {self.name} and {self.name} is {self.age} year old")


@dataclass
class Child(Parent, _ChildDefaultsBase, _ChildBase):
    pass
```
By pulling out fields into separate base classes with fields without defaults and fields with defaults, and a carefully selected inheritance order, you can produce an MRO that puts all fields without defaults before those with defaults. The reversed MRO (ignoring object) for Child is:
_ParentBase
_ChildBase
_ParentDefaultsBase
_ChildDefaultsBase
Parent
Note that Parent doesn't set any new fields, so it doesn't matter here that it ends up 'last' in the field listing order. The classes with fields without defaults (_ParentBase and _ChildBase) precede the classes with fields with defaults (_ParentDefaultsBase and _ChildDefaultsBase).
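A sketch that rebuilds the hierarchy without its printing methods and inspects it with Child.__mro__ and dataclasses.fields(), to confirm the order listed above and the resulting field order:

```python
from dataclasses import dataclass, fields


@dataclass
class _ParentBase:
    name: str
    age: int


@dataclass
class _ParentDefaultsBase:
    ugly: bool = False


@dataclass
class _ChildBase(_ParentBase):
    school: str


@dataclass
class _ChildDefaultsBase(_ParentDefaultsBase):
    ugly: bool = True


@dataclass
class Parent(_ParentDefaultsBase, _ParentBase):
    pass


@dataclass
class Child(Parent, _ChildDefaultsBase, _ChildBase):
    pass


# Reverse the MRO, then drop object (first) and Child itself (last).
reversed_mro = [cls.__name__ for cls in reversed(Child.__mro__)][1:-1]
print(reversed_mro)
# ['_ParentBase', '_ChildBase', '_ParentDefaultsBase', '_ChildDefaultsBase', 'Parent']

# Field order used by the generated __init__.
print([f.name for f in fields(Child)])  # ['name', 'age', 'school', 'ugly']
```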
The result is Parent and Child classes with a sane field order, while Child is still a subclass of Parent:
```python
>>> from inspect import signature
>>> signature(Parent)
<Signature (name: str, age: int, ugly: bool = False) -> None>
>>> signature(Child)
<Signature (name: str, age: int, school: str, ugly: bool = True) -> None>
>>> issubclass(Child, Parent)
True
```
and so you can create instances of both classes:
```python
>>> jack = Parent('jack snr', 32, ugly=True)
>>> jack_son = Child('jack jnr', 12, school='havard', ugly=True)
>>> jack
Parent(name='jack snr', age=32, ugly=True)
>>> jack_son
Child(name='jack jnr', age=12, school='havard', ugly=True)
```
Another option is to use only fields with defaults; you can still make it an error not to supply a school value by raising one in __post_init__:
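```python
from dataclasses import dataclass

# Sentinel meaning "no value supplied" for school.
_no_default = object()


@dataclass
class Child(Parent):  # Parent: name: str, age: int, ugly: bool = False
    school: str = _no_default
    ugly: bool = True

    def __post_init__(self):
        if self.school is _no_default:
            raise TypeError("__init__ missing 1 required argument: 'school'")
```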
but this does alter the field order; school ends up after ugly:
<Signature (name: str, age: int, ugly: bool = True, school: str = <object object at 0x1101d1210>) -> None>
and a type hint checker will complain about _no_default not being a string.
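Put together with the original Parent class, the sentinel approach can be exercised as follows. This is a sketch; the blanket type: ignore comment is one assumed way to quiet the type checker's complaint, not part of the original answer:

```python
from dataclasses import dataclass, fields

# Sentinel meaning "no value supplied"; no real argument can be this exact object.
_no_default = object()


@dataclass
class Parent:
    name: str
    age: int
    ugly: bool = False


@dataclass
class Child(Parent):
    # Assumption: a blanket ignore is acceptable to silence the str/object mismatch.
    school: str = _no_default  # type: ignore
    ugly: bool = True

    def __post_init__(self):
        if self.school is _no_default:
            raise TypeError("__init__ missing 1 required argument: 'school'")


ok = Child("jack jnr", 12, school="havard")
print([f.name for f in fields(Child)])  # ['name', 'age', 'ugly', 'school']

missing_school_error = None
try:
    Child("jack jnr", 12)  # school omitted, so __post_init__ raises
except TypeError as exc:
    missing_school_error = exc
print(missing_school_error)  # __init__ missing 1 required argument: 'school'
```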
You can also use the attrs project, which was the project that inspired dataclasses. It uses a different inheritance merging strategy: it pulls overridden fields in a subclass to the end of the fields list, so ['name', 'age', 'ugly'] in the Parent class becomes ['name', 'age', 'school', 'ugly'] in the Child class. By overriding the field with a default, attrs allows the override without needing to do an MRO dance.
attrs supports defining fields without type hints, but let's stick to the supported type-hinting mode by setting auto_attribs=True:
```python
import attr


@attr.s(auto_attribs=True)
class Parent:
    name: str
    age: int
    ugly: bool = False

    def print_name(self):
        print(self.name)

    def print_age(self):
        print(self.age)

    def print_id(self):
        print(f"The Name is {self.name} and {self.name} is {self.age} year old")


@attr.s(auto_attribs=True)
class Child(Parent):
    school: str
    ugly: bool = True
```
You can use attributes with defaults in parent classes if you exclude them from the init function. If you need the possibility to override the default at init, extend the code with the answer of Praveen Kulkarni.
```python
from dataclasses import dataclass, field


@dataclass
class Parent:
    name: str
    age: int
    # init=False keeps ugly out of __init__, so it cannot break the argument order.
    ugly: bool = field(default=False, init=False)


@dataclass
class Child(Parent):
    school: str


jack = Parent('jack snr', 32)
jack_son = Child('jack jnr', 12, school='havard')
jack_son.ugly = True
```
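Or even:
```python
@dataclass
class Child(Parent):
    school: str
    # A plain class attribute (no annotation) is not a new field; it only replaces
    # the class-level default that Parent's init=False field falls back on.
    ugly = True
    # This does not work (ugly would become an __init__ field with a default ahead of school)
    # ugly: bool = True


jack_son = Child('jack jnr', 12, school='havard')
assert jack_son.ugly
```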
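To see what the init=False variant does to the generated constructor, this sketch prints the signature of Child and reads ugly from both classes (the same Parent and Child as above, repeated so the block runs on its own):

```python
from dataclasses import dataclass, field
from inspect import signature


@dataclass
class Parent:
    name: str
    age: int
    ugly: bool = field(default=False, init=False)


@dataclass
class Child(Parent):
    school: str
    ugly = True  # plain class attribute, not a new field


# ugly is absent from the parameters, so school may follow age without a default.
print(signature(Child))
print(Child("jack jnr", 12, school="havard").ugly)  # True
print(Parent("jack snr", 32).ugly)                  # False
```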
The approach below deals with this problem while using pure Python dataclasses and without much boilerplate code.
The ugly_init: dataclasses.InitVar[bool] pseudo-field exists only to help with initialization and is discarded once the instance is created. ugly: bool = field(init=False), on the other hand, is an instance member that is not initialized by the __init__ method but can instead be initialized in the __post_init__ method.
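A sketch of the InitVar plus field(init=False) mechanics on their own, using a hypothetical Temperature class: the init-only value is passed to __post_init__, is not listed by dataclasses.fields(), and is not stored on the instance.

```python
from dataclasses import InitVar, dataclass, field, fields


@dataclass
class Temperature:
    celsius: float = field(init=False)  # set in __post_init__, not a parameter
    fahrenheit: InitVar[float]          # init-only: passed to __post_init__, never stored

    def __post_init__(self, fahrenheit: float):
        self.celsius = (fahrenheit - 32) * 5 / 9


t = Temperature(212.0)
print(t)                          # Temperature(celsius=100.0)
print([f.name for f in fields(t)])  # ['celsius']
print(hasattr(t, "fahrenheit"))   # False
```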
```python
import dataclasses
from dataclasses import dataclass, field


@dataclass
class Parent:
    name: str
    age: int
    ugly: bool = field(init=False)       # set in __post_init__, not an __init__ parameter
    ugly_init: dataclasses.InitVar[bool]  # init-only value, passed to __post_init__

    def __post_init__(self, ugly_init: bool):
        self.ugly = ugly_init

    def print_name(self):
        print(self.name)

    def print_age(self):
        print(self.age)

    def print_id(self):
        print(f'The Name is {self.name} and {self.name} is {self.age} year old')


@dataclass
class Child(Parent):
    school: str


jack = Parent('jack snr', 32, ugly_init=True)
jack_son = Child('jack jnr', 12, school='havard', ugly_init=True)

jack.print_id()
jack_son.print_id()
```
If you want to use a pattern where ugly_init is optional, you can define a class method on Parent that includes ugly_init as an optional parameter:
```python
from dataclasses import dataclass, field, InitVar


@dataclass
class Parent:
    name: str
    age: int
    ugly: bool = field(init=False)
    ugly_init: InitVar[bool]

    def __post_init__(self, ugly_init: bool):
        self.ugly = ugly_init

    @classmethod
    def create(cls, ugly_init=True, **kwargs):
        return cls(ugly_init=ugly_init, **kwargs)

    def print_name(self):
        print(self.name)

    def print_age(self):
        print(self.age)

    def print_id(self):
        print(f'The Name is {self.name} and {self.name} is {self.age} year old')


@dataclass
class Child(Parent):
    school: str


jack = Parent.create(name='jack snr', age=32, ugly_init=False)
jack_son = Child.create(name='jack jnr', age=12, school='harvard')

jack.print_id()
jack_son.print_id()
```
Now you can use the create class method as a factory method for creating Parent and Child instances with a default value for ugly_init. Note that you must use named (keyword) arguments for this approach to work.
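A small sketch of why keyword arguments are required: create() accepts only ugly_init positionally, so positional values for name and age end up in the wrong place. Parent here is the class above, trimmed to the parts that matter:

```python
from dataclasses import InitVar, dataclass, field


@dataclass
class Parent:
    name: str
    age: int
    ugly: bool = field(init=False)
    ugly_init: InitVar[bool]

    def __post_init__(self, ugly_init: bool):
        self.ugly = ugly_init

    @classmethod
    def create(cls, ugly_init=True, **kwargs):
        return cls(ugly_init=ugly_init, **kwargs)


# Keyword arguments: ugly_init falls back to True.
jack = Parent.create(name="jack snr", age=32)
print(jack.ugly)  # True

# Positional arguments: "jack snr" binds to ugly_init and 32 has nowhere to go.
positional_error = None
try:
    Parent.create("jack snr", 32)
except TypeError as exc:
    positional_error = exc
print(positional_error)
```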
You're seeing this error because a field without a default value is being added after a field with a default value. The insertion order of inherited fields into the dataclass is the reverse of the Method Resolution Order, which means that the Parent fields come first, even if they are overwritten later by their children.
An example from PEP 557 (Data Classes):
```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Base:
    x: Any = 15.0
    y: int = 0


@dataclass
class C(Base):
    z: int = 10
    x: int = 15
```
The final list of fields is, in order, x, y, z. The final type of x is int, as specified in class C.
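A short sketch that checks this claim by inspecting C with dataclasses.fields() (the classes are the PEP example, repeated so the block runs on its own):

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class Base:
    x: Any = 15.0
    y: int = 0


@dataclass
class C(Base):
    z: int = 10
    x: int = 15  # overrides Base.x but keeps its original (first) position


print([(f.name, f.type) for f in fields(C)])
print(C())  # C(x=15, y=0, z=10)
```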
Unfortunately, I don't think there's any way around this. My understanding is that if the parent class has a default argument, then no child class can have non-default arguments.
Based on Martijn Pieters' solution, I did the following:
1) Create a mixin implementing __post_init__:
```python
from dataclasses import dataclass

no_default = object()


@dataclass
class NoDefaultAttributesPostInitMixin:
    def __post_init__(self):
        for key, value in self.__dict__.items():
            if value is no_default:
                raise TypeError(
                    f"__init__ missing 1 required argument: '{key}'"
                )
```
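2) Then, in the classes with the inheritance problem:
```python
from dataclasses import dataclass

from src.utils import no_default, NoDefaultAttributesPostInitMixin


# DataclassWithDefaults is your existing base dataclass whose fields have defaults.
@dataclass
class MyDataclass(DataclassWithDefaults, NoDefaultAttributesPostInitMixin):
    attr1: str = no_default
```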
EDIT:
After a while I also found problems with this solution under mypy; the following code fixes the issue.
```python
from dataclasses import dataclass
from typing import TypeVar, Generic, Union

T = TypeVar("T")


class NoDefault(Generic[T]):
    ...


NoDefaultVar = Union[NoDefault[T], T]
no_default: NoDefault = NoDefault()


@dataclass
class NoDefaultAttributesPostInitMixin:
    def __post_init__(self):
        for key, value in self.__dict__.items():
            # Check for the sentinel instance, not the NoDefault class itself.
            if isinstance(value, NoDefault):
                raise TypeError(f"__init__ missing 1 required argument: '{key}'")


@dataclass
class Parent(NoDefaultAttributesPostInitMixin):
    a: str = ""


@dataclass
class Child(Parent):
    b: NoDefaultVar[str] = no_default
```