In Python 3.7 there are these new "dataclass" containers that are basically like mutable namedtuples. Suppose I make a dataclass that is meant to represent a person. I can add input validation via the __post_init__() method like this:
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: float

    def __post_init__(self):
        if type(self.name) is not str:
            raise TypeError("Field 'name' must be of type 'str'.")
        self.age = float(self.age)
        if self.age < 0:
            raise ValueError("Field 'age' cannot be negative.")
This will let good inputs through:
someone = Person(name="John Doe", age=30)
print(someone)
Person(name='John Doe', age=30.0)
While all of these bad inputs will throw an error:
someone = Person(name=["John Doe"], age=30)
someone = Person(name="John Doe", age="thirty")
someone = Person(name="John Doe", age=-30)
However, since dataclasses are mutable, I can do this:
someone = Person(name="John Doe", age=30)
someone.age = -30
print(someone)
Person(name='John Doe', age=-30)
Thus bypassing the input validation.
So, what is the best way to make sure that the fields of a dataclass aren't mutated to something bad, after initialization?
Dataclasses are a mechanism that provides a default initializer accepting the attributes as parameters and a nice representation, plus some niceties like the __post_init__ hook.
Fortunately, they do not interfere with any other mechanism for attribute access in Python, and you can still create your dataclass attributes as property descriptors, or as instances of a custom descriptor class if you want. That way, any attribute access will go through your getter and setter functions automatically.
The only drawback of the built-in property is that it is simplest to use it the "old way", calling it as a plain function rather than with the decorator syntax, because that lets you annotate the attribute on the same line.
So, "descriptors" are special objects assigned to class attributes in Python in such a way that any access to that attribute will call the descriptor's __get__, __set__ or __delete__ methods. The property built-in is a convenience for building a descriptor from 1 to 3 functions that will be called from those methods.
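To make the descriptor protocol concrete, here is a minimal sketch. The names Logged, Demo, WithProperty and calls are made up for illustration and are not part of any library. It records which protocol method handles each kind of access, then shows property building a comparable descriptor from three plain functions.

```python
calls = []  # records which descriptor method handled each access


class Logged:
    """Hypothetical descriptor that logs which protocol method was called."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:  # accessed on the class itself
            return self
        calls.append("__get__")
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        calls.append("__set__")
        instance.__dict__[self.name] = value

    def __delete__(self, instance):
        calls.append("__delete__")
        del instance.__dict__[self.name]


class Demo:
    x = Logged()


d = Demo()
d.x = 1   # routed to Logged.__set__
d.x       # routed to Logged.__get__
del d.x   # routed to Logged.__delete__
print(calls)  # ['__set__', '__get__', '__delete__']


class WithProperty:
    # property wraps up to three functions: getter, setter, deleter
    def _get(self):
        return self.__dict__["x"]

    def _set(self, value):
        self.__dict__["x"] = value

    def _del(self):
        del self.__dict__["x"]

    x = property(_get, _set, _del)


p = WithProperty()
p.x = 2
print(p.x)  # 2
del p.x
```

Note that the deletion hook is named __delete__; __del__ is the unrelated finalizer method.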
So, with no custom descriptor-thing, you could do:
from dataclasses import dataclass

@dataclass
class MyClass:
    def setname(self, value):
        # Runs on every assignment, including the one made by __init__
        if not isinstance(value, str):
            raise TypeError(...)
        self.__dict__["name"] = value

    def getname(self):
        return self.__dict__.get("name")

    name: str = property(getname, setname)
    # optionally, you can delete the getter and setter from the class body:
    del setname, getname
With this approach you have to write each attribute's access as two methods/functions, but you no longer need to write your __post_init__: each attribute will validate itself.
Also note that this example took the less usual approach of storing the attributes under their normal names in the instance's __dict__. In the examples around the web, the usual practice is to use normal attribute access but prepend the name with a _. That leaves these attributes polluting a dir of your final instance, and the private attributes will be unguarded.
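As a rough check of the property pattern against the original question, the sketch below applies it to the age field. The float conversion and the negativity test are taken from the question; the helper names _get_age and _set_age are made up here. Validation runs both on construction and on later assignment, and the value is stored under its public name, so no extra _age attribute appears.

```python
from dataclasses import dataclass


@dataclass
class Person:
    def _get_age(self):
        return self.__dict__.get("age")

    def _set_age(self, value):
        value = float(value)  # same conversion as the question's __post_init__
        if value < 0:
            raise ValueError("Field 'age' cannot be negative.")
        # Write straight into the instance dict under the public name
        self.__dict__["age"] = value

    age: float = property(_get_age, _set_age)
    del _get_age, _set_age  # keep the helpers out of the class namespace


someone = Person(age=30)
try:
    someone.age = -30  # goes through the setter, so it is rejected
except ValueError as exc:
    print(exc)  # Field 'age' cannot be negative.

print(someone.age)    # 30.0
print(vars(someone))  # {'age': 30.0}
```

One thing to be aware of with this sketch: the property object itself becomes the field's default, so calling Person() with no arguments passes that object to the setter, which fails in float().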
Another approach is to write your own descriptor class, and let it check the instance and other properties of the attributes you want to guard. This can be as sophisticated as you want, culminating in your own framework. So for a descriptor class that will check the attribute type and accept a list of validators, you will need:
def positive_validator(name, value):
    if value <= 0:
        raise ValueError(f"values for {name!r} have to be positive")

class MyAttr:
    def __init__(self, type, validators=()):
        self.type = type
        self.validators = validators

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner):
        # Accessed on the class rather than an instance: return the descriptor
        if instance is None:
            return self
        return instance.__dict__[self.name]

    def __delete__(self, instance):
        del instance.__dict__[self.name]

    def __set__(self, instance, value):
        if not isinstance(value, self.type):
            raise TypeError(f"{self.name!r} values must be of type {self.type!r}")
        for validator in self.validators:
            validator(self.name, value)
        instance.__dict__[self.name] = value

# And now:
@dataclass
class Person:
    name: str = MyAttr(str)
    age: float = MyAttr((int, float), [positive_validator,])
That is it - creating your own descriptor class requires a bit more knowledge about Python, but the code given above should be good for use, even in production - you are welcome to use it.
Note that you could easily add a lot of other checks and transforms for each of your attributes, and the code in __set_name__ itself could be changed to introspect the __annotations__ in the owner class to automatically take note of the types, so that the type parameter would not be needed for the MyAttr class itself. But as I said before: you can make this as sophisticated as you want.
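One possible shape for that annotation-driven variant is sketched below. TypedAttr is a hypothetical name, and the sketch assumes every annotation is a real class (or tuple of classes) usable with isinstance, so it would not work with string annotations or typing generics.

```python
from dataclasses import dataclass


def positive_validator(name, value):
    if value <= 0:
        raise ValueError(f"values for {name!r} have to be positive")


class TypedAttr:
    """Hypothetical MyAttr variant that reads its type from the annotations."""

    def __init__(self, validators=()):
        self.validators = validators

    def __set_name__(self, owner, name):
        self.name = name
        # Assumption: the annotation is a class that isinstance() accepts
        self.type = owner.__annotations__[name]

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        if not isinstance(value, self.type):
            raise TypeError(f"{self.name!r} values must be of type {self.type!r}")
        for validator in self.validators:
            validator(self.name, value)
        instance.__dict__[self.name] = value


@dataclass
class Person:
    name: str = TypedAttr()
    age: float = TypedAttr([positive_validator])


someone = Person(name="John Doe", age=30.0)

for bad_value in ("thirty", -30.0):
    try:
        someone.age = bad_value  # validated on mutation, not only in __init__
    except (TypeError, ValueError) as exc:
        print(type(exc).__name__)

print(someone.age)  # 30.0
```

Because the check is a strict isinstance against the annotation, an int such as 30 would be rejected for a field annotated as float. Accepting ints too would need either a broader annotation or a conversion step in __set__.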