
Validating input when mutating a dataclass

In Python 3.7 there are these new "dataclass" containers that are basically like mutable namedtuples. Suppose I make a dataclass that is meant to represent a person. I can add input validation via the __post_init__() function like this:

from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: float

    def __post_init__(self):
        if type(self.name) is not str:
            raise TypeError("Field 'name' must be of type 'str'.")
        self.age = float(self.age)
        if self.age < 0:
            raise ValueError("Field 'age' cannot be negative.")

This will let good inputs through:

someone = Person(name="John Doe", age=30)
print(someone)

Person(name='John Doe', age=30.0)

While each of these bad inputs will raise an error:

someone = Person(name=["John Doe"], age=30)
someone = Person(name="John Doe", age="thirty")
someone = Person(name="John Doe", age=-30)

However, since dataclasses are mutable, I can do this:

someone = Person(name="John Doe", age=30)
someone.age = -30
print(someone)

Person(name='John Doe', age=-30)

Thus bypassing the input validation.

So, what is the best way to make sure that the fields of a dataclass aren't mutated to something bad, after initialization?

dain asked Feb 02 '19 00:02


1 Answer

Dataclasses are a mechanism that generates a default __init__ accepting the attributes as parameters and a nice __repr__, plus some niceties like the __post_init__ hook.

Fortunately, they do not interfere with any other mechanism for attribute access in Python - you can still create your dataclass attributes as property descriptors, or as instances of a custom descriptor class if you want. That way, every access to the attribute, including the assignments made in the generated __init__, goes through your getter and setter functions automatically.

The only drawback of using the built-in property is that you have to use it the "old way", calling property() explicitly instead of using the decorator syntax - the explicit assignment is what lets you annotate the attribute, so that the dataclass picks it up as a field.

So, "descriptors" are special objects assigned to class attributes in Python in such a way that any access to that attribute calls the descriptor's __get__, __set__ or __delete__ method. The property built-in is a convenience for building a descriptor from 1 to 3 functions that will be called from those methods.
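As a rough illustration of the protocol described above, here is a minimal sketch of a descriptor that only records which of its methods Python calls. The names Logged, Box and calls are made up for this example and are not part of any library.

```python
calls = []  # hypothetical log of which descriptor method ran


class Logged:
    """Hypothetical descriptor that records each protocol call."""

    def __set_name__(self, owner, name):
        # Called once, when the owning class is created
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class itself
        calls.append(("get", self.name))
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        calls.append(("set", self.name))
        instance.__dict__[self.name] = value

    def __delete__(self, instance):
        calls.append(("delete", self.name))
        del instance.__dict__[self.name]


class Box:
    item = Logged()


box = Box()
box.item = 1   # routed to Logged.__set__
box.item       # routed to Logged.__get__
del box.item   # routed to Logged.__delete__
print(calls)   # [('set', 'item'), ('get', 'item'), ('delete', 'item')]
```

Because the descriptor defines __set__, it takes precedence over the entry with the same name in the instance's __dict__, which is why the value can be stored there without bypassing the descriptor.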

So, with no custom descriptor-thing, you could do:

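from dataclasses import dataclass

@dataclass
class MyClass:
    def getname(self):
        # The value lives in the instance dict under the same name as the property
        return self.__dict__.get("name")

    def setname(self, value):
        # Runs on every assignment, including the one made by the generated __init__
        if not isinstance(value, str):
            raise TypeError("Field 'name' must be of type 'str'.")
        self.__dict__["name"] = value

    name: str = property(getname, setname)
    # optionally, you can delete the getter and setter from the class body:
    del setname, getname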

By using this approach you will have to write each attribute's access as two methods/functions, but will no longer need to write your __post_init__: each attribute will validate itself.
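To check that claim, here is a small usage sketch of the property-based class. It repeats the class so that it runs on its own; the error message text is my own choice. It also shows one caveat I am fairly sure of: the property object itself becomes the field's default, so omitting the argument fails the type check instead of producing a usable default.

```python
from dataclasses import dataclass


@dataclass
class MyClass:
    def getname(self):
        return self.__dict__.get("name")

    def setname(self, value):
        if not isinstance(value, str):
            raise TypeError("name must be a str")
        self.__dict__["name"] = value

    name: str = property(getname, setname)
    del getname, setname


obj = MyClass(name="John Doe")   # the generated __init__ goes through setname
print(obj.name)                  # John Doe

try:
    obj.name = 42                # mutation is validated too
except TypeError as exc:
    print("rejected:", exc)

try:
    MyClass()                    # the "default" is the property object, not a str
except TypeError as exc:
    print("rejected:", exc)
```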

Also note that this example took the less usual approach of storing the attribute values directly in the instance's __dict__. In most examples around the web, the practice is to use normal attribute access, but prepending the name with a _. That leaves those extra attributes polluting a dir() of your final instance, and the private attributes themselves are unguarded.
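For contrast, this is roughly what the underscore convention looks like. It is a sketch on a plain class rather than a dataclass (my simplification, to keep the decorator syntax), and it shows both effects mentioned above: the extra _name entry on the instance, and the fact that writing to it skips the validation.

```python
class Person:
    def __init__(self, name):
        self.name = name              # goes through the setter below

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        if not isinstance(value, str):
            raise TypeError("name must be a str")
        self._name = value            # stored under a second, "private" name


p = Person("John Doe")
print(vars(p))    # {'_name': 'John Doe'}
p._name = 123     # no validation happens here
print(p.name)     # 123
```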

Another approach is to write your own descriptor class, and let it check the type and other properties of the attributes you want to guard. This can be as sophisticated as you want, culminating in your own framework. So for a descriptor class that checks the attribute type and accepts a list of validators, you will need:

def positive_validator(name, value):
    if value <= 0:
        raise ValueError(f"values for {name!r} have to be positive")

class MyAttr:
    def __init__(self, type, validators=()):
        self.type = type
        self.validators = validators

    def __set_name__(self, owner, name):
        # Called when the owner class is created; records the attribute name
        self.name = name

    def __get__(self, instance, owner):
        # Accessed on the class itself: return the descriptor
        if instance is None:
            return self
        return instance.__dict__[self.name]

    def __delete__(self, instance):
        del instance.__dict__[self.name]

    def __set__(self, instance, value):
        if not isinstance(value, self.type):
            raise TypeError(f"{self.name!r} values must be of type {self.type!r}")
        for validator in self.validators:
            validator(self.name, value)
        instance.__dict__[self.name] = value

# And now:

from dataclasses import dataclass

@dataclass
class Person:
    name: str = MyAttr(str)
    age: float = MyAttr((int, float), [positive_validator])

That is it - creating your own descriptor class requires a bit more knowledge about Python, but the code given above should be good for use, even in production - you are welcome to use it.
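Here is a short usage sketch of that class, so you can see the validation firing on mutation as well as on initialization. The descriptor is repeated so the snippet runs on its own (with an "is None" test for class-level access); the errors list and the lambdas are only scaffolding for the demonstration.

```python
from dataclasses import dataclass


def positive_validator(name, value):
    if value <= 0:
        raise ValueError(f"values for {name!r} have to be positive")


class MyAttr:
    def __init__(self, type, validators=()):
        self.type = type
        self.validators = validators

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__[self.name]

    def __delete__(self, instance):
        del instance.__dict__[self.name]

    def __set__(self, instance, value):
        if not isinstance(value, self.type):
            raise TypeError(f"{self.name!r} values must be of type {self.type!r}")
        for validator in self.validators:
            validator(self.name, value)
        instance.__dict__[self.name] = value


@dataclass
class Person:
    name: str = MyAttr(str)
    age: float = MyAttr((int, float), [positive_validator])


someone = Person(name="John Doe", age=30)
someone.age = 31                                  # passes both checks

errors = []
for attempt in (
    lambda: setattr(someone, "age", -30),         # fails positive_validator
    lambda: setattr(someone, "age", "thirty"),    # fails the type check
    lambda: Person(name=["John Doe"], age=30),    # fails the type check in __init__
):
    try:
        attempt()
    except (TypeError, ValueError) as exc:
        errors.append(type(exc).__name__)

print(someone.age)   # 31
print(errors)        # ['ValueError', 'TypeError', 'TypeError']
```

Note that, unlike the __post_init__ in the question, this descriptor only checks the value and does not convert it, so an int age stays an int. As with the property version, both arguments have to be passed, since the descriptor itself ends up as the field default.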

Note that you could easily add a lot of other checks and transforms for each of your attributes - and the code in __set_name__ itself could be changed to introspect the __annotations__ in the owner class to automatically take note of the types - so that the type parameter would not be needed for the MyAttr class itself. But as I said before: you can make this as sophisticated as you want.
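A minimal sketch of that last idea, assuming the annotations are plain classes such as str or int (not strings, and not typing constructs like Optional[str]). TypedAttr is a hypothetical name for this variant of MyAttr.

```python
from dataclasses import dataclass


def positive_validator(name, value):
    if value <= 0:
        raise ValueError(f"values for {name!r} have to be positive")


class TypedAttr:
    """Hypothetical variant of MyAttr that reads its type from the annotation."""

    def __init__(self, validators=()):
        self.validators = validators

    def __set_name__(self, owner, name):
        self.name = name
        # Assumption: the annotation is a real class usable with isinstance()
        self.type = owner.__annotations__[name]

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        if not isinstance(value, self.type):
            raise TypeError(f"{self.name!r} values must be of type {self.type!r}")
        for validator in self.validators:
            validator(self.name, value)
        instance.__dict__[self.name] = value


@dataclass
class Person:
    name: str = TypedAttr()
    age: int = TypedAttr([positive_validator])


person = Person(name="John Doe", age=30)
print(Person.name.type, Person.age.type)   # <class 'str'> <class 'int'>

try:
    person.age = "thirty"                  # rejected: the annotation says int
except TypeError as exc:
    print("rejected:", exc)
```

One consequence of taking the annotation literally: a field annotated as float would reject the int 30, since isinstance(30, float) is False, so the explicit type parameter is still the more flexible option when you want to accept several types.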

jsbueno answered Sep 20 '22 14:09