Python 3.7 was released a while ago, and I wanted to test some of the fancy new dataclass + typing features. Getting hints to work right is easy enough, with both native types and those from the typing module:
>>> import dataclasses
>>> import typing as ty
>>>
>>> @dataclasses.dataclass
... class Structure:
...     a_str: str
...     a_str_list: ty.List[str]
...
>>> my_struct = Structure(a_str='test', a_str_list=['t', 'e', 's', 't'])
>>> my_struct.a_str_list[0].  # IDE suggests all the string methods :)
But another thing I wanted to try was enforcing the type hints at runtime, i.e. it should not be possible for a dataclass with incorrect types to exist. This can be implemented nicely with __post_init__:
>>> @dataclasses.dataclass
... class Structure:
...     a_str: str
...     a_str_list: ty.List[str]
...
...     def validate(self):
...         ret = True
...         for field_name, field_def in self.__dataclass_fields__.items():
...             actual_type = type(getattr(self, field_name))
...             if actual_type != field_def.type:
...                 print(f"\t{field_name}: '{actual_type}' instead of '{field_def.type}'")
...                 ret = False
...         return ret
...
...     def __post_init__(self):
...         if not self.validate():
...             raise ValueError('Wrong types')
This kind of validate function works for native types and custom classes, but not for those specified by the typing module:
>>> my_struct = Structure(a_str='test', a_str_list=['t', 'e', 's', 't'])
	a_str_list: '<class 'list'>' instead of 'typing.List[str]'
Traceback (most recent call last):
  ...
ValueError: Wrong types
Is there a better approach to validate an untyped list against a typing-typed one? Preferably one that doesn't involve checking the types of all elements in every list, dict, tuple, or set that is a dataclass attribute.
Revisiting this question a couple of years later, I've now moved to using pydantic in cases where I want to validate classes that I'd normally just define as dataclasses. I'll leave the accept mark on the current answer though, since it correctly answers the original question and has outstanding educational value.
Instead of checking for type equality, you should use isinstance. But you cannot use a parametrized generic type (typing.List[int]) to do so; you must use the "generic" version (typing.List). So you will be able to check for the container type but not the contained types. Parametrized generic types define an __origin__ attribute that you can use for that.
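For example, a quick REPL sketch (assuming Python 3.7) of why the parametrized form cannot be used directly:

>>> import typing
>>> isinstance(['a', 'b'], typing.List[str])
Traceback (most recent call last):
  ...
TypeError: Subscripted generics cannot be used with class and instance checks
>>> isinstance(['a', 'b'], typing.List[str].__origin__)  # equivalent to isinstance(..., list)
True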
Contrary to Python 3.6, in Python 3.7 most type hints have a useful __origin__ attribute. Compare:
# Python 3.6
>>> import typing
>>> typing.List.__origin__
>>> typing.List[int].__origin__
typing.List
and
# Python 3.7
>>> import typing
>>> typing.List.__origin__
<class 'list'>
>>> typing.List[int].__origin__
<class 'list'>
Python 3.8 introduces even better support with the typing.get_origin() introspection function:
# Python 3.8
>>> import typing
>>> typing.get_origin(typing.List)
<class 'list'>
>>> typing.get_origin(typing.List[int])
<class 'list'>
Notable exceptions are typing.Any, typing.Union and typing.ClassVar… Well, anything that is a typing._SpecialForm does not define __origin__. Fortunately:
>>> isinstance(typing.Union, typing._SpecialForm)
True
>>> isinstance(typing.Union[int, str], typing._SpecialForm)
False
>>> typing.get_origin(typing.Union[int, str])
typing.Union
But parametrized types define an __args__ attribute that stores their parameters as a tuple; Python 3.8 introduces the typing.get_args() function to retrieve them:
# Python 3.7
>>> typing.Union[int, str].__args__
(<class 'int'>, <class 'str'>)

# Python 3.8
>>> typing.get_args(typing.Union[int, str])
(<class 'int'>, <class 'str'>)
So we can improve type checking a bit:
for field_name, field_def in self.__dataclass_fields__.items():
    if isinstance(field_def.type, typing._SpecialForm):
        # No check for typing.Any, typing.Union, typing.ClassVar (without parameters)
        continue
    try:
        actual_type = field_def.type.__origin__
    except AttributeError:
        # In case of non-typing types (such as <class 'int'>, for instance)
        actual_type = field_def.type
    # In Python 3.8 one would replace the try/except with
    # actual_type = typing.get_origin(field_def.type) or field_def.type
    if isinstance(actual_type, typing._SpecialForm):
        # case of typing.Union[…] or typing.ClassVar[…]
        actual_type = field_def.type.__args__

    actual_value = getattr(self, field_name)
    if not isinstance(actual_value, actual_type):
        print(f"\t{field_name}: '{type(actual_value)}' instead of '{field_def.type}'")
        ret = False
This is not perfect, as it won't account for typing.ClassVar[typing.Union[int, str]] or typing.Optional[typing.List[int]] for instance, but it should get things started.
Next is the way to apply this check.
Instead of using __post_init__, I would go the decorator route: this could be used on anything with type hints, not only dataclasses:
import inspect
import typing
from contextlib import suppress
from functools import wraps


def enforce_types(callable):
    spec = inspect.getfullargspec(callable)

    def check_types(*args, **kwargs):
        parameters = dict(zip(spec.args, args))
        parameters.update(kwargs)
        for name, value in parameters.items():
            with suppress(KeyError):  # Assume un-annotated parameters can be any type
                type_hint = spec.annotations[name]
                if isinstance(type_hint, typing._SpecialForm):
                    # No check for typing.Any, typing.Union, typing.ClassVar (without parameters)
                    continue
                try:
                    actual_type = type_hint.__origin__
                except AttributeError:
                    # In case of non-typing types (such as <class 'int'>, for instance)
                    actual_type = type_hint
                # In Python 3.8 one would replace the try/except with
                # actual_type = typing.get_origin(type_hint) or type_hint
                if isinstance(actual_type, typing._SpecialForm):
                    # case of typing.Union[…] or typing.ClassVar[…]
                    actual_type = type_hint.__args__

                if not isinstance(value, actual_type):
                    raise TypeError('Unexpected type for \'{}\' (expected {} but found {})'.format(name, type_hint, type(value)))

    def decorate(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            check_types(*args, **kwargs)
            return func(*args, **kwargs)
        return wrapper

    if inspect.isclass(callable):
        callable.__init__ = decorate(callable.__init__)
        return callable

    return decorate(callable)
Usage being:
@enforce_types
@dataclasses.dataclass
class Point:
    x: float
    y: float


@enforce_types
def foo(bar: typing.Union[int, str]):
    pass
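As a quick smoke test, a sketch of the behaviour one would expect from the checks above:

Point(x=1.0, y=2.0)    # OK
foo(3)                 # OK
foo('baz')             # OK
Point(x=1.0, y='2')    # raises TypeError: Unexpected type for 'y' …
foo([])                # would also raise TypeError: Unexpected type for 'bar' …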
Apart from only validating some type hints, as noted in the previous section, this approach still has some drawbacks:
- type hints using strings (class Foo: def __init__(self: 'Foo'): pass) are not taken into account by inspect.getfullargspec: you may want to use typing.get_type_hints and inspect.signature instead;
- a default value which is not of the appropriate type is not validated:
@enforce_types
def foo(bar: int = None):
    pass

foo()
does not raise any TypeError. You may want to use inspect.Signature.bind in conjunction with inspect.BoundArguments.apply_defaults if you want to account for that (thus forcing you to define def foo(bar: typing.Optional[int] = None)); see the sketch after this list;
- a variable number of arguments can't be validated, as you would have to define something like def foo(*args: typing.Sequence, **kwargs: typing.Mapping);
- and, as said at the beginning, we can only validate containers and not contained objects.
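To illustrate the default-value point, a minimal sketch of the bind/apply_defaults idea mentioned above (assuming Python 3.7):

import inspect


def foo(bar: int = None):
    pass


signature = inspect.signature(foo)
bound = signature.bind()   # the call foo() binds no arguments...
bound.apply_defaults()     # ...until the defaults are applied
# 'bar' now appears in bound.arguments with its default value None,
# which would fail an isinstance(None, int) check.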
After this answer got some popularity and a library heavily inspired by it was released, the need to lift the shortcomings mentioned above became real. So I played a bit more with the typing module and will propose a few findings and a new approach here.
For starters, typing does a great job of detecting when an argument is optional:
>>> def foo(a: int, b: str, c: typing.List[str] = None):
...     pass
...
>>> typing.get_type_hints(foo)
{'a': <class 'int'>, 'b': <class 'str'>, 'c': typing.Union[typing.List[str], NoneType]}
This is pretty neat and definitely an improvement over inspect.getfullargspec, so it is better to use that instead, as it can also properly handle strings used as type hints. But typing.get_type_hints will bail out for other kinds of default values:
>>> def foo(a: int, b: str, c: typing.List[str] = 3):
...     pass
...
>>> typing.get_type_hints(foo)
{'a': <class 'int'>, 'b': <class 'str'>, 'c': typing.List[str]}
So you may still need extra strict checking, even though such cases feel very fishy.
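As an aside, here is the string-annotation handling mentioned above, as a quick REPL sketch (assuming Python 3.7+):

>>> import inspect
>>> import typing
>>> def bar(x: 'int'):
...     pass
...
>>> inspect.getfullargspec(bar).annotations
{'x': 'int'}
>>> typing.get_type_hints(bar)
{'x': <class 'int'>}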
Next is the case of typing hints used as arguments of a typing._SpecialForm, such as typing.Optional[typing.List[str]] or typing.Final[typing.Union[typing.Sequence, typing.Mapping]]. Since the __args__ of these typing._SpecialForms is always a tuple, it is possible to recursively find the __origin__ of the hints contained in that tuple. Combined with the above checks, we then need to filter out any typing._SpecialForm left.
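For instance, a quick REPL illustration of that recursion (assuming Python 3.8 for typing.get_origin):

>>> import typing
>>> hint = typing.Optional[typing.List[str]]
>>> hint.__args__
(typing.List[str], <class 'NoneType'>)
>>> typing.get_origin(hint.__args__[0])
<class 'list'>
>>> # recursing through __args__ eventually yields concrete classes: list and NoneType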
Proposed improvements:
import inspect
import typing
from functools import wraps


def _find_type_origin(type_hint):
    if isinstance(type_hint, typing._SpecialForm):
        # case of typing.Any, typing.ClassVar, typing.Final, typing.Literal,
        # typing.NoReturn, typing.Optional, or typing.Union without parameters
        return

    actual_type = typing.get_origin(type_hint) or type_hint  # requires Python 3.8
    if isinstance(actual_type, typing._SpecialForm):
        # case of typing.Union[…] or typing.ClassVar[…] or …
        for origins in map(_find_type_origin, typing.get_args(type_hint)):
            yield from origins
    else:
        yield actual_type


def _check_types(parameters, hints):
    for name, value in parameters.items():
        type_hint = hints.get(name, typing.Any)
        actual_types = tuple(_find_type_origin(type_hint))
        if actual_types and not isinstance(value, actual_types):
            raise TypeError(
                f"Expected type '{type_hint}' for argument '{name}'"
                f" but received type '{type(value)}' instead"
            )


def enforce_types(callable):
    def decorate(func):
        hints = typing.get_type_hints(func)
        signature = inspect.signature(func)

        @wraps(func)
        def wrapper(*args, **kwargs):
            parameters = dict(zip(signature.parameters, args))
            parameters.update(kwargs)
            _check_types(parameters, hints)

            return func(*args, **kwargs)
        return wrapper

    if inspect.isclass(callable):
        callable.__init__ = decorate(callable.__init__)
        return callable

    return decorate(callable)


def enforce_strict_types(callable):
    def decorate(func):
        hints = typing.get_type_hints(func)
        signature = inspect.signature(func)

        @wraps(func)
        def wrapper(*args, **kwargs):
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            parameters = dict(zip(signature.parameters, bound.args))
            parameters.update(bound.kwargs)
            _check_types(parameters, hints)

            return func(*args, **kwargs)
        return wrapper

    if inspect.isclass(callable):
        callable.__init__ = decorate(callable.__init__)
        return callable

    return decorate(callable)
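A minimal usage sketch of enforce_strict_types (assuming the definitions above):

import dataclasses


@enforce_strict_types
@dataclasses.dataclass
class Point:
    x: float
    y: float = 0.0


Point(x=1.0)            # OK, the default y=0.0 is bound and checked too
Point(x=1.0, y='oops')  # raises TypeError: Expected type '<class 'float'>' for argument 'y' …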
Thanks to @Aran-Fey, who helped me improve this answer.
Just found this question.
pydantic can do full type validation for dataclasses out of the box (admission: I built pydantic). Just use pydantic's version of the decorator; the resulting dataclass is completely vanilla.
from datetime import datetime
from pydantic.dataclasses import dataclass

@dataclass
class User:
    id: int
    name: str = 'John Doe'
    signup_ts: datetime = None

print(User(id=42, signup_ts='2032-06-21T12:00'))
"""
User(id=42, name='John Doe', signup_ts=datetime.datetime(2032, 6, 21, 12, 0))
"""

User(id='not int', signup_ts='2032-06-21T12:00')
The last line will give:
...
pydantic.error_wrappers.ValidationError: 1 validation error
id
  value is not a valid integer (type=type_error.integer)