Python 3.7 was released a while ago, and I wanted to test some of the fancy new dataclass + typing features. Getting hints to work right is easy enough, with both native types and those from the typing module:
>>> import dataclasses
>>> import typing as ty
>>>
>>> @dataclasses.dataclass
... class Structure:
...     a_str: str
...     a_str_list: ty.List[str]
...
>>> my_struct = Structure(a_str='test', a_str_list=['t', 'e', 's', 't'])
>>> my_struct.a_str_list[0].  # IDE suggests all the string methods :)
But another thing I wanted to try was enforcing the type hints at runtime, i.e. it should not be possible for a dataclass with incorrect types to exist. This can be implemented nicely with __post_init__:
>>> @dataclasses.dataclass
... class Structure:
...     a_str: str
...     a_str_list: ty.List[str]
...
...     def validate(self):
...         ret = True
...         for field_name, field_def in self.__dataclass_fields__.items():
...             actual_type = type(getattr(self, field_name))
...             if actual_type != field_def.type:
...                 print(f"\t{field_name}: '{actual_type}' instead of '{field_def.type}'")
...                 ret = False
...         return ret
...
...     def __post_init__(self):
...         if not self.validate():
...             raise ValueError('Wrong types')
This kind of validate function works for native types and custom classes, but not for those specified by the typing module:
>>> my_struct = Structure(a_str='test', a_str_list=['t', 'e', 's', 't'])
	a_str_list: '<class 'list'>' instead of 'typing.List[str]'
Traceback (most recent call last):
  ...
ValueError: Wrong types
Is there a better approach to validate an untyped list against a typing-typed one? Preferably one that doesn't involve checking the types of all elements in every list, dict, tuple, or set that is a dataclass attribute.
Revisiting this question a couple of years later, I've now moved to using pydantic in cases where I want to validate classes that I'd normally just define as dataclasses. I'll leave the accept mark on the current answer though, since it correctly answers the original question and has outstanding educational value.
Instead of checking for type equality, you should use isinstance. But you cannot use a parametrized generic type (typing.List[int]) to do so; you must use the "generic" version (typing.List). So you will be able to check for the container type but not the contained types. Parametrized generic types define an __origin__ attribute that you can use for that.
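For example, a quick REPL sketch (assuming Python 3.7) of why the parametrized form cannot be used directly:

>>> import typing
>>> isinstance(['a', 'b'], typing.List[str])
Traceback (most recent call last):
  ...
TypeError: Subscripted generics cannot be used with class and instance checks
>>> isinstance(['a', 'b'], typing.List[str].__origin__)  # equivalent to isinstance(..., list)
True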
Contrary to Python 3.6, in Python 3.7 most type hints have a useful __origin__ attribute. Compare:
# Python 3.6
>>> import typing
>>> typing.List.__origin__
>>> typing.List[int].__origin__
typing.List
and
# Python 3.7
>>> import typing
>>> typing.List.__origin__
<class 'list'>
>>> typing.List[int].__origin__
<class 'list'>
Python 3.8 introduces even better support with the typing.get_origin() introspection function:
# Python 3.8
>>> import typing
>>> typing.get_origin(typing.List)
<class 'list'>
>>> typing.get_origin(typing.List[int])
<class 'list'>
Notable exceptions are typing.Any, typing.Union and typing.ClassVar… Well, anything that is a typing._SpecialForm does not define __origin__. Fortunately:
>>> isinstance(typing.Union, typing._SpecialForm)
True
>>> isinstance(typing.Union[int, str], typing._SpecialForm)
False
>>> typing.get_origin(typing.Union[int, str])
typing.Union
But parametrized types define an __args__ attribute that stores their parameters as a tuple; Python 3.8 introduces the typing.get_args() function to retrieve them:
# Python 3.7
>>> typing.Union[int, str].__args__
(<class 'int'>, <class 'str'>)

# Python 3.8
>>> typing.get_args(typing.Union[int, str])
(<class 'int'>, <class 'str'>)
So we can improve type checking a bit:
for field_name, field_def in self.__dataclass_fields__.items():
    if isinstance(field_def.type, typing._SpecialForm):
        # No check for typing.Any, typing.Union, typing.ClassVar (without parameters)
        continue
    try:
        actual_type = field_def.type.__origin__
    except AttributeError:
        # In case of non-typing types (such as <class 'int'>, for instance)
        actual_type = field_def.type
    # In Python 3.8 one would replace the try/except with
    # actual_type = typing.get_origin(field_def.type) or field_def.type
    if isinstance(actual_type, typing._SpecialForm):
        # case of typing.Union[…] or typing.ClassVar[…]
        actual_type = field_def.type.__args__

    actual_value = getattr(self, field_name)
    if not isinstance(actual_value, actual_type):
        print(f"\t{field_name}: '{type(actual_value)}' instead of '{field_def.type}'")
        ret = False
This is not perfect, as it won't account for typing.ClassVar[typing.Union[int, str]] or typing.Optional[typing.List[int]] for instance, but it should get things started.
Next is the way to apply this check.
Instead of using __post_init__, I would go the decorator route: this could be used on anything with type hints, not only dataclasses:
import inspect
import typing
from contextlib import suppress
from functools import wraps


def enforce_types(callable):
    spec = inspect.getfullargspec(callable)

    def check_types(*args, **kwargs):
        parameters = dict(zip(spec.args, args))
        parameters.update(kwargs)
        for name, value in parameters.items():
            with suppress(KeyError):  # Assume un-annotated parameters can be any type
                type_hint = spec.annotations[name]
                if isinstance(type_hint, typing._SpecialForm):
                    # No check for typing.Any, typing.Union, typing.ClassVar (without parameters)
                    continue
                try:
                    actual_type = type_hint.__origin__
                except AttributeError:
                    # In case of non-typing types (such as <class 'int'>, for instance)
                    actual_type = type_hint
                # In Python 3.8 one would replace the try/except with
                # actual_type = typing.get_origin(type_hint) or type_hint
                if isinstance(actual_type, typing._SpecialForm):
                    # case of typing.Union[…] or typing.ClassVar[…]
                    actual_type = type_hint.__args__

                if not isinstance(value, actual_type):
                    raise TypeError('Unexpected type for \'{}\' (expected {} but found {})'.format(name, type_hint, type(value)))

    def decorate(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            check_types(*args, **kwargs)
            return func(*args, **kwargs)
        return wrapper

    if inspect.isclass(callable):
        callable.__init__ = decorate(callable.__init__)
        return callable

    return decorate(callable)
Usage being:
@enforce_types
@dataclasses.dataclass
class Point:
    x: float
    y: float


@enforce_types
def foo(bar: typing.Union[int, str]):
    pass
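As a quick smoke test, a sketch of the behaviour one would expect from the checks above:

Point(x=1.0, y=2.0)    # OK
foo(3)                 # OK
foo('baz')             # OK
Point(x=1.0, y='2')    # raises TypeError: Unexpected type for 'y' …
foo([])                # would also raise TypeError: Unexpected type for 'bar' …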
Apart from only validating some type hints, as noted in the previous section, this approach still has some drawbacks:
- type hints using strings (class Foo: def __init__(self: 'Foo'): pass) are not taken into account by inspect.getfullargspec: you may want to use typing.get_type_hints and inspect.signature instead;
- a default value which is not of the appropriate type is not validated:
@enforce_types
def foo(bar: int = None):
    pass

foo()
does not raise any TypeError. You may want to use inspect.Signature.bind in conjunction with inspect.BoundArguments.apply_defaults if you want to account for that (thus forcing you to define def foo(bar: typing.Optional[int] = None)); see the sketch after this list;
- a variable number of arguments can't be validated, as you would have to define something like def foo(*args: typing.Sequence, **kwargs: typing.Mapping);
- and, as said at the beginning, we can only validate containers and not contained objects.
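To illustrate the default-value point, a minimal sketch of the bind/apply_defaults idea mentioned above (assuming Python 3.7):

import inspect


def foo(bar: int = None):
    pass


signature = inspect.signature(foo)
bound = signature.bind()   # the call foo() binds no arguments...
bound.apply_defaults()     # ...until the defaults are applied
# 'bar' now appears in bound.arguments with its default value None,
# which would fail an isinstance(None, int) check.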
After this answer got some popularity and a library heavily inspired by it was released, the need to lift the shortcomings mentioned above became real. So I played a bit more with the typing module and will propose a few findings and a new approach here.
For starters, typing does a great job of detecting when an argument is optional:
>>> def foo(a: int, b: str, c: typing.List[str] = None):
...     pass
...
>>> typing.get_type_hints(foo)
{'a': <class 'int'>, 'b': <class 'str'>, 'c': typing.Union[typing.List[str], NoneType]}
This is pretty neat and definitely an improvement over inspect.getfullargspec, so it is better to use that instead, as it can also properly handle strings used as type hints. But typing.get_type_hints will bail out for other kinds of default values:
>>> def foo(a: int, b: str, c: typing.List[str] = 3):
...     pass
...
>>> typing.get_type_hints(foo)
{'a': <class 'int'>, 'b': <class 'str'>, 'c': typing.List[str]}
So you may still need extra strict checking, even though such cases feel very fishy.
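As an aside, here is the string-annotation handling mentioned above, as a quick REPL sketch (assuming Python 3.7+):

>>> import inspect
>>> import typing
>>> def bar(x: 'int'):
...     pass
...
>>> inspect.getfullargspec(bar).annotations
{'x': 'int'}
>>> typing.get_type_hints(bar)
{'x': <class 'int'>}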
Next is the case of typing hints used as arguments of a typing._SpecialForm, such as typing.Optional[typing.List[str]] or typing.Final[typing.Union[typing.Sequence, typing.Mapping]]. Since the __args__ of these typing._SpecialForms is always a tuple, it is possible to recursively find the __origin__ of the hints contained in that tuple. Combined with the above checks, we then need to filter out any typing._SpecialForm left.
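For instance, a quick REPL illustration of that recursion (assuming Python 3.8 for typing.get_origin):

>>> import typing
>>> hint = typing.Optional[typing.List[str]]
>>> hint.__args__
(typing.List[str], <class 'NoneType'>)
>>> typing.get_origin(hint.__args__[0])
<class 'list'>
>>> # recursing through __args__ eventually yields concrete classes: list and NoneType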
Proposed improvements:
import inspect
import typing
from functools import wraps


def _find_type_origin(type_hint):
    if isinstance(type_hint, typing._SpecialForm):
        # case of typing.Any, typing.ClassVar, typing.Final, typing.Literal,
        # typing.NoReturn, typing.Optional, or typing.Union without parameters
        return

    actual_type = typing.get_origin(type_hint) or type_hint  # requires Python 3.8
    if isinstance(actual_type, typing._SpecialForm):
        # case of typing.Union[…] or typing.ClassVar[…] or …
        for origins in map(_find_type_origin, typing.get_args(type_hint)):
            yield from origins
    else:
        yield actual_type


def _check_types(parameters, hints):
    for name, value in parameters.items():
        type_hint = hints.get(name, typing.Any)
        actual_types = tuple(_find_type_origin(type_hint))
        if actual_types and not isinstance(value, actual_types):
            raise TypeError(
                f"Expected type '{type_hint}' for argument '{name}'"
                f" but received type '{type(value)}' instead"
            )


def enforce_types(callable):
    def decorate(func):
        hints = typing.get_type_hints(func)
        signature = inspect.signature(func)

        @wraps(func)
        def wrapper(*args, **kwargs):
            parameters = dict(zip(signature.parameters, args))
            parameters.update(kwargs)
            _check_types(parameters, hints)

            return func(*args, **kwargs)
        return wrapper

    if inspect.isclass(callable):
        callable.__init__ = decorate(callable.__init__)
        return callable

    return decorate(callable)


def enforce_strict_types(callable):
    def decorate(func):
        hints = typing.get_type_hints(func)
        signature = inspect.signature(func)

        @wraps(func)
        def wrapper(*args, **kwargs):
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            parameters = dict(zip(signature.parameters, bound.args))
            parameters.update(bound.kwargs)
            _check_types(parameters, hints)

            return func(*args, **kwargs)
        return wrapper

    if inspect.isclass(callable):
        callable.__init__ = decorate(callable.__init__)
        return callable

    return decorate(callable)
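A minimal usage sketch of enforce_strict_types (assuming the definitions above):

import dataclasses


@enforce_strict_types
@dataclasses.dataclass
class Point:
    x: float
    y: float = 0.0


Point(x=1.0)            # OK, the default y=0.0 is bound and checked too
Point(x=1.0, y='oops')  # raises TypeError: Expected type '<class 'float'>' for argument 'y' …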
Thanks to @Aran-Fey, who helped me improve this answer.
Just found this question.
pydantic can do full type validation for dataclasses out of the box (admission: I built pydantic). Just use pydantic's version of the decorator; the resulting dataclass is completely vanilla.
from datetime import datetime
from pydantic.dataclasses import dataclass

@dataclass
class User:
    id: int
    name: str = 'John Doe'
    signup_ts: datetime = None

print(User(id=42, signup_ts='2032-06-21T12:00'))
"""
User(id=42, name='John Doe', signup_ts=datetime.datetime(2032, 6, 21, 12, 0))
"""

User(id='not int', signup_ts='2032-06-21T12:00')
The last line will give:
...
pydantic.error_wrappers.ValidationError: 1 validation error
id
  value is not a valid integer (type=type_error.integer)