
Why are type annotations needed in Python dataclasses?

Unlike attributes of standard Python classes, fields in dataclasses must have a type annotation. But I don't understand what the purpose of these type annotations really is. One can create a dataclass like this:

from dataclasses import dataclass, fields

@dataclass
class Broken:
    field1: str = "default_string1",
    field2: "" = "default_string2",
    field3 = "default_string3"

This class is accepted by the Python interpreter (and your IDE or static code checker may not see anything wrong with it either). However, if you use the class

b = Broken()

print("Wannabe type for field1:", fields(b)[0].type)
print("Real type for field1:", type(b.field1))

print("Wannabe type for field2:", fields(b)[1].type)
print("Real type for field2:", type(b.field2))

try:
    print("Wannabe type for field3:", fields(b)[2].type)
except IndexError:
    print("field3 is not part of fields(b)")
print("Real type for field3:", type(b.field3))

you get some surprising output:

Wannabe type for field1: <class 'str'>
Real type for field1: <class 'tuple'>

Did you notice the trailing comma after the default value of field1? The type annotation is not used to check that the default value has the correct type, so the real type is tuple instead of the str declared in the annotation.

Wannabe type for field2: 
Real type for field2: <class 'tuple'>

You can even use an empty string as a type annotation and Python won't raise an eyebrow.

field3 is not part of fields(b)
Real type for field3: <class 'str'>

If you leave out the type annotation completely, the field is not returned by the fields() function. However, it can still be accessed via the object.
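Because @dataclass never checks a value against its annotation, catching a mistake like the trailing comma on field1 requires an explicit check. The following is a minimal sketch of such a check in `__post_init__`; the class name `Checked` is hypothetical, and it assumes every annotation is a plain class usable with `isinstance()` (no string annotations, no `from __future__ import annotations`, no generics such as `list[int]`).

```python
from dataclasses import dataclass, fields


@dataclass
class Checked:
    # Same trailing-comma mistake as field1 in Broken: the default is a tuple
    field1: str = "default_string1",

    def __post_init__(self) -> None:
        # Assumption: every annotation is a plain class usable with isinstance()
        # (no string annotations, no typing generics such as list[int]).
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(
                    f"{f.name}: expected {f.type.__name__}, got {type(value).__name__}"
                )


try:
    Checked()
    message = None
except TypeError as exc:
    message = str(exc)

print(message)               # field1: expected str, got tuple
print(Checked("ok").field1)  # ok
```

Nothing in the dataclasses module does this for you; the check exists only because it was written by hand.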

So what is the purpose of the type annotations in dataclasses? Are they really just used to decide whether a field should be listed by the fields() function? Or did the Python maintainers anticipate future type-related functionality for dataclasses that hasn't been implemented yet? Why do I need to add type annotations to dataclasses in Python?

asked Jul 02 '26 20:07 by asmaier


2 Answers

There is nothing too mysterious going on here. The documentation clearly says:

The member variables [...] are defined using PEP 526 type annotations. [...] A field is defined as a class variable that has a type annotation.
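To see that rule in isolation, here is a small sketch with a hypothetical class `Mini`: only the annotated name shows up in the class's annotations, and only that name becomes a dataclass field, while the unannotated one remains an ordinary class attribute.

```python
from dataclasses import dataclass, fields


@dataclass
class Mini:
    a: int = 1   # annotated: becomes a dataclass field
    b = 2        # not annotated: stays a plain class attribute


print(Mini.__annotations__)            # {'a': <class 'int'>}
print([f.name for f in fields(Mini)])  # ['a']
print(Mini(), Mini.b)                  # Mini(a=1) 2
```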

To review your example point by point:

  • field1 is declared to be of type str, but has a default value of type tuple. The @dataclass decorator (and the dataclasses module as a whole) simply makes no assumptions and gives no guarantees about whether the type of the default value (("default_string1",)) matches the type of the field, because the type annotation is almost completely ignored:

    With two exceptions described below, nothing in @dataclass examines the type specified in the variable annotation.

    Those exceptions deal with class variables (typing.ClassVar) and init-only variables (dataclasses.InitVar); apart from a warning by a third-party type checker, everything else is left to Python's duck typing.

  • The annotation in field2: "" is the string object "", so its type is <class 'str'>; with field2: str the annotation would instead be the class str itself, whose type is <class 'type'>. As with field1, this makes no sense to any type checker, but the dataclasses module does not require the annotation to make sense; it's as simple as that.

  • field3 has no type annotation, so it is ignored by the dataclass machinery. That does not mean the class itself ignores it: field3 is simply an ordinary class variable that can be accessed on any instance of Broken or on the class itself.
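The two exceptions mentioned in the documentation can be seen in a short sketch (the class `Example` and its field names are hypothetical): a `ClassVar` annotation keeps the name out of the fields, an `InitVar` annotation turns it into an `__init__` parameter that is handed to `__post_init__` instead of being stored, and an ordinary annotation such as `int` is still not enforced.

```python
from dataclasses import InitVar, dataclass, fields
from typing import ClassVar


@dataclass
class Example:
    x: int                      # ordinary field
    counter: ClassVar[int] = 0  # ClassVar: not a field, not an __init__ parameter
    scale: InitVar[int] = 1     # InitVar: passed to __post_init__, not stored as a field

    def __post_init__(self, scale: int) -> None:
        self.x *= scale


e = Example(3, scale=2)
print([f.name for f in fields(e)])  # ['x']
print(e)                            # Example(x=6)
# The int annotation on x is still not enforced: a str is happily "scaled".
print(Example("ab", scale=2).x)     # abab
```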

What every field declared with a PEP 526 type annotation gives you (unless @dataclass is explicitly told to ignore it) is generated methods. In your example above, only field3 is ignored by @dataclass. Accordingly, it is not part of the generated __init__() (you can't instantiate Broken(field3="foo")), nor of the generated __repr__(), __eq__() or any other generated method (including __hash__() when one is generated).
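A minimal sketch of this with a well-formed class (the name `Point` and its attributes are hypothetical): the unannotated `label` is absent from the generated `__init__()`, `__repr__()` and `__eq__()`.

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int = 0
    label = "point"  # no annotation: ignored by @dataclass, plain class attribute


p = Point(1, 2)
print(p)                 # Point(x=1, y=2)

q = Point(1, 2)
q.label = "renamed"      # instance attribute shadows the class attribute
print(p == q)            # True: label takes no part in the generated __eq__

try:
    Point(1, 2, label="q")  # label is not an __init__ parameter
    accepted = True
except TypeError:
    accepted = False
print(accepted)          # False
```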

In other words, the type annotation is simply a marriage of convenience: the @dataclass decorator only cares about type-annotated class variables, and uses the annotations for its own purposes while doing so (see the KW_ONLY pseudo-type). Beyond that, normal Python duck-typing rules apply: you are free to lie about your duck as long as it quacks properly.

answered Jul 04 '26 09:07 by user2722968


One key thing to understand: a "data class" isn't a different kind of class. The dataclasses.dataclass decorator is a code generator that writes various boilerplate methods for you, producing a class that is the same as one you could write by hand without the decorator. The whole point is to avoid boilerplate, and the annotation syntax was chosen because it is expressive and succinct. That's it.

If you don't want to use annotations, you can just write a normal class definition with no annotations, and it will function exactly the same as a class generated by the dataclasses.dataclass code generator. Of course, you would then be writing a bunch of boilerplate yourself.
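To make the comparison concrete, here is a sketch of a small dataclass next to a hand-written class with roughly the same behavior under the default options. The class names are hypothetical, and the hand-written version only approximates the generated code: it leaves out extras such as `__match_args__` and `__dataclass_fields__`.

```python
from dataclasses import dataclass


@dataclass
class Generated:
    x: int
    y: int = 0


class Handwritten:
    """Roughly what @dataclass generates for Generated (default options)."""

    def __init__(self, x, y=0):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"{self.__class__.__qualname__}(x={self.x!r}, y={self.y!r})"

    def __eq__(self, other):
        # Only compare against instances of exactly the same class
        if other.__class__ is self.__class__:
            return (self.x, self.y) == (other.x, other.y)
        return NotImplemented

    # eq=True without frozen=True makes dataclass instances unhashable
    __hash__ = None


print(Generated(1, 2), Handwritten(1, 2))    # Generated(x=1, y=2) Handwritten(x=1, y=2)
print(Handwritten(1, 2) == Handwritten(1, 2))  # True
```

No annotation is needed in the hand-written version; the annotations in `Generated` only serve as a compact way to tell the generator which attributes to include.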

answered Jul 04 '26 10:07 by juanpa.arrivillaga