A data class is a list of data set allocation attributes and their values. You cannot assign a data class to an object; however, data class may be used for allocation of a scratch tape to be used to write objects.
A normal class can be defined with or without a parameter in its constructor. Data classes have default implementations for the following methods using only properties that were declared in the primary constructor; toString(), hashCode(), copy(), componentN(), equals().
Data classes are one of the new features of Python 3.7. With data classes, you do not have to write boilerplate code to get proper initialization, representation, and comparisons for your objects. You have seen how to define your own data classes, as well as: How to add default values to the fields in your data class.
The data class provides a copy() method which is used to create a copy (or colon) of object. Using copy() method, some or all properties of object can be altered. For example: data class Product(var item: String, var price: Int)
Data classes are just regular classes that are geared towards storing state, rather than containing a lot of logic. Every time you create a class that mostly consists of attributes, you make a data class.
What the dataclasses
module does is to make it easier to create data classes. It takes care of a lot of boilerplate for you.
This is especially useful when your data class must be hashable; because this requires a __hash__
method as well as an __eq__
method. If you add a custom __repr__
method for ease of debugging, that can become quite verbose:
class InventoryItem:
'''Class for keeping track of an item in inventory.'''
name: str
unit_price: float
quantity_on_hand: int = 0
def __init__(
self,
name: str,
unit_price: float,
quantity_on_hand: int = 0
) -> None:
self.name = name
self.unit_price = unit_price
self.quantity_on_hand = quantity_on_hand
def total_cost(self) -> float:
return self.unit_price * self.quantity_on_hand
def __repr__(self) -> str:
return (
'InventoryItem('
f'name={self.name!r}, unit_price={self.unit_price!r}, '
f'quantity_on_hand={self.quantity_on_hand!r})'
def __hash__(self) -> int:
return hash((self.name, self.unit_price, self.quantity_on_hand))
def __eq__(self, other) -> bool:
if not isinstance(other, InventoryItem):
return NotImplemented
return (
(self.name, self.unit_price, self.quantity_on_hand) ==
(other.name, other.unit_price, other.quantity_on_hand))
With dataclasses
you can reduce it to:
from dataclasses import dataclass
@dataclass(unsafe_hash=True)
class InventoryItem:
'''Class for keeping track of an item in inventory.'''
name: str
unit_price: float
quantity_on_hand: int = 0
def total_cost(self) -> float:
return self.unit_price * self.quantity_on_hand
The same class decorator can also generate comparison methods (__lt__
, __gt__
, etc.) and handle immutability.
namedtuple
classes are also data classes, but are immutable by default (as well as being sequences). dataclasses
are much more flexible in this regard, and can easily be structured such that they can fill the same role as a namedtuple
class.
The PEP was inspired by the attrs
project, which can do even more (including slots, validators, converters, metadata, etc.).
If you want to see some examples, I recently used dataclasses
for several of my Advent of Code solutions, see the solutions for day 7, day 8, day 11 and day 20.
If you want to use dataclasses
module in Python versions < 3.7, then you could install the backported module (requires 3.6) or use the attrs
project mentioned above.
The question has been addressed. However, this answer adds some practical examples to aid in the basic understanding of dataclasses.
What exactly are python data classes and when is it best to use them?
namedtuple
and others."mutable namedtuples with default[s]"
Here is what the latter phrase means:
namedtuple
or a regular class.Compared to common classes, you primarily save on typing boilerplate code.
This is an overview of dataclass features (TL;DR? See the Summary Table in the next section).
Here are features you get by default from dataclasses.
Attributes + Representation + Comparison
import dataclasses
@dataclasses.dataclass
#@dataclasses.dataclass() # alternative
class Color:
r : int = 0
g : int = 0
b : int = 0
These defaults are provided by automatically setting the following keywords to True
:
@dataclasses.dataclass(init=True, repr=True, eq=True)
Additional features are available if the appropriate keywords are set to True
.
Order
@dataclasses.dataclass(order=True)
class Color:
r : int = 0
g : int = 0
b : int = 0
The ordering methods are now implemented (overloading operators: < > <= >=
), similarly to functools.total_ordering
with stronger equality tests.
Hashable, Mutable
@dataclasses.dataclass(unsafe_hash=True) # override base `__hash__`
class Color:
...
Although the object is potentially mutable (possibly undesired), a hash is implemented.
Hashable, Immutable
@dataclasses.dataclass(frozen=True) # `eq=True` (default) to be immutable
class Color:
...
A hash is now implemented and changing the object or assigning to attributes is disallowed.
Overall, the object is hashable if either unsafe_hash=True
or frozen=True
.
See also the original hashing logic table with more details.
To get the following features, special methods must be manually implemented:
Unpacking
@dataclasses.dataclass
class Color:
r : int = 0
g : int = 0
b : int = 0
def __iter__(self):
yield from dataclasses.astuple(self)
Optimization
@dataclasses.dataclass
class SlottedColor:
__slots__ = ["r", "b", "g"]
r : int
g : int
b : int
The object size is now reduced:
>>> imp sys
>>> sys.getsizeof(Color)
1056
>>> sys.getsizeof(SlottedColor)
888
In some circumstances, __slots__
also improves the speed of creating instances and accessing attributes. Also, slots do not allow default assignments; otherwise, a ValueError
is raised.
See more on slots in this blog post.
+----------------------+----------------------+----------------------------------------------------+-----------------------------------------+
| Feature | Keyword | Example | Implement in a Class |
+----------------------+----------------------+----------------------------------------------------+-----------------------------------------+
| Attributes | init | Color().r -> 0 | __init__ |
| Representation | repr | Color() -> Color(r=0, g=0, b=0) | __repr__ |
| Comparision* | eq | Color() == Color(0, 0, 0) -> True | __eq__ |
| | | | |
| Order | order | sorted([Color(0, 50, 0), Color()]) -> ... | __lt__, __le__, __gt__, __ge__ |
| Hashable | unsafe_hash/frozen | {Color(), {Color()}} -> {Color(r=0, g=0, b=0)} | __hash__ |
| Immutable | frozen + eq | Color().r = 10 -> TypeError | __setattr__, __delattr__ |
| | | | |
| Unpacking+ | - | r, g, b = Color() | __iter__ |
| Optimization+ | - | sys.getsizeof(SlottedColor) -> 888 | __slots__ |
+----------------------+----------------------+----------------------------------------------------+-----------------------------------------+
+These methods are not automatically generated and require manual implementation in a dataclass.
*__ne__
is not needed and thus not implemented.
Post-initialization
@dataclasses.dataclass
class RGBA:
r : int = 0
g : int = 0
b : int = 0
a : float = 1.0
def __post_init__(self):
self.a : int = int(self.a * 255)
RGBA(127, 0, 255, 0.5)
# RGBA(r=127, g=0, b=255, a=127)
Inheritance
@dataclasses.dataclass
class RGBA(Color):
a : int = 0
Conversions
Convert a dataclass to a tuple or a dict, recursively:
>>> dataclasses.astuple(Color(128, 0, 255))
(128, 0, 255)
>>> dataclasses.asdict(Color(128, 0, 255))
{'r': 128, 'g': 0, 'b': 255}
Limitations
From the PEP specification:
A class decorator is provided which inspects a class definition for variables with type annotations as defined in PEP 526, "Syntax for Variable Annotations". In this document, such variables are called fields. Using these fields, the decorator adds generated method definitions to the class to support instance initialization, a repr, comparison methods, and optionally other methods as described in the Specification section. Such a class is called a Data Class, but there's really nothing special about the class: the decorator adds generated methods to the class and returns the same class it was given.
The @dataclass
generator adds methods to the class that you'd otherwise define yourself like __repr__
, __init__
, __lt__
, and __gt__
.
Consider this simple class Foo
from dataclasses import dataclass
@dataclass
class Foo:
def bar():
pass
Here is the dir()
built-in comparison. On the left-hand side is the Foo
without the @dataclass decorator, and on the right is with the @dataclass decorator.
Here is another diff, after using the inspect
module for comparison.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With