
Python 3 - Which one is faster for accessing data: dataclasses or dictionaries?

Python 3.7 introduced dataclasses to store data. I'm considering moving to this new approach, which is more organized and better structured than a dict.

But I have a doubt. Python hashes dict keys, which makes looking up keys and values much faster. Do dataclasses implement something similar?

Which one is faster and why?

asked Mar 19 '19 19:03 by sergiomafra



1 Answer

By default, classes in Python store their instances' attributes in a dictionary (__dict__) under the hood, as described in the documentation. For a more in-depth reference on how Python classes (and many other things) work, see the documentation on Python's data model, in particular the section on custom classes.
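As a minimal sketch of that point, the snippet below inspects the per-instance dictionary of a dataclass instance. The class name Point and its fields are hypothetical and only serve as an illustration.

```python
from dataclasses import dataclass


@dataclass
class Point:  # hypothetical example class, not from the question
    x: int
    y: int


p = Point(1, 2)

# A regular (non-slotted) instance keeps its attributes in a plain dict.
print(p.__dict__)  # {'x': 1, 'y': 2}

# Attribute access reads from that dict, so changing the dict
# changes what the attribute returns.
p.__dict__['x'] = 10
print(p.x)  # 10
```

This is why attribute access on a regular dataclass instance ends up doing a dictionary lookup as well, plus a little extra work to resolve the attribute.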

So in general, there shouldn't be a loss in performance by moving from dictionaries to dataclasses. But it's better to make sure with the timeit module:


Baseline

# dictionary creation
$ python -m timeit "{'var': 1}"
5000000 loops, best of 5: 52.9 nsec per loop

# dictionary key access
$ python -m timeit -s "d = {'var': 1}" "d['var']"
10000000 loops, best of 5: 20.3 nsec per loop

Basic dataclass

# dataclass creation
$ python -m timeit -s "from dataclasses import dataclass" -s "@dataclass" -s "class A: var: int" "A(1)" 
1000000 loops, best of 5: 288 nsec per loop

# dataclass attribute access
$ python -m timeit -s "from dataclasses import dataclass" -s "@dataclass" -s "class A: var: int" -s "a = A(1)" "a.var" 
10000000 loops, best of 5: 25.3 nsec per loop

Here we can see that using classes does have some overhead. For instance creation it's quite a bit (~5 times slower), but you don't need to care much about it unless you create and discard dataclass instances at a very high rate, for example in a tight loop.

The attribute access is probably the more important metric, and while dataclasses are again slower (~1.25 times), this time it's not by that much.
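The same access comparison can also be run from a script instead of the command line. The following is a rough sketch using timeit.repeat; the loop count n and the variable names are arbitrary choices, and the absolute numbers will differ between machines and Python versions.

```python
import timeit
from dataclasses import dataclass


@dataclass
class A:
    var: int


d = {'var': 1}
a = A(1)

n = 1_000_000  # loops per measurement (arbitrary choice)

# Take the best of 5 runs, as the timeit command line does.
dict_time = min(timeit.repeat("d['var']", globals={'d': d}, number=n, repeat=5))
attr_time = min(timeit.repeat("a.var", globals={'a': a}, number=n, repeat=5))

print(f"dict key access:       {dict_time / n * 1e9:.1f} nsec per loop")
print(f"dataclass attr access: {attr_time / n * 1e9:.1f} nsec per loop")
```

Both statements look up their variable through the same globals mechanism, so that overhead is included equally in both measurements.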

If you think that's still a tad too slow, you can tune your dataclass (or any class, really) by using __slots__ instead of a dictionary to store its attributes:


Slotted dataclass

# dataclass creation
$ python -m timeit -s "from dataclasses import dataclass" -s "@dataclass" -s "class A: __slots__ = ('var',); var: int" "A(1)" 
1000000 loops, best of 5: 242 nsec per loop

# dataclass attribute access
$ python -m timeit -s "from dataclasses import dataclass" -s "@dataclass" -s "class A: __slots__ = ('var',); var: int" -s "a = A(1)" "a.var"
10000000 loops, best of 5: 21.7 nsec per loop

By using this pattern we could shave off a few more nanoseconds. At this point, at least regarding attribute access, there shouldn't be a noticeable difference from dictionaries any more, and you can use the upsides of dataclasses without compromising speed.
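For illustration, here is the slotted pattern written out as a regular class definition, together with two of its visible effects. The class name Slotted is hypothetical. On Python 3.10 and later, @dataclass(slots=True) should give a similar result without writing __slots__ by hand.

```python
from dataclasses import dataclass


@dataclass
class Slotted:  # hypothetical example class
    __slots__ = ('var',)
    var: int


s = Slotted(1)

# Slotted instances have no per-instance dictionary.
has_dict = hasattr(s, '__dict__')
print(has_dict)  # False

# Attributes that are not listed in __slots__ cannot be added.
try:
    s.other = 2
except AttributeError:
    rejected = True
else:
    rejected = False
print(rejected)  # True
```

One trade-off to keep in mind: with a hand-written __slots__, a field cannot also have a default value in the class body, because the default would conflict with the slot of the same name.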

answered Oct 09 '22 06:10 by Arne