I'm asking this because I found it surprising -- I thought a namedtuple
would have more overhead.
(The background is I was caching a large Django query in memory and found Django objects to be 100x the size of .values()
. I then wondered what overhead namedtuple
versions of the objects would be, allowing me to still use .
access to the items as attributes. Smaller was not what I expected.)
#!/usr/bin/env python
from pympler.asizeof import asizeof
from collections import namedtuple
import random
import string
QTY = 100000
class Foz(object):
pass
dicts = [{'foo': random.randint(0, 10000),
'bar': ''.join([random.choice(string.ascii_letters + string.digits) for n in xrange(32)]),
'baz': random.randrange(10000),
'faz': random.choice([True, False]),
'foz': Foz()} for _ in range(QTY)]
print "%d dicts: %d" % (len(dicts), asizeof(dicts))
# https://stackoverflow.com/questions/43921240/pythonic-way-to-convert-dictionary-to-namedtuple-or-another-hashable-dict-like
MyTuple = namedtuple('MyTuple', sorted(dicts[0]))
tuples = [MyTuple(**d) for d in dicts]
print "%d namedtuples: %d" % (len(tuples), asizeof(tuples))
print "Ratio: %.01f" % (float(asizeof(tuples)) / float(asizeof(dicts)))
Running,
$ ./foo.py
100000 dicts: 75107672
100000 namedtuples: 56707472
Ratio: 0.8
A single tuple is even less, perhaps due to the overhead of list
:
$ ./foo.py
1 dicts: 1072
1 namedtuples: 688
Ratio: 0.6
Is it the overhead of the hashtable array? But wouldn't a namedtuple
also need a hashtable of the attributes? Is pympler
not being accurate?
The basic answer is simply "yes": A normal object has an internal dictionary to store the instance's attributes:
class Foo:
pass
f = Foo()
print(f.__dict__)
# {}
It needs to be a dict because in Python you are allowed to assign new attributes on an instance that were not defined by the class:
f.a = 1
print(f.__dict__)
# {'a': 1}
Using a dict allows fast attribute lookups, but there is memory overhead due to the data structure itself. Also, because different instances of Foo
may have different attributes defined on them, every single instance might need its own dict:
g = Foo()
print(g.__dict__)
# {}
print(f.__dict_ == g.__dict__)
# False
A namedtuple
does not allow adding attributes at runtime. A specific instance of namedtuple
can, therefore, store all of its attributes in a single instance being shared by all instances.
Given a namedtuple
and an instance:
Foo = collections.namedtuple("Foo", 'a,b')
f = Foo(1,2)
The namedtuple
-constructor generates a descriptor for each field and stores it in the class; here is where the translation between named attribute and tuple index is stored. When you access attribute a
on instance f
, the attribute access is routed through this descriptor:
type(Foo.a)
#<class 'property'>
But wouldn't a namedtuple also need a hashtable of the attributes?
Nope. A namedtuple's instance layout is exactly the same as a regular tuple. The mapping from tuple entries to attributes is provided by generated descriptors, which are a mechanism Python provides for controlling attribute resolution. The descriptors are stored in the generated namedtuple type, so they're a per-type cost, not per-instance. Currently, the descriptors are property
objects, as you can see in the current implementation, but that's subject to change (especially if any of this gets rewritten in C).
A namedtuple gets to be way more memory-efficient than a dict because as far as memory layout goes, it's just a tuple.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With