After watching Nina Zakharenko's Python Memory Management talk at PyCon 2016 (link), it seemed like the dunder attribute __slots__ was a tool to reduce object size and speed up attribute lookup. My expectation was that a normal class would be the largest, while a __slots__/namedtuple approach would save space. However, a quick experiment with sys.getsizeof() seems to suggest otherwise:
from collections import namedtuple
from sys import getsizeof

class Rectangle:
    '''A class based Rectangle, with a full __dict__'''
    def __init__(self, x, y, width, height):
        self.x = x
        self.y = y
        self.width = width
        self.height = height

class SlotsRectangle:
    '''A class based Rectangle with __slots__ defined for attributes'''
    __slots__ = ('x', 'y', 'width', 'height')
    def __init__(self, x, y, width, height):
        self.x = x
        self.y = y
        self.width = width
        self.height = height

NamedTupleRectangle = namedtuple('Rectangle', ('x', 'y', 'width', 'height'))
NamedTupleRectangle.__doc__ = 'A rectangle as an immutable namedtuple'

print(f'Class: {getsizeof(Rectangle(1,2,3,4))}')
print(f'Slots: {getsizeof(SlotsRectangle(1,2,3,4))}')
print(f'Named Tuple: {getsizeof(NamedTupleRectangle(1,2,3,4))}')
Terminal Output:
$ python3.7 example.py
Class: 56
Slots: 72
Named Tuple: 80
What is going on here? From the docs on Python's Data Model it appears that descriptors are used for __slots__, which would add function-call overhead to classes implementing it. However, why are the results so heavily skewed towards a normal class?
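For reference, the descriptor machinery is easy to see interactively; a quick check against the SlotsRectangle class above should show something like this (reprs from CPython 3.7):

>>> SlotsRectangle.__dict__['x']
<member 'x' of 'SlotsRectangle' objects>
>>> type(SlotsRectangle.x)
<class 'member_descriptor'>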
Channeling my inner Raymond H.: there has to be a harder way!
__slots__ is a class variable: it is declared once on the class and shared by every instance, so any change made to it is visible everywhere. Note that you cannot reach the memory reserved by the __slots__ declaration by subscripting __slots__ itself; indexing it only gives you the attribute names stored in that sequence, not the per-instance values, which live in fixed slots exposed through descriptors rather than in a per-instance __dict__.
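As a minimal sketch of what that means in practice (reusing the SlotsRectangle class from the question; the exact AttributeError wording may vary by Python version):

r = SlotsRectangle(1, 2, 3, 4)

# No per-instance __dict__ exists when only __slots__ is defined.
print(hasattr(r, '__dict__'))        # False

# Only the attributes named in __slots__ can be assigned.
r.x = 10                             # fine: 'x' is a declared slot
try:
    r.area = 12                      # 'area' is not in __slots__
except AttributeError as exc:
    print(exc)                       # e.g. 'SlotsRectangle' object has no attribute 'area'

# Subscripting __slots__ just returns the declared names, not the stored values.
print(SlotsRectangle.__slots__[0])   # 'x'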
Python's namedtuple() is a factory function available in collections. It allows you to create tuple subclasses with named fields, and you can access the values of a given named tuple using dot notation and the field names, as in obj.x. The generated class stores its data as an ordinary tuple; if you want a dict-like view of the fields, call the _asdict() method.
Since a named tuple is a tuple, and tuples are immutable, it is impossible to change the value of a field in place. Instead you use the _replace() method (the leading underscore is only there to avoid clashing with field names, not to mark it as private), which returns a new named tuple with the chosen fields changed.
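A short interactive sketch using the NamedTupleRectangle defined in the question:

>>> r = NamedTupleRectangle(1, 2, 3, 4)
>>> r.width
3
>>> r._replace(width=30)   # returns a new named tuple
Rectangle(x=1, y=2, width=30, height=4)
>>> r.width                # the original is unchanged
3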
The function sys.getsizeof() is probably not doing what you think it does: it reports only the size of the object itself, not of anything the object references. For an instance of a normal class that excludes the separate __dict__ where the attributes actually live, so complex objects like custom class instances come out looking misleadingly small.
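To make that concrete, here is a small check reusing the Rectangle class from the question (exact byte counts vary by interpreter build):

import sys

r = Rectangle(1, 2, 3, 4)

print(sys.getsizeof(r))           # only the bare instance (56 in the question's run)
print(sys.getsizeof(r.__dict__))  # the dict actually holding x, y, width, height
                                  # is a separate object and is not included above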
Look at this answer for a method to calculate the memory size of objects; maybe it helps you. I copied the code from that answer here, but the full explanation is in the answer I linked.
import sys
from numbers import Number
from collections import Set, Mapping, deque

try:  # Python 2
    zero_depth_bases = (basestring, Number, xrange, bytearray)
    iteritems = 'iteritems'
except NameError:  # Python 3
    zero_depth_bases = (str, bytes, Number, range, bytearray)
    iteritems = 'items'

def getsize(obj_0):
    """Recursively iterate to sum size of object & members."""
    _seen_ids = set()
    def inner(obj):
        obj_id = id(obj)
        if obj_id in _seen_ids:
            return 0
        _seen_ids.add(obj_id)
        size = sys.getsizeof(obj)
        if isinstance(obj, zero_depth_bases):
            pass  # bypass remaining control flow and return
        elif isinstance(obj, (tuple, list, Set, deque)):
            size += sum(inner(i) for i in obj)
        elif isinstance(obj, Mapping) or hasattr(obj, iteritems):
            size += sum(inner(k) + inner(v) for k, v in getattr(obj, iteritems)())
        # Check for custom object instances - may subclass above too
        if hasattr(obj, '__dict__'):
            size += inner(vars(obj))
        if hasattr(obj, '__slots__'):  # can have __slots__ with __dict__
            size += sum(inner(getattr(obj, s)) for s in obj.__slots__ if hasattr(obj, s))
        return size
    return inner(obj_0)
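Applied to the question's three variants, the recursive getsize() also counts the attribute values and the hidden __dict__, which is the comparison you actually want; run it yourself, since the totals depend on the interpreter version:

print(f'Class: {getsize(Rectangle(1, 2, 3, 4))}')
print(f'Slots: {getsize(SlotsRectangle(1, 2, 3, 4))}')
print(f'Named Tuple: {getsize(NamedTupleRectangle(1, 2, 3, 4))}')

Once the per-instance __dict__ is included, the plain class should no longer look like the smallest of the three.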
There is a more compact variant using the recordclass library:
from recordclass import dataobject

class Rectangle(dataobject):
    x: int
    y: int
    width: int
    height: int
>>> r = Rectangle(1,2,3,4)
>>> print(sys.getsizeof(r))
48
It has a smaller memory footprint than the __slots__-based one because it doesn't participate in cyclic garbage collection: the Py_TPFLAGS_HAVE_GC flag isn't set, so the PyGC_Head header (24 bytes before Python 3.8, 16 bytes from 3.8 onward) isn't needed at all.
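One way to check that claim from Python (a sketch assuming the recordclass-based Rectangle from the snippet above; gc.is_tracked simply reports whether the cyclic collector tracks an object):

import gc

r = Rectangle(1, 2, 3, 4)
print(gc.is_tracked(r))   # expected False: no Py_TPFLAGS_HAVE_GC, so no PyGC_Head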