Understanding size of class, namedtuple and __slots__ in Python 3.7

After watching Nina Zakharenko's Python Memory Management talk at PyCon 2016 (link), it seemed like the dunder attribute __slots__ was a tool to reduce object size and speed up attribute lookup.

My expectation was that a normal class would be the largest, while a __slots__/namedtuple approach would save space. However, a quick experiment with sys.getsizeof() seems to suggest otherwise:

from collections import namedtuple
from sys import getsizeof

class Rectangle:
   '''A class based Rectangle, with a full __dict__'''
   def __init__(self, x, y, width, height):
      self.x = x
      self.y = y
      self.width = width
      self.height = height

class SlotsRectangle:
   '''A class based Rectangle with __slots__ defined for attributes'''
   __slots__ = ('x', 'y', 'width', 'height')

   def __init__(self, x, y, width, height):
      self.x = x
      self.y = y
      self.width = width
      self.height = height

NamedTupleRectangle = namedtuple('Rectangle', ('x', 'y', 'width', 'height'))
NamedTupleRectangle.__doc__ = 'A rectangle as an immutable namedtuple'

print(f'Class: {getsizeof(Rectangle(1,2,3,4))}')
print(f'Slots: {getsizeof(SlotsRectangle(1,2,3,4))}')
print(f'Named Tuple: {getsizeof(NamedTupleRectangle(1,2,3,4))}')

Terminal Output:

$ python3.7 example.py
Class: 56
Slots: 72
Named Tuple: 80

What is going on here? From the docs on Python's Data Model it appears that descriptors are used for __slots__ which would add function overhead to classes implementing it. However, why are the results so heavily skewed towards a normal class?

Channeling my inner Raymond H.: there has to be a harder way!

Matt asked Mar 18 '19 12:03

2 Answers

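The function sys.getsizeof() is probably not doing what you think it does: it reports only the shallow size of the object itself and does not follow references. For an instance of a custom class, that leaves out the instance's __dict__ and the attribute values stored in it.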
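As a minimal sketch of that point, reusing the class names from the question: the plain instance keeps its attributes in a separate __dict__ object, which getsizeof() on the instance does not include, while the __slots__ instance has no __dict__ at all. The exact byte counts depend on the CPython version and platform, so none are shown here.

```python
from sys import getsizeof


class Rectangle:
    """Plain class: attributes live in a per-instance __dict__."""
    def __init__(self, x, y, width, height):
        self.x = x
        self.y = y
        self.width = width
        self.height = height


class SlotsRectangle:
    """Same attributes, stored in fixed slots instead of a __dict__."""
    __slots__ = ('x', 'y', 'width', 'height')

    def __init__(self, x, y, width, height):
        self.x = x
        self.y = y
        self.width = width
        self.height = height


plain = Rectangle(1, 2, 3, 4)
slotted = SlotsRectangle(1, 2, 3, 4)

shallow = getsizeof(plain)             # the instance alone
dict_size = getsizeof(plain.__dict__)  # the separate attribute dict
slots_size = getsizeof(slotted)        # no __dict__ to add here

print(f'Class (shallow):   {shallow}')
print(f'Class + __dict__:  {shallow + dict_size}')
print(f'Slots:             {slots_size}')
print(f'Slots has __dict__: {hasattr(slotted, "__dict__")}')
```

Once the dict is counted, the plain class should come out larger than the slotted one, which matches the original expectation.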

See this answer for a way to calculate the full memory size of an object; it may help. I copied the code from that answer here, but the full explanation is in the linked answer.

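import sys
from collections import deque
from numbers import Number

try: # Python 3.3+ (the ABCs were removed from collections in 3.10)
    from collections.abc import Set, Mapping
except ImportError: # Python 2
    from collections import Set, Mapping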

try: # Python 2
    zero_depth_bases = (basestring, Number, xrange, bytearray)
    iteritems = 'iteritems'
except NameError: # Python 3
    zero_depth_bases = (str, bytes, Number, range, bytearray)
    iteritems = 'items'

def getsize(obj_0):
    """Recursively iterate to sum size of object & members."""
    _seen_ids = set()
    def inner(obj):
        obj_id = id(obj)
        if obj_id in _seen_ids:
            return 0
        _seen_ids.add(obj_id)
        size = sys.getsizeof(obj)
        if isinstance(obj, zero_depth_bases):
            pass # bypass remaining control flow and return
        elif isinstance(obj, (tuple, list, Set, deque)):
            size += sum(inner(i) for i in obj)
        elif isinstance(obj, Mapping) or hasattr(obj, iteritems):
            size += sum(inner(k) + inner(v) for k, v in getattr(obj, iteritems)())
        # Check for custom object instances - may subclass above too
        if hasattr(obj, '__dict__'):
            size += inner(vars(obj))
        if hasattr(obj, '__slots__'): # can have __slots__ with __dict__
            size += sum(inner(getattr(obj, s)) for s in obj.__slots__ if hasattr(obj, s))
        return size
    return inner(obj_0)
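A different way to cross-check the numbers, not taken from the linked answer, is to create many instances and let tracemalloc report how much memory was actually allocated. The sketch below assumes the three definitions from the question; the helper name bytes_per_instance and the instance count are made up for illustration. The figure includes the 8-byte list slot per instance on a 64-bit build, which is the same for all three variants.

```python
import tracemalloc
from collections import namedtuple


class Rectangle:
    def __init__(self, x, y, width, height):
        self.x = x
        self.y = y
        self.width = width
        self.height = height


class SlotsRectangle:
    __slots__ = ('x', 'y', 'width', 'height')

    def __init__(self, x, y, width, height):
        self.x = x
        self.y = y
        self.width = width
        self.height = height


NamedTupleRectangle = namedtuple('Rectangle', ('x', 'y', 'width', 'height'))


def bytes_per_instance(factory, count=50_000):
    """Approximate traced bytes per instance (hypothetical helper)."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    # Small ints are cached by CPython, so the arguments add no allocations.
    instances = [factory(1, 2, 3, 4) for _ in range(count)]
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del instances
    return (after - before) / count


class_bytes = bytes_per_instance(Rectangle)
slots_bytes = bytes_per_instance(SlotsRectangle)
tuple_bytes = bytes_per_instance(NamedTupleRectangle)

print(f'Class:       {class_bytes:.1f} bytes/instance')
print(f'Slots:       {slots_bytes:.1f} bytes/instance')
print(f'Named Tuple: {tuple_bytes:.1f} bytes/instance')
```

Because tracemalloc counts what the allocator handed out, the per-instance __dict__ of the plain class is included automatically.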
Ralf answered Oct 11 '22 01:10


There is a more compact variant using the recordclass library:

from recordclass import dataobject

class Rectangle(dataobject):
   x:int
   y:int
   width:int
   height:int

>>> import sys
>>> r = Rectangle(1,2,3,4)
>>> print(sys.getsizeof(r))
48

It has a smaller memory footprint than the __slots__-based class because its instances do not participate in cyclic garbage collection: the Py_TPFLAGS_HAVE_GC flag is not set, so no PyGC_Head (24 bytes before Python 3.8, 16 bytes from 3.8 on) is allocated for them.
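The GC overhead can be observed from pure Python without installing recordclass. This is a rough sketch that relies on CPython implementation details: the bit value 1 << 14 for Py_TPFLAGS_HAVE_GC is taken from CPython's object.h, and the constant name below is only a local label. For a __slots__ class with no __dict__, the difference between sys.getsizeof() and the type's __basicsize__ should correspond to the GC header.

```python
import gc
import sys

# Bit value of Py_TPFLAGS_HAVE_GC in CPython's object.h (implementation detail).
PY_TPFLAGS_HAVE_GC = 1 << 14


class SlotsRectangle:
    __slots__ = ('x', 'y', 'width', 'height')

    def __init__(self, x, y, width, height):
        self.x = x
        self.y = y
        self.width = width
        self.height = height


r = SlotsRectangle(1, 2, 3, 4)

# The slotted class still has the GC flag set and its instances are tracked.
has_gc_flag = bool(SlotsRectangle.__flags__ & PY_TPFLAGS_HAVE_GC)
is_tracked = gc.is_tracked(r)

# getsizeof() adds the GC header on top of the type's basic size.
gc_overhead = sys.getsizeof(r) - SlotsRectangle.__basicsize__

print(f'HAVE_GC flag set: {has_gc_flag}')
print(f'Tracked by gc:    {is_tracked}')
print(f'Basic size:       {SlotsRectangle.__basicsize__}')
print(f'GC overhead:      {gc_overhead}')

# For contrast, int is a type that does not carry the GC flag.
print(f'int has HAVE_GC:  {bool(int.__flags__ & PY_TPFLAGS_HAVE_GC)}')
```

A type without the flag, as described for dataobject above, skips this header entirely, which would account for the difference between the two reported sizes.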

intellimath answered Oct 10 '22 23:10