Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

python: class vs tuple huge memory overhead (?)

I'm storing a lot of complex data in tuples/lists, but would prefer to use small wrapper classes to make the data structures easier to understand, e.g.

class Person:
    def __init__(self, first, last):
        self.first = first
        self.last = last

p = Person('foo', 'bar')
print(p.last)
...

would be preferable over

p = ['foo', 'bar']
print(p[1])
...

however there seems to be a horrible memory overhead:

l = [Person('foo', 'bar') for i in range(10000000)]
# ipython now taks 1.7 GB RAM

and

del l
l = [('foo', 'bar') for i in range(10000000)]
# now just 118 MB RAM

Why? is there any obvious alternative solution that I didn't think of?

Thanks!

(I know, in this example the 'wrapper' class looks silly. But when the data becomes more complex and nested, it is more useful)

like image 363
seb314 Avatar asked Jul 15 '17 22:07

seb314


People also ask

Which takes more memory list or tuple?

List has a large memory. Tuple is stored in a single block of memory. Creating a tuple is faster than creating a list. Creating a list is slower because two memory blocks need to be accessed.

Are tuples more memory efficient?

Key Takeaways and Conclusion The key difference between the tuples and lists is that while the tuples are immutable objects the lists are mutable. This means that tuples cannot be changed while the lists can be modified. Tuples are more memory efficient than the lists.

Why list consume more memory than tuple in Python?

Tuples are stored in a single block of memory. Tuples are immutable so, It doesn't require extra space to store new objects. Lists are allocated in two blocks: the fixed one with all the Python object information and a variable sized block for the data. It is the reason creating a tuple is faster than List.

Why would you use a tuple over a list?

The key difference between the tuples and lists is that while the tuples are immutable objects the lists are mutable. This means that tuples cannot be changed while the lists can be modified. Tuples are more memory efficient than the lists.

Why do tuples use so much memory in Python?

When you create an object, either class or a new tuple, it uses a lot more memory As indicated in the answers, your tuple example only creates a single tuple object. You should create a test case where you create a lot of different tuples vs custom objects and see how the performance is.

What is the overhead of an empty tuple in Python?

The overhead of an empty tuple is 56 bytes vs. the 72 of a list. Again, this 16 bytes difference per sequence is low-hanging fruit if you have a data structure with a lot of small, immutable sequences. Sets and dictionaries ostensibly don’t grow at all when you add items, but note the enormous overhead.

Is your Python class using too much memory?

Every time you create an instance of a class in Python, you are using up some memory–including overhead that might actually be larger than the data you care about. Create a million objects, and you have a million times the overhead.

What is the difference between namedtuples and class tuples?

Classes have the overhead, that the attributes are saved in a dictionary. Therefore namedtuples needs only half the memory. While it's true that tuples are constants, that doesn't explain the difference here. [tuple ( ['foo', 'bar']) for i in range (N)] creates N constant (but distinct) tuple objects.


1 Answers

As others have said in their answers, you'll have to generate different objects for the comparison to make sense.

So, let's compare some approaches.

tuple

l = [(i, i) for i in range(10000000)]
# memory taken by Python3: 1.0 GB

class Person

class Person:
    def __init__(self, first, last):
        self.first = first
        self.last = last

l = [Person(i, i) for i in range(10000000)]
# memory: 2.0 GB

namedtuple (tuple + __slots__)

from collections import namedtuple
Person = namedtuple('Person', 'first last')

l = [Person(i, i) for i in range(10000000)]
# memory: 1.1 GB

namedtuple is basically a class that extends tuple and uses __slots__ for all named fields, but it adds fields getters and some other helper methods (you can see the exact code generated if called with verbose=True).

class Person + __slots__

class Person:
    __slots__ = ['first', 'last']
    def __init__(self, first, last):
        self.first = first
        self.last = last

l = [Person(i, i) for i in range(10000000)]
# memory: 0.9 GB

This is a trimmed-down version of namedtuple above. A clear winner, even better than pure tuples.

like image 131
randomir Avatar answered Oct 04 '22 09:10

randomir