I'm storing a lot of complex data in tuples/lists, but would prefer to use small wrapper classes to make the data structures easier to understand, e.g.
class Person:
    def __init__(self, first, last):
        self.first = first
        self.last = last

p = Person('foo', 'bar')
print(p.last)
...
would be preferable over
p = ['foo', 'bar']
print(p[1])
...
However, there seems to be a horrible memory overhead:
l = [Person('foo', 'bar') for i in range(10000000)]
# ipython now takes 1.7 GB RAM
and
del l
l = [('foo', 'bar') for i in range(10000000)]
# now just 118 MB RAM
Why? Is there an obvious alternative solution that I didn't think of?
Thanks!
(I know, in this example the 'wrapper' class looks silly. But when the data becomes more complex and nested, it is more useful)
A tuple is stored in a single block of memory: one fixed-size allocation holds the object header and all item pointers. A list is allocated in two blocks: a fixed one with all the Python object information, and a separately allocated, variable-sized block for the item pointers. That is why creating a tuple is faster than creating a list, and why tuples are more memory-efficient. The other key difference is mutability: tuples are immutable and cannot be changed after creation, while lists can be modified (and over-allocate their pointer block so that appends stay cheap).
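You can see the single-block vs. two-block layout directly with sys.getsizeof. A quick sketch; the exact byte counts vary across CPython versions and builds:

import sys

t = ('foo', 'bar')
l = ['foo', 'bar']

# Shallow sizes only; the 'foo'/'bar' strings are counted separately.
print(sys.getsizeof(t))  # tuple: header and both item pointers in one fixed block
print(sys.getsizeof(l))  # list: header plus its separately allocated, over-allocated pointer array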
When you create an object, whether a class instance or a genuinely new tuple, it uses additional memory. As indicated in the other answers, your tuple example only creates a single tuple object, which all ten million list slots reference. For a fair test, you should create N distinct tuples vs. N custom objects and compare.
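To see why the question's tuple version is so cheap: on CPython the literal ('foo', 'bar') is folded into a single constant, so the comprehension fills the list with millions of references to one object. A small sketch demonstrating this (CPython-specific behaviour):

# The constant tuple is reused on every iteration...
l = [('foo', 'bar') for i in range(1000)]
print(all(t is l[0] for t in l))   # True on CPython: one shared object

# ...whereas building the tuple at runtime creates distinct objects:
l2 = [tuple(['foo', 'bar']) for i in range(1000)]
print(l2[0] is l2[1])              # False: equal contents, separate tuples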
The overhead of an empty tuple is 56 bytes vs. 72 for a list. This 16-byte difference per sequence is low-hanging fruit if you have a data structure with many small, immutable sequences. Sets and dictionaries ostensibly don't grow at all when you add the first items (their internal hash tables are preallocated in chunks), but note their enormous per-container overhead.
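You can check those per-container overheads yourself; the numbers printed are typical for a 64-bit CPython 3.x build and shift between versions:

import sys

for obj in ((), [], set(), {}):
    print(type(obj).__name__, sys.getsizeof(obj))
# Empty sets and dicts preallocate their hash tables,
# hence their large baseline compared with tuple and list.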
Every time you create an instance of a class in Python, you use up some memory, including overhead that might actually be larger than the data you care about. Create a million objects, and you have a million times the overhead.
Plain classes have the overhead that instance attributes are stored in a per-instance dictionary (__dict__); that is why a namedtuple needs only about half the memory. And while it's true that a literal tuple can be cached as a constant, that alone doesn't explain the difference here: [tuple(['foo', 'bar']) for i in range(N)] creates N distinct (though equal) tuple objects.
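The per-instance dictionary is easy to observe. A short sketch contrasting a plain class with a namedtuple (both are hypothetical Person variants mirroring the question):

import sys
from collections import namedtuple

class PersonClass:
    def __init__(self, first, last):
        self.first = first
        self.last = last

PersonNT = namedtuple('PersonNT', 'first last')

p = PersonClass('foo', 'bar')
nt = PersonNT('foo', 'bar')

# Each plain instance drags a __dict__ along with it:
print(sys.getsizeof(p), sys.getsizeof(p.__dict__))
# The namedtuple stores its fields inline, tuple-style, with no __dict__:
print(sys.getsizeof(nt))
print(hasattr(nt, '__dict__'))  # False on recent CPython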
As others have said in their answers, you'll have to generate different objects for the comparison to make sense.
So, let's compare some approaches.
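The GB figures below were read off the interpreter's RAM usage, as in the question. If you want to reproduce them without eyeballing a process monitor, here is a rough harness of my own using tracemalloc; its traced totals will be close to, but not identical to, resident memory:

import tracemalloc

def measure(label, factory, n=10_000_000):
    # Trace allocations while building the list, then report the total.
    tracemalloc.start()
    l = [factory(i) for i in range(n)]
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    print(f'{label}: {current / 2**30:.2f} GB')

measure('tuple', lambda i: (i, i))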
tuple
l = [(i, i) for i in range(10000000)]
# memory taken by Python3: 1.0 GB
class Person
class Person:
    def __init__(self, first, last):
        self.first = first
        self.last = last

l = [Person(i, i) for i in range(10000000)]
# memory: 2.0 GB
namedtuple (tuple + __slots__)
from collections import namedtuple
Person = namedtuple('Person', 'first last')
l = [Person(i, i) for i in range(10000000)]
# memory: 1.1 GB
namedtuple is basically a class that extends tuple and uses __slots__ for all named fields, but it adds field getters and some other helper methods (on Python versions before 3.7 you can see the exact generated code by calling namedtuple with verbose=True).
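In practice that means a namedtuple instance behaves both like the wrapper class and like the tuple it replaces:

from collections import namedtuple

Person = namedtuple('Person', 'first last')
p = Person('foo', 'bar')

print(p.first, p.last)  # named access, as with the wrapper class
print(p[1])             # still a tuple, so indexing works
first, last = p         # and so does unpacking
print(p._asdict())      # plus helper methods for free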
class Person + __slots__
class Person:
    __slots__ = ['first', 'last']
    def __init__(self, first, last):
        self.first = first
        self.last = last

l = [Person(i, i) for i in range(10000000)]
# memory: 0.9 GB
This is a trimmed-down version of the namedtuple above. A clear winner, even better than pure tuples.
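One behavioural consequence worth knowing: __slots__ replaces the per-instance __dict__ with fixed slot descriptors, so you can no longer attach arbitrary attributes, which is exactly where the memory savings come from. A quick sketch:

class Person:
    __slots__ = ['first', 'last']
    def __init__(self, first, last):
        self.first = first
        self.last = last

p = Person('foo', 'bar')
p.first = 'baz'        # declared slots stay writable
try:
    p.age = 30         # anything else is rejected
except AttributeError as e:
    print(e)           # no __dict__ to put 'age' in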