Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Which takes less memory, a frozenset or a tuple?

I have an object which needs to be "tagged" with 0-3 strings (out of a set of 20-some possibilities); these values are all unique and order doesn't matter. The only operation that needs to be done on the tags is checking if a particular one is present or not (specific_value in self.tags).

However, there's an enormous number of these objects in memory at once, to the point that it pushes the limits of my old computer's RAM. So saving a few bytes can add up.

With so few tags on each object, I doubt the lookup time is going to matter much. But: is there a memory difference between using a tuple and a frozenset here? Is there any other real reason to use one over the other?

like image 544
Draconis Avatar asked Jul 07 '19 04:07

Draconis


People also ask

Which takes more memory list or tuple?

List has a large memory. Tuple is stored in a single block of memory. Creating a tuple is faster than creating a list. Creating a list is slower because two memory blocks need to be accessed.

How much memory does a tuple take?

Memory and Performance of Tuples vs Lists Here you can see that the tuple takes only 48 bytes, whereas the list requires 104. As a tuple cannot be modified, its size remains the same.

Does tuple consume less memory?

Tuples are stored in a single block of memory. Tuples are immutable so, It doesn't require extra space to store new objects. Lists are allocated in two blocks: the fixed one with all the Python object information and a variable sized block for the data. It is the reason creating a tuple is faster than List.

What is the difference between tuple and Frozenset?

tuples are immutable lists , frozensets are immutable sets . frozensets aren't indexed, but you have the functionality of sets - O(1) element lookups, and functionality such as unions and intersections. They also can't contain duplicates, like their mutable counterparts.

Why do list s need 16 bytes more memory than tuple S?

So list s need at least 16 bytes more memory than tuple s. Why did I say "at least"? Because of the over-allocation. Over-allocation means it allocates more space than needed. However, the amount of over-allocation depends on "how" you create the list and the append/deletion history:

What happens when a Python tuple has less than 20 items?

If a tuple is no longer needed and has less than 20 items, instead of deleting it permanently, Python moves it to a free list and uses it later. Note: Though it’s not certain always, the same memory will be used.

How to create list and tuple from frozenset object in Python?

Answer: You can easily create a list and tuple from frozenset object using built-in functions. see below example of frozenset to list and tuple:- Using for loop to iterate through frozen set elements.

What is the difference between set and frozenset?

In Python, frozenset is the same as set except the frozensets are immutable which means that elements from the frozenset cannot be added or removed once created. This function takes input as any iterable object and converts them into an immutable object. The order of elements is not guaranteed to be preserved.


3 Answers

Tuples are very compact. Sets are based on hash tables, and depend on having "empty" slots to make hash collisions less likely.

For a recent enough version of CPython, sys._debugmallocstats() displays lots of potentially interesting info. Here under a 64-bit Python 3.7.3:

>>> from sys import _debugmallocstats as d
>>> tups = [tuple("abc") for i in range(1000000)]

tuple("abc") creates a tuple of 3 1-character strings, ('a', 'b', 'c'). Here I'll edit out almost all the output:

>>> d()
Small block threshold = 512, in 64 size classes.

class   size   num pools   blocks in use  avail blocks
-----   ----   ---------   -------------  ------------
...
    8     72       17941         1004692             4

Since we created a million tuples, it's a very good bet that the size class using 1004692 blocks is the one we want ;-) Each of the blocks consumes 72 bytes.

Switching to frozensets instead, the output shows that those consume 224 bytes each, a bit over 3x more:

>>> tups = [frozenset(t) for t in tups]
>>> d()
Small block threshold = 512, in 64 size classes.

class   size   num pools   blocks in use  avail blocks
-----   ----   ---------   -------------  ------------
...
   27    224       55561         1000092             6

In this particular case, the other answer you got happens to give the same results:

>>> import sys
>>> sys.getsizeof(tuple("abc"))
72
>>> sys.getsizeof(frozenset(tuple("abc")))
224

While that's often true, it's not always so, because an object may require allocating more bytes than it actually needs, to satisfy HW alignment requirements. getsizeof() doesn't know anything about that, but _debugmallocstats() shows the number of bytes Python's small-object allocator actually needs to use.

For example,

>>> sys.getsizeof("a")
50

On a 32-bit box, 52 bytes actually need to be used, to provide 4-byte alignment. On a 64-bit box, 8-byte alignment is currently required, so 56 bytes need to be used. Under Python 3.8 (not yet released), on a 64-bit box 16-byte alignment is required, and 64 bytes will need to be used.

But ignoring all that, a tuple will always need less memory than any form of set with the same number of elements - and even less than a list with the same number of elements.

like image 184
Tim Peters Avatar answered Oct 31 '22 18:10

Tim Peters


sys.getsizeof seems like the stdlib option you want... but I feel queasy about your whole use case

import sys
t = ("foo", "bar", "baz")
f = frozenset(("foo","bar","baz"))
print(sys.getsizeof(t))
print(sys.getsizeof(f))

https://docs.python.org/3.7/library/sys.html#sys.getsizeof

All built-in objects will return correct results, but this does not have to hold true for third-party extensions as it is implementation specific.

...So don't get comfy with this solution

EDIT: Obviously @TimPeters answer is more correct...

like image 20
Charles Landau Avatar answered Oct 31 '22 16:10

Charles Landau


If you're trying to save memory, consider

  • Trading off some elegance for some memory savings by extracting the data structure of which tags are present into an external (singleton) data structure
  • Using a "flags" (bitmap) type approach, where each tag is mapped to a bit of a 32-bit integer. Then all you need is a (singleton) dict mapping from the object (identity) to a 32-bit integer (flags). If no flags are present, no entry in the dictionary.
like image 42
eddiewould Avatar answered Oct 31 '22 16:10

eddiewould