
Python: size of strings in memory

Consider the following code:

arr = []
for (s, ident, flag) in some_data:  # avoid shadowing the builtins str and id
    arr.append((s, ident, flag))

Imagine the input strings are 2 characters long on average and 5 characters at most, and some_data has 1 million elements. What would the memory requirement of such a structure be?

Could a lot of memory be wasted on the strings? If so, how can I avoid that?

didi_X8 asked Feb 25 '12 15:02



2 Answers

In this case, because the strings are quite short and there are so many of them, you stand to save a fair bit of memory by calling intern on the strings (in Python 3 the function is sys.intern). Assuming the strings contain only lowercase letters, there are just 26 * 26 = 676 possible two-letter strings, so a list of 1 million entries must contain a lot of repetitions. intern ensures that those repetitions don't each become a separate object; they all refer to the same underlying object.

It's possible that Python already interns short strings; but looking at a number of different sources, it seems this is highly implementation-dependent. So calling intern in this case is probably the way to go; YMMV.
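As a minimal sketch of what calling intern could look like for the loop in the question, assuming Python 3 (where the function lives at sys.intern): build_rows and the sample data below are hypothetical names made up for illustration, not part of the original code.

```python
import sys

def build_rows(some_data):
    """Build the list of tuples, interning each string so duplicates share one object."""
    rows = []
    for s, ident, flag in some_data:
        rows.append((sys.intern(s), ident, flag))
    return rows

# Hypothetical sample data: the strings are assembled at runtime,
# the way they would be when read from a file or a database.
sample = [("".join(["a", "b"]), i, True) for i in range(5)]
rows = build_rows(sample)

# After interning, every tuple refers to the same string object.
all_same = all(row[0] is rows[0][0] for row in rows)
print(all_same)
```

The only change from the original loop is wrapping the string in sys.intern before storing it; the tuples and the list themselves cost the same either way.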

As an elaboration on why this is very likely to save memory, consider the following:

>>> import sys
>>> sys.getsizeof('')
40
>>> sys.getsizeof('a')
41
>>> sys.getsizeof('ab')
42
>>> sys.getsizeof('abc')
43

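Each additional character adds only one byte to the size of the string, but every string object carries 40 bytes of fixed overhead (on the build shown here; the exact figure varies between Python versions and platforms). With strings averaging 2 characters, almost all of the memory used per string is overhead, which is why sharing one object among duplicates pays off.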
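To get a rough feel for the saving on your own interpreter, something like the following could be used. It is only an approximation, assuming CPython and Python 3: sys.getsizeof reports the size of each object alone, so the hypothetical helper total_size adds up the list, its tuples, and each distinct element object once. The word list and the row count are made-up test values.

```python
import sys

def total_size(rows):
    """Sum sys.getsizeof over the list, its tuples, and each distinct element object."""
    seen = set()
    total = sys.getsizeof(rows)
    for row in rows:
        total += sys.getsizeof(row)
        for item in row:
            if id(item) not in seen:  # count shared objects only once
                seen.add(id(item))
                total += sys.getsizeof(item)
    return total

n = 100_000
words = ["ab", "cd", "xyz"]

# Strings built at runtime, so each row gets its own string object.
plain = [("".join(list(words[i % 3])), i, True) for i in range(n)]

# Same data, but duplicates collapse to one shared object per distinct string.
interned = [(sys.intern(s), ident, flag) for s, ident, flag in plain]

plain_bytes = total_size(plain)
interned_bytes = total_size(interned)
print(plain_bytes, interned_bytes, plain_bytes - interned_bytes)
```

The absolute numbers depend on the Python version and platform, so treat the difference between the two totals as the interesting figure rather than either total by itself.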

senderle answered Oct 04 '22 08:10


If your strings are this short, there will likely be a significant number of duplicates. Python's string interning optimises this so that each such string is stored only once and the reference is reused, rather than the string being stored multiple times.

These strings should be interned automatically, as they are.
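Whether interning really happens automatically can be checked directly on the interpreter in use. The sketch below assumes Python 3; distinct_objects is a hypothetical helper that counts how many separate string objects a list references. The result for the runtime-built strings depends on the implementation, so no particular value is claimed for it.

```python
import sys

def distinct_objects(strings):
    """Count how many separate string objects the list actually references."""
    return len({id(s) for s in strings})

# Equal strings built at runtime, as they would be when parsed from input.
runtime_strings = ["".join(["a", "b"]) for _ in range(1000)]
print(distinct_objects(runtime_strings))

# After explicit interning, all equal strings share a single object.
explicit = [sys.intern(s) for s in runtime_strings]
print(distinct_objects(explicit))
```

If the first count comes out well above 1, the strings were not interned automatically and an explicit call is what removes the duplicates.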

Karl Barker answered Oct 04 '22 09:10