
Python deep getsizeof list with contents?

Tags: python, memory

I was surprised that sys.getsizeof(10000*[x]) is 40036 regardless of x: 0, "a", 1000*"a", {}.
Is there a deep_getsizeof that properly accounts for elements that share memory?
(The question came from looking at in-memory database tables such as range(1000000) -> province names: list or dict?)
(Python is 2.6.4 on a Mac PPC.)
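
For anyone who wants to reproduce the observation: a minimal sketch, assuming a current CPython 3 rather than the 2.6.4 above, so the byte count will not be 40036. The helper name shallow_sizes is made up for illustration. It only checks that the reported size does not depend on what the list elements are.

```python
import sys

def shallow_sizes(n=10000):
    """Return sys.getsizeof(n * [x]) for several very different x."""
    samples = [0, "a", 1000 * "a", {}]
    return [sys.getsizeof(n * [x]) for x in samples]

sizes = shallow_sizes()
print(sizes)                 # exact numbers depend on version and platform
print(len(set(sizes)) == 1)  # all four lists report the same size
```

sys.getsizeof reports the list object itself (header plus n pointers), not whatever the pointers refer to.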

Added: 10000*["Mississippi"] is 10000 pointers to one "Mississippi", as several people have pointed out. Try this:

nstates = [AlabamatoWyoming() for j in xrange(N)]

where AlabamatoWyoming() returns one of the strings "Alabama" .. "Wyoming". What is deep_getsizeof(nstates)?
How can we tell?

  • a proper deep_getsizeof: difficult, ~ gc tracer
  • estimate from total vm
  • inside knowledge of the python implementation
  • guess.
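
The "estimate from total vm" option can be approximated from inside the interpreter. This is a rough sketch, assuming Python 3.4+ where the standard tracemalloc module exists (it did not in 2.6). STATES, random_state and measure_build are hypothetical stand-ins for AlabamatoWyoming() and friends, and the number is only what the Python allocator handed out while the list was built, not process VM.

```python
import random
import tracemalloc

# Hypothetical stand-in for the real table of state names.
STATES = ["Alabama", "Alaska", "Arizona", "Wisconsin", "Wyoming"]

def random_state():
    """Stand-in for AlabamatoWyoming(): returns one of the existing strings."""
    return random.choice(STATES)

def measure_build(n=100000):
    """Build the list under tracemalloc; return (bytes allocated, list)."""
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    nstates = [random_state() for _ in range(n)]
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before, nstates

grew, nstates = measure_build()
print(grew, "bytes allocated for", len(nstates), "entries")
```

Because this stand-in hands back references to strings that already exist, nearly all of the growth is the list's own pointer array. A function that builds a fresh string on every call would give a larger figure.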

Added 25jan: see also when-does-python-allocate-new-memory-for-identical-strings

denis, asked Jan 22 '10 12:01



1 Answer

10000 * [x] produces a list holding 10000 references to the same object, so the sizeof is actually closer to correct than you think. However, a deep sizeof is very problematic because it's impossible to tell Python where you want the measurement to stop. Every object references a type object. Should the type object be counted? What if the reference to the type object is the last one, so that deleting the object would make the type object go away as well? What if multiple (different) objects in the list refer to the same string object? Should it be counted once, or multiple times?
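
To make those choices concrete, here is one possible sketch, not a standard library function. It assumes CPython and picks one arbitrary policy: walk everything reachable through gc.get_referents, count each object once (by id), and skip type objects. The name deep_getsizeof comes from the question; other policies are just as defensible and give different numbers.

```python
import gc
import sys

def deep_getsizeof(root):
    """Sum sys.getsizeof over objects reachable from root, each counted once.

    Policy (an assumption, see above): shared objects are counted once,
    and type objects are not counted at all.
    """
    seen = set()      # ids of objects already counted
    total = 0
    stack = [root]
    while stack:
        obj = stack.pop()
        if id(obj) in seen or isinstance(obj, type):
            continue
        seen.add(id(obj))
        total += sys.getsizeof(obj)
        # get_referents returns the objects obj points at directly
        stack.extend(gc.get_referents(obj))
    return total

shared = 10000 * ["Mississippi"]
distinct = [str(i) + "x" for i in range(10000)]
print(deep_getsizeof(shared))    # the list plus a single string
print(deep_getsizeof(distinct))  # the list plus 10000 separate strings
```

Pointing this at instances, functions or modules will follow references into globals and much of the interpreter, which is exactly the "where to stop" problem.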

In short, getting the size of a data structure is very complicated, and sys.getsizeof() should never have been added :S

Thomas Wouters, answered Oct 06 '22 04:10