
Python large variable RAM usage

Say there is a dict variable that grows very large during runtime - up into millions of key:value pairs.

Does this variable get stored in RAM, effectively using up all the available memory and slowing down the rest of the system?

Asking the interpreter to display the entire dict is a bad idea, but would it be okay as long as one key is accessed at a time?

PPTim asked Apr 19 '10




2 Answers

Yes, the dict will be stored in the process memory. So if it gets large enough that there's not enough room in the system RAM, then you can expect to see massive slowdown as the system starts swapping memory to and from disk.
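If you want to watch this happen, here is a minimal sketch (not from the original answer; Unix-only, and the helper name peak_rss_kb is made up for illustration) that prints the process's peak resident set size before and after filling a dict:

import resource

def peak_rss_kb():
    # Peak resident set size of this process so far.
    # Note: ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

d = {}
print("before:", peak_rss_kb())
for n in range(1000000):
    d[n] = 0
print("after one million items:", peak_rss_kb())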

Others have said that a few million items shouldn't pose a problem; I'm not so sure. The dict overhead itself (before counting the memory taken by the keys and values) is significant. For Python 2.6 or later, sys.getsizeof gives some useful information about how much RAM various Python structures take up. Some quick results, from Python 2.6 on a 64-bit OS X machine:

>>> from sys import getsizeof
>>> getsizeof(dict((n, 0) for n in range(5462)))/5462.
144.03368729403149
>>> getsizeof(dict((n, 0) for n in range(5461)))/5461.
36.053470060428495

So the dict overhead varies between 36 bytes per item and 144 bytes per item on this machine (the exact value depending on how full the dictionary's internal hash table is; here 5461 = 2**14//3 is one of the thresholds where the internal hash table is enlarged). And that's before adding the overhead for the dict items themselves; if they're all short strings (6 characters or less, say) then that still adds another >= 80 bytes per item (possibly less if many different keys share the same value).
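To fold the keys and values into the estimate, one rough approach (a sketch, not code from the original answer; it takes a shallow getsizeof of each key and value, so it counts shared objects once per reference and ignores anything they point to) is:

from sys import getsizeof

def approx_dict_bytes(d):
    # Container overhead (the hash table itself) plus a shallow per-item estimate.
    total = getsizeof(d)
    for key, value in d.items():
        total += getsizeof(key) + getsizeof(value)
    return total

d = dict((str(n), n) for n in range(100000))
print(approx_dict_bytes(d) / float(len(d)), "bytes per item, roughly")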

So it wouldn't take that many million dict items to exhaust RAM on a typical machine.
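As a rough back-of-the-envelope check using the figures above: at somewhere between roughly 116 and 224 bytes per item (the dict overhead plus a pair of short strings), ten million items already lands in the 1-2 GB range, before counting anything else the program needs.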

Mark Dickinson answered Oct 14 '22


The main concern with the millions of items is not the dictionary itself so much as how much space each of these items takes up. Still, unless you're doing something weird, they should probably fit.

If you've got a dict with millions of keys, though, you're probably doing something wrong. You should do one or both of:

  1. Figure out what data structure you should actually be using, because a single dict is probably not the right answer. Exactly what this would be depends on what you're doing.

  2. Use a database. Your Python should come with a sqlite3 module, so that's a start; a minimal sketch follows below.
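One minimal way option 2 could look (a sketch, not part of the original answer; the file, table, and column names are purely illustrative) is to use the standard-library sqlite3 module as a disk-backed key/value store:

import sqlite3

conn = sqlite3.connect("bigmap.db")   # stored on disk, not in RAM
conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")

def put(key, value):
    # INSERT OR REPLACE keeps one row per key, much like assigning to a dict.
    conn.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))

def get(key):
    row = conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
    return row[0] if row is not None else None

put("some-key", "some-value")
conn.commit()
print(get("some-key"))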

kwatford answered Oct 14 '22