Keeping large dictionary in Python affects application performance

I'm having some difficulties understanding (and ultimately solving) why having a large dictionary in memory makes creation of other dictionaries longer.

Here's the test code that I'm using

import time
def create_dict():
    # return {x:[x]*125 for x in xrange(0, 100000)}
    return {x: (x,)*125 for x in xrange(0, 100000)}  # UPDATED: use tuples instead of lists of values


class Foo(object):
    @staticmethod
    def dict_init():
        start = time.clock()
        Foo.sample_dict = create_dict()
        print "dict_init in Foo took {0} sec".format(time.clock() - start)

if __name__ == '__main__':
    Foo.dict_init()
    for x in xrange(0, 10):
        start = time.clock()
        create_dict()
        print "Run {0} took {1} seconds".format(x, time.clock() - start)

If I run the code as is (first initializing sample_dict in Foo) and then creating the same dictionary 10 more times in the loop I get the following results:

dict_init in Foo took 0.385263764287 sec
Run 0 took 0.548807949139 seconds
Run 1 took 0.533209452471 seconds
Run 2 took 0.51916067636 seconds
Run 3 took 0.513130722575 seconds
Run 4 took 0.508272050029 seconds
Run 5 took 0.502263872177 seconds
Run 6 took 0.48867601998 seconds
Run 7 took 0.483109299676 seconds
Run 8 took 0.479019713488 seconds
Run 9 took 0.473174195256 seconds
[Finished in 5.6s]

If, however, I do NOT initialize sample_dict in Foo (commenting out Foo.dict_init()) I'm getting almost 20% faster dictionary creation in the loop

Run 0 took 0.431378921359 seconds
Run 1 took 0.423696636179 seconds
Run 2 took 0.419630475616 seconds
Run 3 took 0.405130343806 seconds
Run 4 took 0.398099686921 seconds
Run 5 took 0.392837169802 seconds
Run 6 took 0.38799598399 seconds
Run 7 took 0.375133006408 seconds
Run 8 took 0.368755297573 seconds
Run 9 took 0.363273701371 seconds
[Finished in 4.0s]

I've noticed that if I turn OFF Python's garbage collector by calling gc.disable(), performance not only improves ~5x, but storing the large dictionary in Foo no longer makes a difference. Here are the results with garbage collection disabled:

dict_init in Foo took 0.0696136982496 sec
Run 0 took 0.113533445358 seconds
Run 1 took 0.111091241489 seconds
Run 2 took 0.111151620212 seconds
Run 3 took 0.110655722831 seconds
Run 4 took 0.111807537706 seconds
Run 5 took 0.11097510318 seconds
Run 6 took 0.110936170451 seconds
Run 7 took 0.111074414632 seconds
Run 8 took 0.110678488579 seconds
Run 9 took 0.111011066463 seconds
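For reference, the gc-disabled experiment can be sketched like this in Python 3 (`time.perf_counter` and `range` standing in for the Python 2 `time.clock` and `xrange`); the try/finally ensures collection is re-enabled even if the timed phase raises:

```python
import gc
import time

def create_dict():
    # Same shape as the test above, in Python 3 syntax.
    return {x: [x] * 125 for x in range(0, 100000)}

gc.disable()  # turn off cyclic collection for the allocation-heavy phase
try:
    start = time.perf_counter()
    sample = create_dict()
    elapsed = time.perf_counter() - start
finally:
    gc.enable()  # always restore collection afterwards

print("created {0} entries in {1:.4f} sec".format(len(sample), elapsed))
```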

So I have 2 questions:

  • Why does disabling garbage collection speed up dictionary creation?
  • How can I achieve even performance (with the Foo init and without it) without disabling garbage collection?

I would appreciate any insight on this.

Thank you!

UPDATED: After Tim Peters mentioned that I'm creating mutable objects, I decided to try creating immutable objects instead (tuples in my case) and, voila, much, much faster results (the same with gc enabled and disabled):

dict_init in Foo took 0.017769 sec
Run 0 took 0.017547 seconds
Run 1 took 0.013234 seconds
Run 2 took 0.012791 seconds
Run 3 took 0.013371 seconds
Run 4 took 0.013288 seconds
Run 5 took 0.013692 seconds
Run 6 took 0.013059 seconds
Run 7 took 0.013311 seconds
Run 8 took 0.013343 seconds
Run 9 took 0.013675 seconds

I understand that tuple creation is much faster than list creation, but why doesn't having a dictionary of immutable objects affect the time spent by garbage collection? Are immutable objects not involved in reference cycles?

Thank you.

P.S. As it happens, in my real-world scenario, converting list to tuples solved the problem (there was never a need to have lists, just didn't think of using tuples) but I'm still curious as to why it's faster.

barmaley asked Oct 15 '13


2 Answers

"Dictionary creation" is really a red herring here. What the dictionary creation does in this case that's relevant is that it creates a hundred thousand new 125-element lists. Because lists can be involved in reference cycles, that creates 12.5 million list elements CPython's cyclic garbage collection has to examine whenever it scans a generation containing a dict. It doesn't matter that these lists are in dictionaries, it only matters that they exist.

So what you're timing is largely the time consumed by Python's cyclic garbage collection. It doesn't particularly matter that you keep on creating more dicts, it only matters that you're creating new mutable objects (which can be involved in cycles) much faster than you're destroying old mutable objects. That (an excess of allocations over deallocations) is what triggers CPython's cyclic gc.

Not much you can do about it, alas. Programs that go through well-delineated phases of creating mounds of new objects can benefit by disabling cyclic gc for the duration. Can't guess whether that applies to you, though.
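A middle ground between the defaults and a full gc.disable() (a sketch, not something the answer prescribes) is to raise the generation-0 threshold for the duration of the allocation burst, so collections trigger far less often, then restore it:

```python
import gc

old = gc.get_threshold()  # CPython defaults are (700, 10, 10)
gc.set_threshold(100000, old[1], old[2])  # gen-0 collection fires much less often
try:
    # stand-in for an allocation-heavy phase
    bursts = [{x: [x] * 5 for x in range(1000)} for _ in range(10)]
finally:
    gc.set_threshold(*old)  # restore the original thresholds
print(len(bursts), gc.get_threshold())
```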

Ah, forgot one: the dict in Foo makes such a big difference because Foo "sticks around". All the other dicts you create are thrown away immediately after being constructed (CPython's reference counting is responsible for that), so don't add to the time consumed by cyclic gc. But the dict in Foo does, because that dict doesn't go away. Change your loop to append the new dicts to a list, and you'll see that the time goes up for each dict, then drops a lot, then goes up again, etc. That reflects dicts moving to older generations inside Python's cyclic gc, so getting scanned by cyclic gc less frequently. It gets complicated ;-)
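The "sticks around" effect can be observed directly: keep each new dict alive and watch gc.get_count(), which reports the per-generation object counts feeding the trigger heuristic. A quick Python 3 sketch (smaller dicts than the original test, so it runs fast):

```python
import gc
import time

def create_dict():
    return {x: [x] * 125 for x in range(0, 10000)}  # smaller than the original test

kept = []  # keeping the dicts alive is what gives cyclic gc more to scan
for i in range(5):
    start = time.perf_counter()
    kept.append(create_dict())
    print("run {0}: {1:.4f} sec, gc counts {2}".format(
        i, time.perf_counter() - start, gc.get_count()))
```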

More details?

It's hard to be more precise, because the performance of cyclic gc in specific cases depends on mountains of implementation details that can - and do - change across releases.

The general advice to use "immutable objects" when possible is based on a rather deep ;-) understanding of how cyclic gc is implemented in CPython, and how it's evolved over the years.

It so happens that today (most recent versions of Python 2 and Python 3), strong efforts are made to exempt certain tuples and dicts from cyclic gc. That may change. Special-casing such things isn't free, so deciding whether to add more tricks like this is always a difficult balancing act. It's an easier decision for immutable objects, hence the advice to move towards those.

Tuples and dicts weren't special-cased at all until very late 2008, as detailed in this issue from the Python issue tracker.

And - surprise ;-) - that turned out to have horrible performance consequences in some rare cases, which were fixed by more special-casing in this issue in mid 2012.

Some good news is that a gc.is_tracked() function was added so you can at least investigate whether cyclic gc is going to scan a specific object. Here are some examples under Python 2.7.5. There's no guarantee they'll always work this way:

"Scalar" objects (no internal pointers) are never tracked - it's impossible for them to be in a cycle:

>>> import gc
>>> gc.is_tracked(4)
False
>>> gc.is_tracked("2323")
False

Tuples are initially tracked:

>>> t1 = ([1],)
>>> t2 = ((1.),)
>>> gc.is_tracked(t1), gc.is_tracked(t2)
(True, True)

However, if cyclic gc runs and determines that a tuple is immutable "all the way down", it untracks that tuple:

>>> gc.collect()
0
>>> gc.is_tracked(t1), gc.is_tracked(t2)
(True, False)

There's nothing that can be done to t2 to make it participate in a cycle, because it, and all its components (on & on, all the way down) are immutable. But t1 still needs to be tracked! Because t1[0] is mutable, t1 may be part of a cycle later:

>>> t1
([1],)
>>> t1[0][0] = t1
>>> t1
([([...],)],)

A different policy happens to be used for dicts. They're created untracked, if possible:

>>> d = {1: [2]}
>>> gc.is_tracked(d)
True

Because that dict has a mutable value, it could become part of a cycle later, so must be tracked by cyclic gc.

>>> d[1][0] = d
>>> d
{1: [{...}]}

But a dict with untracked keys and values is created untracked:

>>> d = {1: 2}
>>> gc.is_tracked(d)
False

This is tricky, because such a dict can still become part of a cycle later!

>>> d[2] = d
>>> gc.is_tracked(d)
True

It's not free to detect such changes.

Then there are combinations of the above:

>>> d = {(1, 2): (4, "abc", 5)}
>>> gc.is_tracked(d)
True
>>> gc.collect()
3
>>> gc.is_tracked(d)
False

In this case, d is tracked at first, because its keys and values (the tuples) are created tracked at first. But after cyclic gc runs the first time, it determines that the keys and values are "immutable all the way down", so untracks the dict.

Like I said, it gets complicated ;-)

BTW,

I understand that tuple creation is much faster than list creation

It should be only slightly slower to create a list. Lists and tuples have very similar implementations in CPython. tuples require a little less memory (which can be significant, in percentage terms, for very short sequences), and it can be a little faster to index a tuple than a list. But creation time? It's the difference between one malloc()-like function (for a tuple) versus two (for a list), independent of the number of elements. That can be significant for very short sequences, but not for large ones.
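This is easy to measure with timeit; the sketch below builds both containers from the same 125-element source, so the only difference is the container allocation itself (absolute numbers vary by machine and Python version, so no winner is asserted):

```python
import timeit

src = list(range(125))

# tuple(src): a single allocation; list(src): a header plus a separate item array.
t_tuple = timeit.timeit(lambda: tuple(src), number=100000)
t_list = timeit.timeit(lambda: list(src), number=100000)

print("tuple: {0:.3f}s  list: {1:.3f}s".format(t_tuple, t_list))
```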

Tim Peters answered Sep 20 '22

Modify the program like this to inspect the bytecode:

import time
import dis
import inspect

def create_dict():
    return {x:[x]*125 for x in xrange(0, 100000)}


class Foo(object):
    @staticmethod
    def dict_init():
        start = time.clock()
        Foo.sample_dict = create_dict()
        print "dict_init in Foo took {0} sec".format(time.clock() - start)
        dis.dis(inspect.currentframe().f_code)

if __name__ == '__main__':
    Foo.dict_init()
    for x in xrange(0, 1):
        start = time.clock()
        create_dict()
        print "Run {0} took {1} seconds".format(x, time.clock() - start)
        dis.dis(inspect.currentframe().f_code)

Here is the output:

dict_init in Foo took 0.44164 sec
 12           0 LOAD_GLOBAL              0 (time)
              3 LOAD_ATTR                1 (clock)
              6 CALL_FUNCTION            0
              9 STORE_FAST               0 (start)

 13          12 LOAD_GLOBAL              2 (create_dict)
             15 CALL_FUNCTION            0
             18 LOAD_GLOBAL              3 (Foo)
             21 STORE_ATTR               4 (sample_dict)

 14          24 LOAD_CONST               1 ('dict_init in Foo took {0} sec')
             27 LOAD_ATTR                5 (format)
             30 LOAD_GLOBAL              0 (time)
             33 LOAD_ATTR                1 (clock)
             36 CALL_FUNCTION            0
             39 LOAD_FAST                0 (start)
             42 BINARY_SUBTRACT     
             43 CALL_FUNCTION            1
             46 PRINT_ITEM          
             47 PRINT_NEWLINE       

 15          48 LOAD_GLOBAL              6 (dis)
             51 LOAD_ATTR                6 (dis)
             54 LOAD_GLOBAL              7 (inspect)
             57 LOAD_ATTR                8 (currentframe)
             60 CALL_FUNCTION            0
             63 LOAD_ATTR                9 (f_code)
             66 CALL_FUNCTION            1
             69 POP_TOP             
             70 LOAD_CONST               0 (None)
             73 RETURN_VALUE        
Run 0 took 0.641144 seconds
  1           0 LOAD_CONST               0 (-1)
              3 LOAD_CONST               1 (None)
              6 IMPORT_NAME              0 (time)
              9 STORE_NAME               0 (time)

  2          12 LOAD_CONST               0 (-1)
             15 LOAD_CONST               1 (None)
             18 IMPORT_NAME              1 (dis)
             21 STORE_NAME               1 (dis)

  3          24 LOAD_CONST               0 (-1)
             27 LOAD_CONST               1 (None)
             30 IMPORT_NAME              2 (inspect)
             33 STORE_NAME               2 (inspect)

  5          36 LOAD_CONST               2 (<code object create_dict at 0x1091396b0, file "dict.py", line 5>)
             39 MAKE_FUNCTION            0
             42 STORE_NAME               3 (create_dict)

  9          45 LOAD_CONST               3 ('Foo')
             48 LOAD_NAME                4 (object)
             51 BUILD_TUPLE              1
             54 LOAD_CONST               4 (<code object Foo at 0x10914c130, file "dict.py", line 9>)
             57 MAKE_FUNCTION            0
             60 CALL_FUNCTION            0
             63 BUILD_CLASS         
             64 STORE_NAME               5 (Foo)

 17          67 LOAD_NAME                6 (__name__)
             70 LOAD_CONST               5 ('__main__')
             73 COMPARE_OP               2 (==)
             76 POP_JUMP_IF_FALSE      186

 18          79 LOAD_NAME                5 (Foo)
             82 LOAD_ATTR                7 (dict_init)
             85 CALL_FUNCTION            0
             88 POP_TOP             

 19          89 SETUP_LOOP              94 (to 186)
             92 LOAD_NAME                8 (xrange)
             95 LOAD_CONST               6 (0)
             98 LOAD_CONST               7 (1)
            101 CALL_FUNCTION            2
            104 GET_ITER            
        >>  105 FOR_ITER                74 (to 182)
            108 STORE_NAME               9 (x)

 20         111 LOAD_NAME                0 (time)
            114 LOAD_ATTR               10 (clock)
            117 CALL_FUNCTION            0
            120 STORE_NAME              11 (start)

 21         123 LOAD_NAME                3 (create_dict)
            126 CALL_FUNCTION            0
            129 POP_TOP             

 22         130 LOAD_CONST               8 ('Run {0} took {1} seconds')
            133 LOAD_ATTR               12 (format)
            136 LOAD_NAME                9 (x)
            139 LOAD_NAME                0 (time)
            142 LOAD_ATTR               10 (clock)
            145 CALL_FUNCTION            0
            148 LOAD_NAME               11 (start)
            151 BINARY_SUBTRACT     
            152 CALL_FUNCTION            2
            155 PRINT_ITEM          
            156 PRINT_NEWLINE       

 23         157 LOAD_NAME                1 (dis)
            160 LOAD_ATTR                1 (dis)
            163 LOAD_NAME                2 (inspect)
            166 LOAD_ATTR               13 (currentframe)
            169 CALL_FUNCTION            0
            172 LOAD_ATTR               14 (f_code)
            175 CALL_FUNCTION            1
            178 POP_TOP             
            179 JUMP_ABSOLUTE          105
        >>  182 POP_BLOCK           
            183 JUMP_FORWARD             0 (to 186)
        >>  186 LOAD_CONST               1 (None)
            189 RETURN_VALUE        

Maybe it is the difference in the format of the string that is causing the difference when garbage collection is off.

Emil Davtyan answered Sep 18 '22