Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fastest way of deleting certain keys from dict in Python

I'm looking for most fastest/effective way of deleting certain keys in a python dict

Here are some options

for k in somedict.keys(): 
    if k.startswith("someprefix"): 
        del somedict[k]

or

dict((k, v) for (k, v) in somedict.iteritems() if not k.startswith('someprefix'))

Logically first snippet should be faster on smaller dicts, it doesn't create a copy of a dict but creates a list of all keys, however double lookups and dict rebuilding is time consuming. While the second is faster on bigger dicts, but requires 2x more memory. I've checked my assumption in some small benchmark.

Anything faster?

like image 599
HardQuestions Avatar asked Jun 19 '10 20:06

HardQuestions


2 Answers

Not only is del more easily understood, but it seems slightly faster than pop():

$ python -m timeit -s "d = {'f':1,'foo':2,'bar':3}" "for k in d.keys():" "  if k.startswith('f'):" "    del d[k]"
1000000 loops, best of 3: 0.733 usec per loop

$ python -m timeit -s "d = {'f':1,'foo':2,'bar':3}" "for k in d.keys():" "  if k.startswith('f'):" "    d.pop(k)"
1000000 loops, best of 3: 0.742 usec per loop

Edit: thanks to Alex Martelli for providing instructions on how to do this benchmarking. Hopefully I have not slipped up anywhere.

First measure the time required for copying:

$ python -m timeit -s "d = {'f':1,'foo':2,'bar':3}" "d1 = d.copy()"
1000000 loops, best of 3: 0.278 usec per loop

Benchmark on copied dict:

$ python -m timeit -s "d = {'f':1,'foo':2,'bar':3}" "d1 = d.copy()" "for k in d1.keys():" "  if k.startswith('f'):" "    del d1[k]"
100000 loops, best of 3: 1.95 usec per loop

$ python -m timeit -s "d = {'f':1,'foo':2,'bar':3}" "d1 = d.copy()" "for k in d1.keys():" "  if k.startswith('f'):" "    d1.pop(k)"
100000 loops, best of 3: 2.15 usec per loop

Subtracting the cost of copying, we get 1.872 usec for pop() and 1.672 for del.

like image 175
mechanical_meat Avatar answered Oct 17 '22 10:10

mechanical_meat


If the dict is large enough, it may make sense to generate a whole new dict instead.

dict((k, v) for (k, v) in somedict.iteritems() if not k.startswith('someprefix'))
like image 45
Ignacio Vazquez-Abrams Avatar answered Oct 17 '22 10:10

Ignacio Vazquez-Abrams