I'm looking for most fastest/effective way of deleting certain keys in a python dict
Here are some options
for k in somedict.keys():
if k.startswith("someprefix"):
del somedict[k]
or
dict((k, v) for (k, v) in somedict.iteritems() if not k.startswith('someprefix'))
Logically first snippet should be faster on smaller dicts, it doesn't create a copy of a dict but creates a list of all keys, however double lookups and dict rebuilding is time consuming. While the second is faster on bigger dicts, but requires 2x more memory. I've checked my assumption in some small benchmark.
Anything faster?
Not only is del
more easily understood, but it seems slightly faster than pop():
$ python -m timeit -s "d = {'f':1,'foo':2,'bar':3}" "for k in d.keys():" " if k.startswith('f'):" " del d[k]"
1000000 loops, best of 3: 0.733 usec per loop
$ python -m timeit -s "d = {'f':1,'foo':2,'bar':3}" "for k in d.keys():" " if k.startswith('f'):" " d.pop(k)"
1000000 loops, best of 3: 0.742 usec per loop
Edit: thanks to Alex Martelli for providing instructions on how to do this benchmarking. Hopefully I have not slipped up anywhere.
First measure the time required for copying:
$ python -m timeit -s "d = {'f':1,'foo':2,'bar':3}" "d1 = d.copy()"
1000000 loops, best of 3: 0.278 usec per loop
Benchmark on copied dict:
$ python -m timeit -s "d = {'f':1,'foo':2,'bar':3}" "d1 = d.copy()" "for k in d1.keys():" " if k.startswith('f'):" " del d1[k]"
100000 loops, best of 3: 1.95 usec per loop
$ python -m timeit -s "d = {'f':1,'foo':2,'bar':3}" "d1 = d.copy()" "for k in d1.keys():" " if k.startswith('f'):" " d1.pop(k)"
100000 loops, best of 3: 2.15 usec per loop
Subtracting the cost of copying, we get 1.872 usec for pop()
and 1.672 for del
.
If the dict is large enough, it may make sense to generate a whole new dict instead.
dict((k, v) for (k, v) in somedict.iteritems() if not k.startswith('someprefix'))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With