I'm relatively new to Python, and was wondering if there is any reason to prefer one of these methods over the other when removing elements from a dict
?
A) Using del
# d is a dict, k is a key
if k in d:
del k
B) Using pop
d.pop(k, None)
My first thought was that approach (A) needs to do two look-ups - once in the if
statement, and again in the implementation of del
, which would make it slightly slower than pop
, which only needs one look-up. A colleague then pointed out that del
may yet have the edge, because it is a keyword, and may therefore potentially be better optimised, whereas pop
is a method that can be replaced by end users (not sure if this is really a factor, but he does have far more experience at writing Python code).
I wrote a few test snippets to compare performance. It looks like del
has the edge (I've appended the snippets if anyone cares to try them out or comment on correctness).
So, this brings me back to the question: Other than a marginal performance gain, is there a reason to prefer one over the other?
Here are the snippets to test performance:
import timeit
print 'in: ', timeit.Timer(stmt='42 in d', setup='d = dict.fromkeys(range(100000))').timeit()
print 'pop: ', timeit.Timer(stmt='d.pop(42,None)', setup='d = dict.fromkeys(range(100000))').timeit()
print 'del: ', timeit.Timer(stmt='if 42 in d:\n del d[42]', setup='d = dict.fromkeys(range(100000))').timeit()
This outputs
in: 0.0521960258484
pop: 0.172810077667
del: 0.0660231113434
So that was a curious result. I would have expected pop
to be roughly on par with in
, but it's more than three times as expensive. Another surprise was that del
was only slightly slower than in
, until I realized that the dictionary from the setup statement in the timeit
class remains the same instance, so only the first call will hit the del
statements, as all others won't pass the if
statement.
So I wrote a longer profiling snippet in an attempt to try to avoid this. I runs several timeit
runs with some randomised key selections and tries to ensure that we'll mostly hit the if
statement and the del
statement (so we're not working with the same dictionary instance all the time):
#! /usr/bin/bash
import timeit
# Number of times to repeat fresh setup before doing timeit runs
repeat_num=100
# Number of timeit runs per setup
number=1000
# Size of dictionary for runs (smaller)
small_size=10000
# Size of dictionary for timeit runs (larger)
large_size=1000000
# Switches garbage collection on if True
collect_garbage = False
setup_stmt = """
import random
d = dict.fromkeys(range(%(dict_size)i))
# key, randomly chosen
k = random.randint(0,%(dict_size)i - 1)
%(garbage)s
"""
in_stmt = """
k in d
%(incr_k)s
""" % {'incr_k' : 'k = (k + 1) %% %(dict_size)i' if number > 1 else ''}
pop_stmt = """
d.pop(k, None)
%(incr_k)s
""" % {'incr_k' : 'k = (k + 1) %% %(dict_size)i' if number > 1 else ''}
del_stmt = """
if k in d:
del d[k]
%(incr_k)s
""" % {'incr_k' : 'k = (k + 1) %% %(dict_size)i' if number > 1 else ''}
# Results for smaller dictionary size
print \
"""SETUP:
repeats : %(repeats)s
runs per repeat: %(number)s
garbage collect: %(garbage)s""" \
% {'repeats' : repeat_num,
'number' : number,
'garbage' : 'yes' if collect_garbage else 'no'}
print "SMALL:"
small_setup_stmt = setup_stmt % \
{'dict_size' : small_size,
'garbage' : 'gc.enable()' if collect_garbage else ''}
times = timeit.Timer(stmt=in_stmt % {'dict_size' : small_size},
setup=small_setup_stmt).repeat(repeat=repeat_num,number=number)
print " in: ", sum(times)/len(times)
times = timeit.Timer(stmt=pop_stmt % {'dict_size' : small_size},
setup=small_setup_stmt).repeat(repeat=repeat_num,number=number)
print " pop: ", sum(times)/len(times)
times = timeit.Timer(stmt=del_stmt % {'dict_size' : small_size},
setup=small_setup_stmt).repeat(repeat=repeat_num,number=number)
print " del: ", sum(times)/len(times)
# Results for larger dictionary size
print "LARGE:"
large_setup_stmt = setup_stmt % \
{'dict_size' : large_size,
'garbage' : 'gc.enable()' if collect_garbage else ''}
times = timeit.Timer(stmt=in_stmt % {'dict_size' : large_size},
setup=large_setup_stmt).repeat(repeat=repeat_num,number=number)
print " in: ", sum(times)/len(times)
times = timeit.Timer(stmt=pop_stmt % {'dict_size' : large_size},
setup=large_setup_stmt).repeat(repeat=repeat_num,number=number)
print " pop: ", sum(times)/len(times)
times = timeit.Timer(stmt=del_stmt % {'dict_size' : large_size},
setup=large_setup_stmt).repeat(repeat=repeat_num,number=number)
print " del: ", sum(times)/len(times)
Doing 100 setups, each with 1000 runs each, prints the following:
SETUP:
repeats : 100
runs per repeat: 1000
garbage collect: no
SMALL:
in: 0.00020430803299
pop: 0.000313355922699
del: 0.000262062549591
LARGE:
in: 0.000201721191406
pop: 0.000328607559204
del: 0.00027587890625
I'm new to using timeit
, so it's possible that this is a flawed test, but it does seem to indicate that del
has a small advantage in terms of performance.
One thing I did learn from this exercise the hard way is that Python dictionaries are hash maps, so the size of the dictionary doesn't affect the look-up time as much as it would a C++ std::map
, for example (constant time vs O(log(n))-ish). Oh well. Live and learn.
pop() returns deleted value. The del keyword can delete the single value from a list or delete the whole list at a time. At a time it deletes only one value from the list.
Array elements can be removed using pop() or remove() method. The difference between these two functions is that the former returns the deleted value whereas the latter does not. The pop() function takes either no parameter or the index value as its parameter.
pop will be faster, especially for the last item in the list; however, if you've started with the item itself and have already had a O(n) operation with a rich comparison to find its .
Remove and Delete are defined quite similarly, but the main difference between them is that delete means erase (i.e. rendered nonexistent or nonrecoverable), while remove denotes take away and set aside (but kept in existence).
I would not worry about the performance differences unless you have specific reason to believe that they are causing meaningful slowdowns in your program, which is unlikely.
The real reason you might choose to use del
vs pop
is because they have different behaviors. pop
returns the value for the popped key, so you would use pop
if you want to do something with that value at the same time as you remove it. If you don't need to do anything with the value, but just want to remove the item, use del
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With