Is del or pop preferred when removing elements from dicts?

Tags:

python-2.7

I'm relatively new to Python, and was wondering if there is any reason to prefer one of these methods over the other when removing elements from a dict?

A) Using del

# d is a dict, k is a key
if k in d:
    del d[k]

B) Using pop

d.pop(k, None)

My first thought was that approach (A) needs to do two look-ups - once in the if statement, and again in the implementation of del, which would make it slightly slower than pop, which only needs one look-up. A colleague then pointed out that del may yet have the edge, because it is a keyword, and may therefore potentially be better optimised, whereas pop is a method that can be replaced by end users (not sure if this is really a factor, but he does have far more experience at writing Python code).
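For reference, the two approaches can be placed side by side with a third spelling that is not part of the question: attempting the del first and catching KeyError, which avoids the separate membership test. This is only a sketch; the helper names (remove_with_check, remove_with_pop, remove_with_try) are hypothetical, and nothing here says which variant is faster.

```python
def remove_with_check(d, k):
    # Approach (A): membership test, then del (two look-ups when the key exists).
    if k in d:
        del d[k]


def remove_with_pop(d, k):
    # Approach (B): a single call; the default value suppresses KeyError.
    d.pop(k, None)


def remove_with_try(d, k):
    # Hypothetical third variant: del first, handle a missing key afterwards.
    try:
        del d[k]
    except KeyError:
        pass


# Apply each helper to a fresh dict and record the resulting state.
results = []
for remove in (remove_with_check, remove_with_pop, remove_with_try):
    d = {'a': 1, 'b': 2}
    remove(d, 'a')        # key present: removed
    remove(d, 'missing')  # key absent: silently ignored
    results.append(d)
```

All three helpers leave the dictionary in the same state, so the choice between them comes down to readability and, possibly, speed.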

I wrote a few test snippets to compare performance. It looks like del has the edge (I've appended the snippets if anyone cares to try them out or comment on correctness).

So, this brings me back to the question: Other than a marginal performance gain, is there a reason to prefer one over the other?

Here are the snippets to test performance:

Naive test

import timeit
print 'in:   ', timeit.Timer(stmt='42 in d', setup='d = dict.fromkeys(range(100000))').timeit()
print 'pop:  ', timeit.Timer(stmt='d.pop(42,None)', setup='d = dict.fromkeys(range(100000))').timeit()
print 'del:  ', timeit.Timer(stmt='if 42 in d:\n    del d[42]', setup='d = dict.fromkeys(range(100000))').timeit()

This outputs

in:    0.0521960258484
pop:   0.172810077667
del:   0.0660231113434

So that was a curious result. I would have expected pop to be roughly on par with in, but it's more than three times as expensive. Another surprise was that del was only slightly slower than in. Then I realized that timeit runs the setup statement only once, so every iteration works on the same dictionary instance: only the first iteration executes the del statement, and all later ones fail the if test.

Slightly less naive test

So I wrote a longer profiling snippet to try to avoid this. It performs several timeit runs with randomised key selections and tries to ensure that we mostly pass the if statement and reach the del statement (so we're not working with the same dictionary instance all the time):

#!/usr/bin/env python

import timeit

# Number of times to repeat fresh setup before doing timeit runs
repeat_num=100
# Number of timeit runs per setup
number=1000
# Size of dictionary for runs (smaller)
small_size=10000
# Size of dictionary for timeit runs (larger)
large_size=1000000
# Switches garbage collection on if True
collect_garbage = False

setup_stmt = """
import gc
import random
d = dict.fromkeys(range(%(dict_size)i))
# key, randomly chosen
k = random.randint(0,%(dict_size)i - 1)
%(garbage)s
"""

in_stmt = """
k in d
%(incr_k)s
""" % {'incr_k' : 'k = (k + 1) %% %(dict_size)i' if number > 1 else ''}

pop_stmt = """
d.pop(k, None)
%(incr_k)s
""" % {'incr_k' : 'k = (k + 1) %% %(dict_size)i' if number > 1 else ''}


del_stmt = """
if k in d:
    del d[k]
%(incr_k)s
""" % {'incr_k' : 'k = (k + 1) %% %(dict_size)i' if number > 1 else ''}

# Results for smaller dictionary size
print \
"""SETUP:
   repeats        : %(repeats)s
   runs per repeat: %(number)s
   garbage collect: %(garbage)s""" \
       % {'repeats' : repeat_num,
          'number'  : number,
          'garbage' : 'yes' if collect_garbage else 'no'}
print "SMALL:"
small_setup_stmt = setup_stmt % \
    {'dict_size' : small_size,
     'garbage' : 'gc.enable()' if collect_garbage else ''}
times = timeit.Timer(stmt=in_stmt % {'dict_size' : small_size},
    setup=small_setup_stmt).repeat(repeat=repeat_num,number=number)
print "    in:  ", sum(times)/len(times)
times = timeit.Timer(stmt=pop_stmt % {'dict_size' : small_size},
    setup=small_setup_stmt).repeat(repeat=repeat_num,number=number)
print "    pop: ", sum(times)/len(times)
times = timeit.Timer(stmt=del_stmt % {'dict_size' : small_size},
    setup=small_setup_stmt).repeat(repeat=repeat_num,number=number)
print "    del: ", sum(times)/len(times)

# Results for larger dictionary size
print "LARGE:"
large_setup_stmt = setup_stmt % \
    {'dict_size' : large_size,
     'garbage' : 'gc.enable()' if collect_garbage else ''}
times = timeit.Timer(stmt=in_stmt  % {'dict_size' : large_size},
    setup=large_setup_stmt).repeat(repeat=repeat_num,number=number)
print "    in:  ", sum(times)/len(times)
times = timeit.Timer(stmt=pop_stmt  % {'dict_size' : large_size},
    setup=large_setup_stmt).repeat(repeat=repeat_num,number=number)
print "    pop: ", sum(times)/len(times)
times = timeit.Timer(stmt=del_stmt  % {'dict_size' : large_size},
    setup=large_setup_stmt).repeat(repeat=repeat_num,number=number)
print "    del: ", sum(times)/len(times)

Doing 100 setups with 1000 runs each prints the following:

SETUP:
   repeats        : 100
   runs per repeat: 1000
   garbage collect: no
SMALL:
    in:   0.00020430803299
    pop:  0.000313355922699
    del:  0.000262062549591
LARGE:
    in:   0.000201721191406
    pop:  0.000328607559204
    del:  0.00027587890625

I'm new to using timeit, so it's possible that this is a flawed test, but it does seem to indicate that del has a small advantage in terms of performance.

One thing I did learn from this exercise the hard way is that Python dictionaries are hash maps, so the size of the dictionary doesn't affect the look-up time as much as it would a C++ std::map, for example (constant time vs O(log(n))-ish). Oh well. Live and learn.
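A rough way to see this is to time the same membership test against dictionaries of very different sizes. The helper lookup_time and the sizes below are assumptions chosen for illustration, and the absolute numbers will vary by machine and Python version; the expectation is only that the two timings come out in the same ballpark.

```python
import timeit


def lookup_time(size, number=100000):
    # Build a dict with `size` integer keys and time a look-up of one key.
    setup = "d = dict.fromkeys(range(%d)); k = %d" % (size, size // 2)
    return timeit.Timer(stmt="k in d", setup=setup).timeit(number=number)


small_time = lookup_time(1000)
large_time = lookup_time(1000000)
print("small: %f" % small_time)
print("large: %f" % large_time)
```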

816

asked Jul 03 '14 07:07

Rob

1 Answer

I would not worry about the performance differences unless you have specific reason to believe that they are causing meaningful slowdowns in your program, which is unlikely.

The real reason you might choose to use del vs pop is because they have different behaviors. pop returns the value for the popped key, so you would use pop if you want to do something with that value at the same time as you remove it. If you don't need to do anything with the value, but just want to remove the item, use del.
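A small sketch of the behavioural difference, using made-up sample data: pop hands back the removed value and can take a default for missing keys, while del is a statement that returns nothing and raises KeyError when the key is absent (as does pop when called without a default).

```python
# Hypothetical sample data, used only for illustration.
d = {'a': 1, 'b': 2, 'c': 3}

# pop removes the key and returns its value.
value = d.pop('a')

# With a default, pop does not raise for a missing key.
missing = d.pop('zzz', None)

# del removes the key; there is no return value to capture.
del d['b']

# del raises KeyError for a missing key.
try:
    del d['zzz']
    del_raised = False
except KeyError:
    del_raised = True

# pop without a default raises KeyError as well.
try:
    d.pop('zzz')
    pop_raised = False
except KeyError:
    pop_raised = True

print(value)    # 1
print(missing)  # None
print(d)        # {'c': 3}
```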

109

answered Oct 15 '22 20:10

BrenBarn
