Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is del or pop preferred when removing elements from dicts

I'm relatively new to Python, and was wondering if there is any reason to prefer one of these methods over the other when removing elements from a dict?

A) Using del

# d is a dict, k is a key
if k in d:
   del k

B) Using pop

d.pop(k, None)

My first thought was that approach (A) needs to do two look-ups - once in the if statement, and again in the implementation of del, which would make it slightly slower than pop, which only needs one look-up. A colleague then pointed out that del may yet have the edge, because it is a keyword, and may therefore potentially be better optimised, whereas pop is a method that can be replaced by end users (not sure if this is really a factor, but he does have far more experience at writing Python code).

I wrote a few test snippets to compare performance. It looks like del has the edge (I've appended the snippets if anyone cares to try them out or comment on correctness).

So, this brings me back to the question: Other than a marginal performance gain, is there a reason to prefer one over the other?

Here are the snippets to test performance:

Naive test

import timeit
print 'in:   ', timeit.Timer(stmt='42 in d', setup='d = dict.fromkeys(range(100000))').timeit()
print 'pop:  ', timeit.Timer(stmt='d.pop(42,None)', setup='d = dict.fromkeys(range(100000))').timeit()
print 'del:  ', timeit.Timer(stmt='if 42 in d:\n    del d[42]', setup='d = dict.fromkeys(range(100000))').timeit()

This outputs

in:    0.0521960258484
pop:   0.172810077667
del:   0.0660231113434

So that was a curious result. I would have expected pop to be roughly on par with in, but it's more than three times as expensive. Another surprise was that del was only slightly slower than in, until I realized that the dictionary from the setup statement in the timeit class remains the same instance, so only the first call will hit the del statements, as all others won't pass the if statement.

Slightly less naive test

So I wrote a longer profiling snippet in an attempt to try to avoid this. I runs several timeit runs with some randomised key selections and tries to ensure that we'll mostly hit the if statement and the del statement (so we're not working with the same dictionary instance all the time):

#! /usr/bin/bash

import timeit

# Number of times to repeat fresh setup before doing timeit runs
repeat_num=100
# Number of timeit runs per setup
number=1000
# Size of dictionary for runs (smaller)
small_size=10000
# Size of dictionary for timeit runs (larger)
large_size=1000000
# Switches garbage collection on if True
collect_garbage = False

setup_stmt = """
import random
d = dict.fromkeys(range(%(dict_size)i))
# key, randomly chosen
k = random.randint(0,%(dict_size)i - 1)
%(garbage)s
"""

in_stmt = """
k in d
%(incr_k)s
""" % {'incr_k' : 'k = (k + 1) %% %(dict_size)i' if number > 1 else ''}

pop_stmt = """
d.pop(k, None)
%(incr_k)s
""" % {'incr_k' : 'k = (k + 1) %% %(dict_size)i' if number > 1 else ''}


del_stmt = """
if k in d:
    del d[k]
%(incr_k)s
""" % {'incr_k' : 'k = (k + 1) %% %(dict_size)i' if number > 1 else ''}

# Results for smaller dictionary size
print \
"""SETUP:
   repeats        : %(repeats)s
   runs per repeat: %(number)s
   garbage collect: %(garbage)s""" \
       % {'repeats' : repeat_num,
          'number'  : number,
          'garbage' : 'yes' if collect_garbage else 'no'}
print "SMALL:"
small_setup_stmt = setup_stmt % \
    {'dict_size' : small_size,
     'garbage' : 'gc.enable()' if collect_garbage else ''}
times = timeit.Timer(stmt=in_stmt % {'dict_size' : small_size},
    setup=small_setup_stmt).repeat(repeat=repeat_num,number=number)
print "    in:  ", sum(times)/len(times)
times = timeit.Timer(stmt=pop_stmt % {'dict_size' : small_size},
    setup=small_setup_stmt).repeat(repeat=repeat_num,number=number)
print "    pop: ", sum(times)/len(times)
times = timeit.Timer(stmt=del_stmt % {'dict_size' : small_size},
    setup=small_setup_stmt).repeat(repeat=repeat_num,number=number)
print "    del: ", sum(times)/len(times)

# Results for larger dictionary size
print "LARGE:"
large_setup_stmt = setup_stmt % \
    {'dict_size' : large_size,
     'garbage' : 'gc.enable()' if collect_garbage else ''}
times = timeit.Timer(stmt=in_stmt  % {'dict_size' : large_size},
    setup=large_setup_stmt).repeat(repeat=repeat_num,number=number)
print "    in:  ", sum(times)/len(times)
times = timeit.Timer(stmt=pop_stmt  % {'dict_size' : large_size},
    setup=large_setup_stmt).repeat(repeat=repeat_num,number=number)
print "    pop: ", sum(times)/len(times)
times = timeit.Timer(stmt=del_stmt  % {'dict_size' : large_size},
    setup=large_setup_stmt).repeat(repeat=repeat_num,number=number)
print "    del: ", sum(times)/len(times)

Doing 100 setups, each with 1000 runs each, prints the following:

SETUP:
   repeats        : 100
   runs per repeat: 1000
   garbage collect: no
SMALL:
    in:   0.00020430803299
    pop:  0.000313355922699
    del:  0.000262062549591
LARGE:
    in:   0.000201721191406
    pop:  0.000328607559204
    del:  0.00027587890625

I'm new to using timeit, so it's possible that this is a flawed test, but it does seem to indicate that del has a small advantage in terms of performance.

One thing I did learn from this exercise the hard way is that Python dictionaries are hash maps, so the size of the dictionary doesn't affect the look-up time as much as it would a C++ std::map, for example (constant time vs O(log(n))-ish). Oh well. Live and learn.

like image 816
Rob Avatar asked Jul 03 '14 07:07

Rob


People also ask

What is the advantage of pop () method over a Del keyword?

pop() returns deleted value. The del keyword can delete the single value from a list or delete the whole list at a time. At a time it deletes only one value from the list.

What is difference between pop () and remove ()?

Array elements can be removed using pop() or remove() method. The difference between these two functions is that the former returns the deleted value whereas the latter does not. The pop() function takes either no parameter or the index value as its parameter.

Is pop or remove Faster Python?

pop will be faster, especially for the last item in the list; however, if you've started with the item itself and have already had a O(n) operation with a rich comparison to find its .

What is the difference between remove to Del?

Remove and Delete are defined quite similarly, but the main difference between them is that delete means erase (i.e. rendered nonexistent or nonrecoverable), while remove denotes take away and set aside (but kept in existence).


1 Answers

I would not worry about the performance differences unless you have specific reason to believe that they are causing meaningful slowdowns in your program, which is unlikely.

The real reason you might choose to use del vs pop is because they have different behaviors. pop returns the value for the popped key, so you would use pop if you want to do something with that value at the same time as you remove it. If you don't need to do anything with the value, but just want to remove the item, use del.

like image 109
BrenBarn Avatar answered Oct 15 '22 20:10

BrenBarn