custom dict that allows delete during iteration

Tags: python, iterator, dictionary, python-3.x

UPDATED based on Lennart Regebro's answer

Suppose you iterate through a dictionary, and sometimes need to delete an element. The following is very efficient:

    remove = []
    for k, v in dict_.items():
        if condition(k, v):
            remove.append(k)
            continue
        # do other things you need to do in this loop
    for k in remove:
        del dict_[k]

The only overhead here is building the list of keys to remove; unless it grows large compared to the dictionary size, it's not an issue. However, this approach requires some extra coding, so it's not very popular.

The popular dict comprehension approach:

    dict_ = {k: v for k, v in dict_.items() if not condition(k, v)}
    for k, v in dict_.items():
        # do other things you need to do in this loop
        pass

results in a full dictionary copy, and so has the risk of a silly performance hit if dictionaries grow large or the containing function is called often.
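A side effect of the full copy is that the comprehension builds a new dictionary and rebinds the name, so any other reference to the original still sees the unfiltered data. A minimal illustration, where `condition` and the sample data are made up for the example:

```python
def condition(k, v):
    # Hypothetical predicate: drop entries whose value is odd.
    return v % 2 == 1


dict_ = {"a": 1, "b": 2, "c": 3}
alias = dict_  # a second reference to the same dictionary

dict_ = {k: v for k, v in dict_.items() if not condition(k, v)}

print(dict_)           # the new, filtered dictionary
print(alias)           # the original object, untouched
print(dict_ is alias)  # False: the name now points at a different object
```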

A much better approach is to copy only the keys rather than the whole dictionary:

    for k in list(dict_.keys()):
        if condition(k, dict_[k]):
            del dict_[k]
            continue
        # do other things you need to do in this loop

(Note that all code examples are in Python 3, so keys() and items() return views, not copies.)

In most cases, it won't hurt performance that much, since the time to check even the simplest condition (not to mention other stuff you're doing in the loop) is usually greater than the time to add one key to a list.
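The view-versus-copy distinction noted above can be checked directly. This is only a small demonstration with made-up data: a view tracks later changes to the dictionary, while `list(dict_)` is an independent snapshot of the keys, which is why it is safe to delete while looping over it.

```python
dict_ = {"a": 1, "b": 2}

view = dict_.keys()      # live view: reflects later changes
snapshot = list(dict_)   # independent copy of the keys

dict_["c"] = 3
del dict_["a"]

print(list(view))  # ['b', 'c']
print(snapshot)    # ['a', 'b']
```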

Still, I am wondering if it's possible to avoid even that with a custom dictionary that allows deletions while iterating:

    for k, v in dict_.items():
        if condition(k, v):
            del dict_[k]
            continue
        # do other things you need to do in this loop
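For reference, with the built-in dict in CPython this loop does not get far: the iterator notices the size change on its next step and raises RuntimeError. A small demonstration, with a hypothetical `condition` and sample data:

```python
def condition(k, v):
    # Hypothetical predicate used only for this demonstration.
    return k == "a"


dict_ = {"a": 1, "b": 2, "c": 3}
error = None
try:
    for k, v in dict_.items():
        if condition(k, v):
            del dict_[k]  # the deletion itself succeeds...
            continue
except RuntimeError as exc:
    error = str(exc)      # ...but the next iteration step fails

print(error)  # dictionary changed size during iteration
```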

Perhaps an iterator could always look ahead, so that when __next__ is called, the iterator knows where to go without even looking at the current element (it would only need to look at the element when it first gets to it). And if there is no next element, the iterator could just set a flag that causes StopIteration to be raised whenever __next__ is called again.

If the element the iterator tries to advance to turns out to be deleted, it's fine to raise an exception; there is no need to support deletions while multiple iterations are going on simultaneously.

Are there any problems with this approach?

One problem is that I'm not sure it can be done with no material overhead compared to the existing dict; otherwise, it would be faster to use the list(dict_) approach!
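The look-ahead idea can be sketched in pure Python, though only by giving up the built-in dict's iterator. The class below is a hypothetical illustration, not an existing library type: `LookaheadDict`, `_Node`, and the sample data are all invented here. It keeps the keys in a doubly linked list so that the iterator can remember the following node before handing out the current one.

```python
class _Node:
    __slots__ = ("key", "value", "prev", "next", "alive")


class LookaheadDict:
    """Hypothetical mapping that tolerates deleting the current key while iterating."""

    def __init__(self, **items):
        self._nodes = {}              # key -> _Node, for O(1) lookup
        root = _Node()
        root.prev = root.next = root  # circular sentinel marks both ends
        root.alive = True
        self._root = root
        for key, value in items.items():
            self[key] = value

    def __setitem__(self, key, value):
        node = self._nodes.get(key)
        if node is not None:
            node.value = value
            return
        node = _Node()
        node.key, node.value, node.alive = key, value, True
        last = self._root.prev
        node.prev, node.next = last, self._root
        last.next = self._root.prev = node
        self._nodes[key] = node

    def __getitem__(self, key):
        return self._nodes[key].value

    def __delitem__(self, key):
        node = self._nodes.pop(key)   # raises KeyError if the key is missing
        node.prev.next = node.next
        node.next.prev = node.prev
        node.alive = False            # lets a running iterator detect the deletion

    def __len__(self):
        return len(self._nodes)

    def items(self):
        node = self._root.next
        while node is not self._root:
            if not node.alive:
                raise RuntimeError("next element was deleted during iteration")
            following = node.next     # look ahead before handing out the item
            yield node.key, node.value
            node = following


d = LookaheadDict(a=1, b=2, c=3)
seen = []
for k, v in d.items():
    if v % 2 == 1:
        del d[k]                      # deleting the current key is allowed
        continue
    seen.append(k)

remaining = [k for k, _ in d.items()]
print(seen, remaining)
```

The sketch also makes the overhead concern concrete: every key needs an extra node object and iteration runs at Python speed, so it is unlikely to beat `list(dict_)` on a built-in dict.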

UPDATE:

I tried all the versions. I'm not reporting the timings, since they clearly depend heavily on the exact situation. But it seems safe to say that in many cases, the fastest approach is likely to be list(dict_). After all, if you think about it, the copy is the fastest operation that grows linearly with the size of the dictionary; almost any other overhead, as long as it's also proportional to that size, is likely to be bigger.

I really like all the ideas, but since I have to select only one, I'm accepting the context manager solution, since it allows the dictionary to be used as either normal or "enhanced" with very small code changes.

asked Jan 26 '12 18:01

max

1 Answer

As you note, you can store the items to delete somewhere and defer the deletion of them until later. The problem then becomes when to purge them and how to make sure that the purge method eventually gets called. The answer to this is a context manager which is also a subclass of dict.

    class dd_dict(dict):    # the dd is for "deferred delete"
        _deletes = None     # None outside a with block, a set of pending keys inside one

        def __delitem__(self, key):
            if key not in self:
                raise KeyError(str(key))
            if self._deletes is None:
                dict.__delitem__(self, key)   # not deferring: delete immediately
            else:
                self._deletes.add(key)        # deferring: remember the key for later

        def __enter__(self):
            self._deletes = set()
            return self

        def __exit__(self, type, value, tb):
            for key in self._deletes:
                try:
                    dict.__delitem__(self, key)
                except KeyError:
                    pass
            self._deletes = None

Usage:

    # make the dict and do whatever to it
    ddd = dd_dict(a=1, b=2, c=3)

    # now iterate over it, deferring deletes
    with ddd:
        for k, v in ddd.items():
            if k == "a":
                del ddd[k]
                print(ddd)     # shows that "a" is still there

    print(ddd)                 # shows that "a" has been deleted

If you're not in a with block, of course, deletes are immediate; as this is a dict subclass, it works just like a regular dict outside of a context manager.

You could also implement this as a wrapper class for a dictionary:

    class deferring_delete(object):
        def __init__(self, d):
            self._dict = d

        def __enter__(self):
            self._deletes = set()
            return self

        def __exit__(self, type, value, tb):
            for key in self._deletes:
                try:
                    del self._dict[key]
                except KeyError:
                    pass
            del self._deletes

        def __delitem__(self, key):
            if key not in self._dict:
                raise KeyError(str(key))
            self._deletes.add(key)

    d = dict(a=1, b=2, c=3)

    with deferring_delete(d) as dd:
        for k, v in d.items():
            if k == "a":
                del dd[k]    # delete through wrapper

    print(d)

It's even possible to make the wrapper class fully functional as a dictionary, if you want, though that's a fair bit more code.

Performance-wise, this is admittedly not such a win, but I like it from a programmer-friendliness standpoint. The second method should be very slightly faster since it's not testing a flag on each delete.
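As a rough idea of what the fully functional wrapper mentioned above might look like, here is a sketch built on `collections.abc.MutableMapping`, which supplies `items()`, `keys()`, `get()`, `in`, and the rest once five core methods are defined. The class name `DeferringDeleteMapping` is hypothetical, and the sketch makes one simplifying assumption: assigning to a key that is already pending deletion does not cancel the deletion.

```python
from collections.abc import MutableMapping


class DeferringDeleteMapping(MutableMapping):
    """Hypothetical dict wrapper that defers deletes while used in a with block."""

    def __init__(self, d):
        self._dict = d
        self._deletes = None          # None means deletes are immediate

    def __enter__(self):
        self._deletes = set()
        return self

    def __exit__(self, exc_type, exc_value, tb):
        for key in self._deletes:
            self._dict.pop(key, None)  # ignore keys that have already gone
        self._deletes = None
        return False                   # do not swallow exceptions

    def __getitem__(self, key):
        return self._dict[key]

    def __setitem__(self, key, value):
        self._dict[key] = value

    def __delitem__(self, key):
        if key not in self._dict:
            raise KeyError(key)
        if self._deletes is None:
            del self._dict[key]
        else:
            self._deletes.add(key)

    def __iter__(self):
        return iter(self._dict)

    def __len__(self):
        return len(self._dict)


d = dict(a=1, b=2, c=3)

with DeferringDeleteMapping(d) as dd:
    for k, v in dd.items():            # iterate through the wrapper itself
        if k == "a":
            del dd[k]
    still_there = "a" in dd            # the delete has not happened yet

print(still_there, d)
```

Here both the iteration and the delete go through the wrapper, so the calling code never needs to touch the underlying dictionary inside the `with` block.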

answered Oct 03 '22 07:10

kindall
