Is there a faster way to remove key-value pairs from a Counter where the value is less than a certain threshold?
I've done the following:
counter_dict = {k:v for k, v in counter_dict.items() if v > 5}
The major issue with the current code is the call to .items(), which in Python 2 creates a list of all (key, value) pairs before the comprehension even starts iterating.
One optimization is to use Counter.iteritems() instead of .items(), which avoids the cost of building that list and then iterating through it again:
>>> from collections import Counter
>>> cnt = Counter("asbdasdbasdbadaasasdasadsa")
>>> {k:v for k,v in cnt.iteritems() if v > 5}
{'a': 10, 's': 7, 'd': 6}
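The session above is Python 2. As a minimal sketch for Python 3, assuming you are on that version: iteritems() no longer exists there, and items() returns a view rather than a list, so the plain comprehension already avoids the intermediate copy. The name `filtered` is only illustrative, and wrapping the result in Counter is optional, in case you want to keep Counter behaviour instead of getting a plain dict.

```python
from collections import Counter

cnt = Counter("asbdasdbasdbadaasasdasadsa")

# In Python 3, items() is a view over the Counter, so no list is built here.
# Wrapping the dict in Counter keeps Counter methods such as most_common().
filtered = Counter({k: v for k, v in cnt.items() if v > 5})

print(filtered)
```

Drop the Counter(...) wrapper if a plain dict is enough; the comprehension itself is unchanged.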
Another option is to skip the .items() call altogether, iterate over the keys, and look up each value by key:
>>> from collections import Counter
>>> cnt = Counter("asbdasdbasdbadaasasdasadsa")
>>> {k:cnt[k] for k in cnt if cnt[k] > 5}
{'a': 10, 's': 7, 'd': 6}
If we measure the difference with %timeit in IPython, using a sample Counter and a filter condition like the one in your question, iteritems() wins hands down:
In [1]: import random
In [2]: from collections import Counter
In [3]: MILLION = 10**6
In [4]: cnt = Counter(random.randint(0, MILLION) for _ in xrange(MILLION))
In [5]: %timeit {k:v for k, v in cnt.iteritems() if v < 5}
10 loops, best of 3: 140 ms per loop
In [6]: %timeit {k:v for k, v in cnt.items() if v**2 < 5}
1 loops, best of 3: 290 ms per loop
In [7]: %timeit {k:cnt[k] for k in cnt if cnt[k] < 5}
1 loops, best of 3: 272 ms per loop
With the condition changed to v > 5:
In [8]: %timeit {k:v for k, v in cnt.iteritems() if v > 5}
10 loops, best of 3: 87 ms per loop
In [9]: %timeit {k:v for k, v in cnt.items() if v > 5}
1 loops, best of 3: 186 ms per loop
In [10]: %timeit {k:cnt[k] for k in cnt if cnt[k] > 5}
10 loops, best of 3: 153 ms per loop
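The timings above were taken in IPython on Python 2. If you want to repeat the comparison on Python 3, a rough sketch with the standard timeit module could look like the following. The helper names via_items and via_keys are made up for this example, the Counter is smaller than the one above so the script finishes quickly, and the numbers will depend on your machine and interpreter version.

```python
import random
import timeit
from collections import Counter

N = 10**5  # assumption: smaller than the 10**6 used above, to keep the run short
cnt = Counter(random.randint(0, N) for _ in range(N))

def via_items():
    # Iterate over (key, value) pairs; items() is a view in Python 3.
    return {k: v for k, v in cnt.items() if v > 5}

def via_keys():
    # Iterate over keys and look each value up again.
    return {k: cnt[k] for k in cnt if cnt[k] > 5}

RUNS = 10
for fn in (via_items, via_keys):
    seconds = timeit.timeit(fn, number=RUNS)
    print("{}: {:.2f} ms per call".format(fn.__name__, seconds / RUNS * 1000))
```

Both helpers build the same dictionary, so any difference in the printed times comes from how the values are fetched.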