I am trying to write a code which should return true if any element of list is present in a dictionary. Performance of this piece is really important. I know I can just loop over list and break if I find the first search hit. Is there any faster or more Pythonic way for this than given below?
for x in someList:
if x in someDict:
return True
return False
EDIT: I am using Python 2.7
. My first preference would be a faster method.
Use of builtin any can have some performance edge over two loops
any(x in someDict for x in someList)
but you might need to measure your mileage. If your list and dict remains pretty static and you have to perform the comparison multiple times, you may consider using set
someSet = set(someList)
someDict.viewkeys() & someSet
Note Python 3.X, by default returns views rather than a sequence, so it would be straight forward when using Python 3.X
someSet = set(someList)
someDict.keys() & someSet
In both the above cases you can wrap the result with a bool to get a boolean result
bool(someDict.keys() & set(someSet ))
Heretic Note
My curiosity got the better of me and I timed all the proposed solutions. It seems that your original solution is better performance wise. Here is the result
Sample Randomly generated Input
def test_data_gen():
from random import sample
for i in range(1,5):
n = 10**i
population = set(range(1,100000))
some_list = sample(list(population),n)
population.difference_update(some_list)
some_dict = dict(zip(sample(population,n),
sample(range(1,100000),n)))
yield "Population Size of {}".format(n), (some_list, some_dict), {}
I rewrote the Test Part of the answer as it was messy and the answer was receiving quite a decent attention. I created a timeit compare python module and moved it onto github
Timeit repeated for 10 times
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
======================================
Test Run for Population Size of 10
======================================
|Rank |FunctionName |Result |Description
+------+---------------------+----------+-----------------------------------------------
| 1|foo_nested |0.000011 |Original OPs Code
+------+---------------------+----------+-----------------------------------------------
| 2|foo_ifilter_any |0.000014 |any(ifilter(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
| 3|foo_ifilter_not_not |0.000015 |not not next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
| 4|foo_imap_any |0.000018 |any(imap(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
| 5|foo_any |0.000019 |any(x in some_dict for x in some_list)
+------+---------------------+----------+-----------------------------------------------
| 6|foo_ifilter_next |0.000022 |bool(next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
| 7|foo_set_ashwin |0.000024 |not set(some_dct).isdisjoint(some_lst)
+------+---------------------+----------+-----------------------------------------------
| 8|foo_set |0.000047 |some_dict.viewkeys() & set(some_list )
======================================
Test Run for Population Size of 100
======================================
|Rank |FunctionName |Result |Description
+------+---------------------+----------+-----------------------------------------------
| 1|foo_ifilter_any |0.000071 |any(ifilter(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
| 2|foo_nested |0.000072 |Original OPs Code
+------+---------------------+----------+-----------------------------------------------
| 3|foo_ifilter_not_not |0.000073 |not not next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
| 4|foo_ifilter_next |0.000076 |bool(next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
| 5|foo_imap_any |0.000082 |any(imap(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
| 6|foo_any |0.000092 |any(x in some_dict for x in some_list)
+------+---------------------+----------+-----------------------------------------------
| 7|foo_set_ashwin |0.000170 |not set(some_dct).isdisjoint(some_lst)
+------+---------------------+----------+-----------------------------------------------
| 8|foo_set |0.000638 |some_dict.viewkeys() & set(some_list )
======================================
Test Run for Population Size of 1000
======================================
|Rank |FunctionName |Result |Description
+------+---------------------+----------+-----------------------------------------------
| 1|foo_ifilter_not_not |0.000746 |not not next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
| 2|foo_ifilter_any |0.000746 |any(ifilter(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
| 3|foo_ifilter_next |0.000752 |bool(next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
| 4|foo_nested |0.000771 |Original OPs Code
+------+---------------------+----------+-----------------------------------------------
| 5|foo_set_ashwin |0.000838 |not set(some_dct).isdisjoint(some_lst)
+------+---------------------+----------+-----------------------------------------------
| 6|foo_imap_any |0.000842 |any(imap(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
| 7|foo_any |0.000933 |any(x in some_dict for x in some_list)
+------+---------------------+----------+-----------------------------------------------
| 8|foo_set |0.001702 |some_dict.viewkeys() & set(some_list )
======================================
Test Run for Population Size of 10000
======================================
|Rank |FunctionName |Result |Description
+------+---------------------+----------+-----------------------------------------------
| 1|foo_nested |0.007195 |Original OPs Code
+------+---------------------+----------+-----------------------------------------------
| 2|foo_ifilter_next |0.007410 |bool(next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
| 3|foo_ifilter_any |0.007491 |any(ifilter(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
| 4|foo_ifilter_not_not |0.007671 |not not next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
| 5|foo_set_ashwin |0.008385 |not set(some_dct).isdisjoint(some_lst)
+------+---------------------+----------+-----------------------------------------------
| 6|foo_imap_any |0.011327 |any(imap(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
| 7|foo_any |0.011533 |any(x in some_dict for x in some_list)
+------+---------------------+----------+-----------------------------------------------
| 8|foo_set |0.018313 |some_dict.viewkeys() & set(some_list )
Timeit repeated for 100 times
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
======================================
Test Run for Population Size of 10
======================================
|Rank |FunctionName |Result |Description
+------+---------------------+----------+-----------------------------------------------
| 1|foo_nested |0.000098 |Original OPs Code
+------+---------------------+----------+-----------------------------------------------
| 2|foo_ifilter_any |0.000124 |any(ifilter(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
| 3|foo_ifilter_not_not |0.000131 |not not next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
| 4|foo_imap_any |0.000142 |any(imap(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
| 5|foo_ifilter_next |0.000151 |bool(next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
| 6|foo_any |0.000158 |any(x in some_dict for x in some_list)
+------+---------------------+----------+-----------------------------------------------
| 7|foo_set_ashwin |0.000186 |not set(some_dct).isdisjoint(some_lst)
+------+---------------------+----------+-----------------------------------------------
| 8|foo_set |0.000496 |some_dict.viewkeys() & set(some_list )
======================================
Test Run for Population Size of 100
======================================
|Rank |FunctionName |Result |Description
+------+---------------------+----------+-----------------------------------------------
| 1|foo_ifilter_any |0.000661 |any(ifilter(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
| 2|foo_ifilter_not_not |0.000677 |not not next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
| 3|foo_nested |0.000683 |Original OPs Code
+------+---------------------+----------+-----------------------------------------------
| 4|foo_ifilter_next |0.000684 |bool(next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
| 5|foo_imap_any |0.000762 |any(imap(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
| 6|foo_any |0.000854 |any(x in some_dict for x in some_list)
+------+---------------------+----------+-----------------------------------------------
| 7|foo_set_ashwin |0.001291 |not set(some_dct).isdisjoint(some_lst)
+------+---------------------+----------+-----------------------------------------------
| 8|foo_set |0.005018 |some_dict.viewkeys() & set(some_list )
======================================
Test Run for Population Size of 1000
======================================
|Rank |FunctionName |Result |Description
+------+---------------------+----------+-----------------------------------------------
| 1|foo_ifilter_any |0.007585 |any(ifilter(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
| 2|foo_nested |0.007713 |Original OPs Code
+------+---------------------+----------+-----------------------------------------------
| 3|foo_set_ashwin |0.008256 |not set(some_dct).isdisjoint(some_lst)
+------+---------------------+----------+-----------------------------------------------
| 4|foo_ifilter_not_not |0.008526 |not not next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
| 5|foo_any |0.009422 |any(x in some_dict for x in some_list)
+------+---------------------+----------+-----------------------------------------------
| 6|foo_ifilter_next |0.010259 |bool(next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
| 7|foo_imap_any |0.011414 |any(imap(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
| 8|foo_set |0.019862 |some_dict.viewkeys() & set(some_list )
======================================
Test Run for Population Size of 10000
======================================
|Rank |FunctionName |Result |Description
+------+---------------------+----------+-----------------------------------------------
| 1|foo_imap_any |0.082221 |any(imap(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
| 2|foo_ifilter_any |0.083573 |any(ifilter(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
| 3|foo_nested |0.095736 |Original OPs Code
+------+---------------------+----------+-----------------------------------------------
| 4|foo_set_ashwin |0.103427 |not set(some_dct).isdisjoint(some_lst)
+------+---------------------+----------+-----------------------------------------------
| 5|foo_ifilter_next |0.104589 |bool(next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
| 6|foo_ifilter_not_not |0.117974 |not not next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
| 7|foo_any |0.127739 |any(x in some_dict for x in some_list)
+------+---------------------+----------+-----------------------------------------------
| 8|foo_set |0.208228 |some_dict.viewkeys() & set(some_list )
Timeit repeated for 1000 times
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
======================================
Test Run for Population Size of 10
======================================
|Rank |FunctionName |Result |Description
+------+---------------------+----------+-----------------------------------------------
| 1|foo_nested |0.000953 |Original OPs Code
+------+---------------------+----------+-----------------------------------------------
| 2|foo_ifilter_any |0.001134 |any(ifilter(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
| 3|foo_ifilter_not_not |0.001213 |not not next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
| 4|foo_ifilter_next |0.001340 |bool(next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
| 5|foo_imap_any |0.001407 |any(imap(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
| 6|foo_any |0.001535 |any(x in some_dict for x in some_list)
+------+---------------------+----------+-----------------------------------------------
| 7|foo_set_ashwin |0.002252 |not set(some_dct).isdisjoint(some_lst)
+------+---------------------+----------+-----------------------------------------------
| 8|foo_set |0.004701 |some_dict.viewkeys() & set(some_list )
======================================
Test Run for Population Size of 100
======================================
|Rank |FunctionName |Result |Description
+------+---------------------+----------+-----------------------------------------------
| 1|foo_ifilter_any |0.006209 |any(ifilter(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
| 2|foo_ifilter_next |0.006411 |bool(next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
| 3|foo_ifilter_not_not |0.006657 |not not next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
| 4|foo_nested |0.006727 |Original OPs Code
+------+---------------------+----------+-----------------------------------------------
| 5|foo_imap_any |0.007562 |any(imap(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
| 6|foo_any |0.008262 |any(x in some_dict for x in some_list)
+------+---------------------+----------+-----------------------------------------------
| 7|foo_set_ashwin |0.012260 |not set(some_dct).isdisjoint(some_lst)
+------+---------------------+----------+-----------------------------------------------
| 8|foo_set |0.046773 |some_dict.viewkeys() & set(some_list )
======================================
Test Run for Population Size of 1000
======================================
|Rank |FunctionName |Result |Description
+------+---------------------+----------+-----------------------------------------------
| 1|foo_ifilter_not_not |0.071888 |not not next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
| 2|foo_ifilter_next |0.072150 |bool(next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
| 3|foo_nested |0.073382 |Original OPs Code
+------+---------------------+----------+-----------------------------------------------
| 4|foo_ifilter_any |0.075698 |any(ifilter(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
| 5|foo_set_ashwin |0.077367 |not set(some_dct).isdisjoint(some_lst)
+------+---------------------+----------+-----------------------------------------------
| 6|foo_imap_any |0.090623 |any(imap(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
| 7|foo_any |0.093301 |any(x in some_dict for x in some_list)
+------+---------------------+----------+-----------------------------------------------
| 8|foo_set |0.177051 |some_dict.viewkeys() & set(some_list )
======================================
Test Run for Population Size of 10000
======================================
|Rank |FunctionName |Result |Description
+------+---------------------+----------+-----------------------------------------------
| 1|foo_nested |0.701317 |Original OPs Code
+------+---------------------+----------+-----------------------------------------------
| 2|foo_ifilter_next |0.706156 |bool(next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
| 3|foo_ifilter_any |0.723368 |any(ifilter(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
| 4|foo_ifilter_not_not |0.746650 |not not next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
| 5|foo_set_ashwin |0.776704 |not set(some_dct).isdisjoint(some_lst)
+------+---------------------+----------+-----------------------------------------------
| 6|foo_imap_any |0.832117 |any(imap(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
| 7|foo_any |0.881777 |any(x in some_dict for x in some_list)
+------+---------------------+----------+-----------------------------------------------
| 8|foo_set |1.665962 |some_dict.viewkeys() & set(some_list )
Timeit repeated for 10000 times
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
======================================
Test Run for Population Size of 10
======================================
|Rank |FunctionName |Result |Description
+------+---------------------+----------+-----------------------------------------------
| 1|foo_nested |0.010581 |Original OPs Code
+------+---------------------+----------+-----------------------------------------------
| 2|foo_ifilter_any |0.013512 |any(ifilter(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
| 3|foo_imap_any |0.015321 |any(imap(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
| 4|foo_ifilter_not_not |0.017680 |not not next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
| 5|foo_ifilter_next |0.019334 |bool(next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
| 6|foo_any |0.026274 |any(x in some_dict for x in some_list)
+------+---------------------+----------+-----------------------------------------------
| 7|foo_set_ashwin |0.030881 |not set(some_dct).isdisjoint(some_lst)
+------+---------------------+----------+-----------------------------------------------
| 8|foo_set |0.053605 |some_dict.viewkeys() & set(some_list )
======================================
Test Run for Population Size of 100
======================================
|Rank |FunctionName |Result |Description
+------+---------------------+----------+-----------------------------------------------
| 1|foo_nested |0.070194 |Original OPs Code
+------+---------------------+----------+-----------------------------------------------
| 2|foo_ifilter_not_not |0.078524 |not not next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
| 3|foo_ifilter_any |0.079499 |any(ifilter(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
| 4|foo_imap_any |0.087349 |any(imap(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
| 5|foo_ifilter_next |0.093970 |bool(next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
| 6|foo_any |0.097948 |any(x in some_dict for x in some_list)
+------+---------------------+----------+-----------------------------------------------
| 7|foo_set_ashwin |0.130725 |not set(some_dct).isdisjoint(some_lst)
+------+---------------------+----------+-----------------------------------------------
| 8|foo_set |0.480841 |some_dict.viewkeys() & set(some_list )
======================================
Test Run for Population Size of 1000
======================================
|Rank |FunctionName |Result |Description
+------+---------------------+----------+-----------------------------------------------
| 1|foo_ifilter_any |0.754491 |any(ifilter(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
| 2|foo_ifilter_not_not |0.756253 |not not next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
| 3|foo_ifilter_next |0.771382 |bool(next(ifilter(some_dict.__contains__...
+------+---------------------+----------+-----------------------------------------------
| 4|foo_nested |0.787152 |Original OPs Code
+------+---------------------+----------+-----------------------------------------------
| 5|foo_set_ashwin |0.818520 |not set(some_dct).isdisjoint(some_lst)
+------+---------------------+----------+-----------------------------------------------
| 6|foo_imap_any |0.902947 |any(imap(some_dict.__contains__, some_list))
+------+---------------------+----------+-----------------------------------------------
| 7|foo_any |1.001810 |any(x in some_dict for x in some_list)
+------+---------------------+----------+-----------------------------------------------
| 8|foo_set |2.012781 |some_dict.viewkeys() & set(some_list )
=======================================
Test Run for Population Size of 10000
=======================================
|Rank |FunctionName |Result |Description
+------+---------------------+-----------+-----------------------------------------------
| 1|foo_imap_any |10.071469 |any(imap(some_dict.__contains__, some_list))
+------+---------------------+-----------+-----------------------------------------------
| 2|foo_any |11.127034 |any(x in some_dict for x in some_list)
+------+---------------------+-----------+-----------------------------------------------
| 3|foo_set |18.881414 |some_dict.viewkeys() & set(some_list )
+------+---------------------+-----------+-----------------------------------------------
| 4|foo_nested |8.731133 |Original OPs Code
+------+---------------------+-----------+-----------------------------------------------
| 5|foo_ifilter_not_not |9.019190 |not not next(ifilter(some_dict.__contains__...
+------+---------------------+-----------+-----------------------------------------------
| 6|foo_ifilter_next |9.189966 |bool(next(ifilter(some_dict.__contains__...
+------+---------------------+-----------+-----------------------------------------------
| 7|foo_set_ashwin |9.363886 |not set(some_dct).isdisjoint(some_lst)
+------+---------------------+-----------+-----------------------------------------------
| 8|foo_ifilter_any |9.442759 |any(ifilter(some_dict.__contains__, some_list))
And a Graphical Comparison from the above referred module
Premature optimization is evil. It is evident that none of the solutions have optimal performance when the test domain varies. Depending on population size and frequency of iteration, performance of solutions varies considerably. The result again speaks out about the fact that in Python, one should ensure that the code should be readable rather than ensuring that the code is either nifty or optimized for performance for certain cases, but then it may not be scalable.
Note There were some doubts on why not using ifilter performs better than the rest
"In Abhit's answer, he timed the different approaches and found that ifilter/next was not the fastest; any idea why this would be the case? "
It is a known fact that in python, there is an overhead when calling C functions, and if the population size is low but the frequency of iteration is high, the accumulation of C function call overhead would slowly show up. As can be seen in the graphs, where population size is low but iteration is high, using ifilter, performance deviates considerably.
The cleanest and fastest way is to use any() and itertools.ifilter():
any(ifilter(someDict.__contains__, someList))
This code uses:
someDict.__contains__
as the predicateYou could also use itertools.imap() instead of ifilter().
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With