Which one of these is faster? Is one "better"? Basically I'll have two sets and I want to eventually get one match from between the two lists. So really I suppose the for loop is more like:
for object in set:
if object in other_set:
return object
Like I said - I only need one match, but I'm not sure how intersection()
is handled, so I don't know if its any better. Also, if it helps, the other_set
is a list near 100,000 components and the set
is maybe a few hundred, max few thousand.
set is as fast as it gets. However, if you rewrite your code to create the set once, and not change it, you can use the frozenset built-in type. It's exactly the same except immutable. If you're still having speed problems, you need to speed your program up in other ways, such as by using PyPy instead of cPython.
Python Set intersection() Method The intersection() method returns a set that contains the similarity between two or more sets. Meaning: The returned set contains only items that exist in both sets, or in all sets if the comparison is done with more than two sets.
The intersection of sets can be denoted using the symbol '∩'. As defined above, the intersection of two sets A and B is the set of all those elements which are common to both A and B. Symbolically, we can represent the intersection of A and B as A ∩ B.
from timeit import timeit
setup = """
from random import sample, shuffle
a = range(100000)
b = sample(a, 1000)
a.reverse()
"""
forin = setup + """
def forin():
# a = set(a)
for obj in b:
if obj in a:
return obj
"""
setin = setup + """
def setin():
# original method:
# return tuple(set(a) & set(b))[0]
# suggested in comment, doesn't change conclusion:
return next(iter(set(a) & set(b)))
"""
print timeit("forin()", forin, number = 100)
print timeit("setin()", setin, number = 100)
Times:
>>>
0.0929054012768
0.637904308732
>>>
0.160845057616
1.08630760484
>>>
0.322059185123
1.10931801261
>>>
0.0758695262169
1.08920981403
>>>
0.247866360526
1.07724461708
>>>
0.301856152688
1.07903130641
Making them into sets in the setup and running 10000 runs instead of 100 yields
>>>
0.000413064976328
0.152831597075
>>>
0.00402408388788
1.49093627898
>>>
0.00394538156695
1.51841512101
>>>
0.00397715579584
1.52581949403
>>>
0.00421472926155
1.53156769646
So your version is much faster whether or not it makes sense to convert them to sets.
I realize this is a older post. But, I arrived here looking for performance speeds comparing using intersection
vs in
and thought it'd be worth adding more info. The answers above were great, but left me unclear as to the actual best path forward.
The "first result" solution doesn't solve for my use case specifically.
Instead, I wanted to know how the different implementations would perform, producing identical results sets, using discrete approaches. Not just the first single intersected value. As such, below I've included code to perform an evaluation of the options with a 1000 loop test. Contrary to what @agf posted, using sets is far faster when the desired output is a list of matches.
My results were:
runForin took 132851.600ms
runForinBlist took 37700.916ms
True
runForInListComp took 132783.147ms
True
runForinSet took 780.919ms
True
runSetIntersection took 760.980ms (WINNER)
True
runSetin took 771.921ms
True
Here's the code. Hope it helps someone. Note: I also evaluated the blist (http://stutzbachenterprises.com/blist/blist.html) library as it performs quite well in other use cases.
import time
from random import sample, shuffle
from blist import blist
a = range(100000)
aBlist = blist([i for i in a])
b = sample(a, 1000)
a.reverse()
def print_timing(func):
def wrapper(*arg):
t1 = time.time()
res = func(*arg)
t2 = time.time()
print '%s took %0.3fms' % (func.func_name, (t2-t1)*1000.0)
return res
return wrapper
def forIn():
ret = []
for obj in b:
if obj in a:
ret.append(obj)
return ret
def forInBlist():
ret = []
for obj in b:
if obj in aBlist:
ret.append(obj)
return ret
def forInListComp():
return [value for value in b if value in a]
def forInSet():
ret = []
for obj in b:
if obj in set(a):
ret.append(obj)
return ret
def setIntersection():
return set(a).intersection(b)
def setIn():
return list(set(a) & set(b))
@print_timing
def runForIn(times):
for i in range(times):
ret = forIn()
return ret
@print_timing
def runForInBlist(times):
for i in range(times):
ret = forInBlist()
return ret
@print_timing
def runForInListComp(times):
for i in range(times):
ret = forInListComp()
return ret
@print_timing
def runForInSet(times):
for i in range(times):
ret = forInSet()
return ret
@print_timing
def runSetIntersection(times):
for i in range(times):
ret = setIntersection()
return ret
@print_timing
def runSetIn(times):
for i in range(times):
ret = setIn()
return ret
def checkResults(results):
master = None
for resultSet in results:
if not master:
master = sorted(list(resultSet))
continue
try:
if master != sorted(list(resultSet)):
return False, master, sorted(list(resultSet))
except:
print resultSet
return False
return True
iterations = 5
results = []
runForInResults = runForIn(iterations)
results.append(runForInResults)
runForInBlistResults = runForInBlist(iterations)
results.append(runForInBlistResults)
print checkResults(results)
runForInListCompResults = runForInListComp(iterations)
results.append(runForInListCompResults)
print checkResults(results)
runForInSetResults = runForInSet(iterations)
results.append(runForInSetResults)
print checkResults(results)
runSetIntersectionResults = runSetIntersection(iterations)
results.append(runSetIntersectionResults)
print checkResults(results)
runSetInResults = runSetIn(iterations)
results.append(runSetInResults)
print checkResults(results)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With