I get a list and want to know if all elements are identical.
For lists with a high number of elements, that are indeed all identical, the conversion into a set
is fast, but otherwise going over the list with early exit is performing better:
def are_all_identical_iterate(dataset):
first = dataset[0]
for data in dataset:
if data != first:
return False
return True
# versus
def are_all_identical_all(dataset):
return all(data == dataset[0] for data in dataset)
# or
def are_all_identical_all2(dataset):
return all(data == dataset[0] for data in iter(dataset))
# or
def are_all_identical_all3(dataset):
iterator = iter(dataset)
first = next(iterator)
return all(first == rest for rest in iterator)
NUM_ELEMENTS = 50000
testDataset = [1337] * NUM_ELEMENTS # all identical
from timeit import timeit
print(timeit("are_all_identical_iterate(testDataset)", setup="from __main__ import are_all_identical_iterate, testDataset", number=1000))
print(timeit("are_all_identical_all(testDataset)", setup="from __main__ import are_all_identical_all, testDataset", number=1000))
My results: 0.94 seconds, 3.09 seconds, 3.27 seconds, 2.61 seconds
The for loop takes less than have the time (3x) of the all
function. The all
function is supposed to be the same implementation.
What is going on? I want to know why the loop is so much faster and why there is a difference between the last 3 implementations. The last implementation should have one compare less, because the iterator removes the first element, but that shouldn't have this kind of impact.
As suggested in this other SO post one cause could be that:
The use of a generator expression causes overhead for constantly pausing and resuming the generator.
Anyway I suggest two another approaches that looks faster using map
:
def are_all_identical_map(dataset):
for x in map(lambda x: x == dataset[0], dataset):
if not x:
return False
return True
%%timeit
are_all_identical_map(testDataset)
#7.5 ms ± 64.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
and
%%timeit
(map(lambda x: x == dataset[0], dataset)) and True
#303 ns ± 13.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With