Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is the 'all' implementation slower than writing a for loop [duplicate]

Tags:

python

I get a list and want to know if all elements are identical. For lists with a high number of elements, that are indeed all identical, the conversion into a set is fast, but otherwise going over the list with early exit is performing better:

def are_all_identical_iterate(dataset):
    first = dataset[0]
    for data in dataset:
        if data != first:
            return False
    return True

# versus

def are_all_identical_all(dataset):
    return all(data == dataset[0] for data in dataset)
# or
def are_all_identical_all2(dataset):
    return all(data == dataset[0] for data in iter(dataset))
# or
def are_all_identical_all3(dataset):
    iterator = iter(dataset)
    first = next(iterator)
    return all(first == rest for rest in iterator)

NUM_ELEMENTS = 50000
testDataset = [1337] * NUM_ELEMENTS # all identical

from timeit import timeit
print(timeit("are_all_identical_iterate(testDataset)", setup="from __main__ import are_all_identical_iterate, testDataset", number=1000))
print(timeit("are_all_identical_all(testDataset)", setup="from __main__ import are_all_identical_all, testDataset", number=1000))

My results: 0.94 seconds, 3.09 seconds, 3.27 seconds, 2.61 seconds

The for loop takes less than have the time (3x) of the all function. The all function is supposed to be the same implementation.

What is going on? I want to know why the loop is so much faster and why there is a difference between the last 3 implementations. The last implementation should have one compare less, because the iterator removes the first element, but that shouldn't have this kind of impact.

like image 394
cubei Avatar asked Sep 18 '19 15:09

cubei


1 Answers

As suggested in this other SO post one cause could be that:

The use of a generator expression causes overhead for constantly pausing and resuming the generator.

Anyway I suggest two another approaches that looks faster using map:

def are_all_identical_map(dataset):
    for x in map(lambda x: x == dataset[0], dataset):
        if not x:
            return False
    return True

%%timeit
are_all_identical_map(testDataset)
#7.5 ms ± 64.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

and

%%timeit
(map(lambda x: x == dataset[0], dataset)) and True
#303 ns ± 13.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
like image 63
FabioL Avatar answered Oct 19 '22 23:10

FabioL