Why is the 'all' implementation slower than writing a for loop [duplicate]

Question

I get a list and want to know if all elements are identical. For lists with a high number of elements, that are indeed all identical, the conversion into a set is fast, but otherwise going over the list with early exit is performing better:

def are_all_identical_iterate(dataset):
    first = dataset[0]
    for data in dataset:
        if data != first:
            return False
    return True

# versus

def are_all_identical_all(dataset):
    return all(data == dataset[0] for data in dataset)
# or
def are_all_identical_all2(dataset):
    return all(data == dataset[0] for data in iter(dataset))
# or
def are_all_identical_all3(dataset):
    iterator = iter(dataset)
    first = next(iterator)
    return all(first == rest for rest in iterator)

NUM_ELEMENTS = 50000
testDataset = [1337] * NUM_ELEMENTS # all identical

from timeit import timeit
print(timeit("are_all_identical_iterate(testDataset)", setup="from __main__ import are_all_identical_iterate, testDataset", number=1000))
print(timeit("are_all_identical_all(testDataset)", setup="from __main__ import are_all_identical_all, testDataset", number=1000))

My results: 0.94 seconds, 3.09 seconds, 3.27 seconds, 2.61 seconds

The for loop takes less than have the time (3x) of the all function. The all function is supposed to be the same implementation.

What is going on? I want to know why the loop is so much faster and why there is a difference between the last 3 implementations. The last implementation should have one compare less, because the iterator removes the first element, but that shouldn't have this kind of impact.

FabioL · Accepted Answer

As suggested in this other SO post one cause could be that:

The use of a generator expression causes overhead for constantly pausing and resuming the generator.

Anyway I suggest two another approaches that looks faster using map:

def are_all_identical_map(dataset):
    for x in map(lambda x: x == dataset[0], dataset):
        if not x:
            return False
    return True

%%timeit
are_all_identical_map(testDataset)
#7.5 ms ± 64.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

and

%%timeit
(map(lambda x: x == dataset[0], dataset)) and True
#303 ns ± 13.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Why is the 'all' implementation slower than writing a for loop [duplicate]

Tags:

python

cubei

1 Answers

FabioL

Recent Activity

Donate For Us

Why is the 'all' implementation slower than writing a for loop [duplicate]

Tags:

python

cubei

1 Answers

FabioL

Related questions

Recent Activity

Donate For Us