Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Strange performance results -- loop vs list comprehension and zip()

I had a really simple problem and got some strange results when I tried to figure out which solution is faster.

The original Problem: Given two lists ListA, ListB and a constant k, remove all entries where the two lists sum to k.

I solved the problem in two ways: First I tried using a loop and then I used list comprehension and zip() to zip and unzip the two lists.

The version using a loop.

def Remove_entries_simple(listA, listB, k):
    """ removes entries that sum to k """
    new_listA = []
    new_listB = []
    for index in range(len(listA)):
        if listA[index] + listB[index] == k:
            pass
        else:
            new_listA.append(listA[index])
            new_listB.append(listB[index])
    return(new_listA, new_listB)

The version using list comprehension and zip()

def Remove_entries_zip(listA, listB, k):
    """ removes entries that sum to k using zip"""
    zip_lists = [(a, b) for (a, b) in zip(listA, listB) if not (a+b) == k]

    # unzip the lists
    new_listA, new_listB = zip(*zip_lists)
    return(list(new_listA), list(new_listB))

Then I tried to determine which approach is faster. But then I got what you see in figure below (x-axis: size of the lists, y-axis: average time to run it, 10**3 repetitions). For some reason the version using zip() always makes the same jump at somewhat the same position -- I ran it multiple times and on different machines. Can someone explain what could cause such an odd behavior?

Plot comparing running times

Update: The code I used to generate the plot. I used a function decorator to run each problem a 1000 times.

import statements:

import random
import time
import matplotlib.pyplot as plt

The function decorator:

def Repetition_Decorator(fun, Rep=10**2):
    ''' returns the average over Rep repetitions'''
    def Return_function(*args, **kwargs):
        Start_time = time.clock()
        for _ in range(Rep):
            fun(*args, **kwargs)
        return (time.clock() - Start_time)/Rep

return Return_function

The code to create the plots:

Zippedizip = []
Loops = []
The_Number = 10
Size_list = list(range(10, 1000, 10))

Repeated_remove_loop = Repetition_Decorator(Remove_entries_simple, Rep=10**3)
Repeated_remove_zip = Repetition_Decorator(Remove_entries_zip, Rep=10**3)

for size in Size_list:
    ListA = [random.choice(range(10)) for _ in range(size)]
    ListB = [random.choice(range(10)) for _ in range(size)]

    Loops.append(Repeated_remove_loop(ListA, ListB, The_Number))
    Zippedizip.append(Repeated_remove_zip(ListA, ListB, The_Number))

plt.xlabel('Size of List')
plt.ylabel('Averaged time in seconds')
plt.plot(Size_list, Loops, label="Using Loop")
plt.plot(Size_list, Zippedizip, label="Zip")
plt.legend(loc='upper left', shadow=False, fontsize='x-large')
plt.show()

Update-Update: thanks to kaya3 for pointing out the timeit module.

To be as close as possible to my original code but also use the timeit module, I created a new function decorator that uses the timeit module for timing the code.

The new decorator:

def Repetition_Decorator_timeit(fun, Rep=10**2):                                                                                   
"""returns average over Rep repetitions with timeit"""                                                                         
    def Return_function(*args, **kwargs):                                                                                          
        partial_fun = lambda: fun(*args, **kwargs)                                                                                 
        return timeit.timeit(partial_fun, number=Rep) / Rep                                                                        
return Return_function 

When I use the new decorator the version using the for loop is not affected but the zip version no longer makes the jump.

enter image description here

So far I feel quite certain that the jump is a result of how I measure the function rather than the function itself. But the jump is so distinct -- always at the same list size across different machines -- that it cannot be a fluke. Any ideas why exactly this jump happens?

Update-Update-Update:

It has something to do with the garbage collector, because if I disable the garbage collector with gc.disable(), both ways of measuring give the same result.

Garbage Collector disabled

What did I learn here: Do not just measure the time of execution yourself. Use the timeit module for measuring performance of code snippets.

like image 994
oldmansaur Avatar asked Nov 14 '19 23:11

oldmansaur


People also ask

Which is faster for loop or list comprehension?

List comprehensions are faster than for loops to create lists. But, this is because we are creating a list by appending new elements to it at each iteration.

What is the difference between list comprehension and for loop?

The for loop is a common way to iterate through a list. List comprehension, on the other hand, is a more efficient way to iterate through a list because it requires fewer lines of code. List comprehension requires less computation power than a for loop because it takes up less space and code.

How much faster are list comprehensions?

The list comprehension is 50% faster.

Are list comprehensions more efficient?

Conclusions. List comprehensions are often not only more readable but also faster than using "for loops." They can simplify your code, but if you put too much logic inside, they will instead become harder to read and understand.


1 Answers

This seems to be an artifact of the way you have measured the running times. I do not know what causes your timing code to produce this effect, but the effect disappears when I use timeit to measure the running times instead. I'm using Python 3.6.2.

I can consistently reproduce the effect using your timing code; I get the zip version's running time jumping at around the same threshold, though it is still slightly faster than the other version on my machine:

Timed using clock

However, when I measure the times using timeit, the effect disappears completely:

enter image description here

Here's the code using timeit; I changed as little as possible from your analysis code.

import timeit

Zippedizip = []
Loops = []
The_Number = 10
Size_list = list(range(10, 1000, 10))
Reps = 1000

for size in Size_list:
    ListA = [random.choice(range(10)) for _ in range(size)]
    ListB = [random.choice(range(10)) for _ in range(size)]

    remove_loop = lambda: Remove_entries_simple(ListA, ListB, The_Number)
    remove_zip = lambda: Remove_entries_zip(ListA, ListB, The_Number)

    Loops.append(timeit.timeit(remove_loop, number=Reps) / Reps)
    Zippedizip.append(timeit.timeit(remove_zip, number=Reps) / Reps)

# ...

So I think it's a spurious result. That said, I don't understand what's causing it in your timing code. I tried simplifying your timing code to not use a decorator or vargs, and I replaced time.clock() with time.perf_counter() which is more accurate, but that didn't change anything.

like image 126
kaya3 Avatar answered Oct 24 '22 09:10

kaya3