While reading the book 'Effective Python' by Brett Slatkin, I noticed that the author suggests that building a list with a generator function and calling list() on the resulting iterator can sometimes lead to cleaner, more readable code.
For example:
```python
num_list = range(100)

def num_squared_iterator(nums):
    for i in nums:
        yield i**2

def get_num_squared_list(nums):
    l = []
    for i in nums:
        l.append(i**2)
    return l
```
A user could then call either

```python
l = list(num_squared_iterator(num_list))
```

or

```python
l = get_num_squared_list(num_list)
```

and get the same result.
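As a quick sanity check that both approaches really produce the same list, here is a minimal sketch that re-declares the two functions so it runs on its own. It also shows one behavioural difference worth keeping in mind: a generator object can be consumed only once.

```python
def num_squared_iterator(nums):
    # Generator: yields each square lazily, one at a time
    for i in nums:
        yield i**2

def get_num_squared_list(nums):
    # Eager version: builds and returns the list explicitly
    l = []
    for i in nums:
        l.append(i**2)
    return l

num_list = range(100)
from_generator = list(num_squared_iterator(num_list))
from_function = get_num_squared_list(num_list)
print(from_generator == from_function)  # True
print(from_generator[:5])               # [0, 1, 4, 9, 16]

# The same generator object is exhausted after the first list() call
gen = num_squared_iterator(range(3))
first = list(gen)   # [0, 1, 4]
second = list(gen)  # [] because the generator has already been consumed
print(first, second)
```

Because the list-building function returns a new list on every call, it does not have this one-shot restriction; wrapping the generator call in list() immediately, as above, sidesteps it.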
The suggestion was that the generator function has less noise because it is shorter and does not have the extra code for creating the list and appending values to it.
(Note: for these simple examples a list comprehension or generator expression would clearly be better, but let us take it as given that this is a simplification of a pattern that can be used in more complex code that would not read clearly as a list comprehension.)
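For reference, the one-line equivalents the note alludes to look like this for the simple squaring case; the variable names are illustrative only.

```python
num_list = range(100)

# List comprehension: builds the list directly
squares_lc = [i**2 for i in num_list]

# Generator expression wrapped in list(): the inline counterpart of
# list(num_squared_iterator(num_list))
squares_ge = list(i**2 for i in num_list)

print(squares_lc == squares_ge)  # True
```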
My question is this: is there a cost to wrapping the generator in a list? Would it be equivalent in performance to the list-building function?
To find out, I wrote and ran the following quick test:
```python
from functools import wraps
from time import time

TEST_DATA = range(100)


def timeit(func):
    """Decorator that prints how long a single call to func takes."""
    @wraps(func)
    def wrapped(*args, **kwargs):
        start = time()
        result = func(*args, **kwargs)
        end = time()
        print(f'running time for {func.__name__} = {end-start}')
        return result  # pass the wrapped function's return value through
    return wrapped


def num_squared_iterator(nums):
    for i in nums:
        yield i**2


@timeit
def get_num_squared_list(nums):
    l = []
    for i in nums:
        l.append(i**2)
    return l


@timeit
def get_num_squared_list_from_iterator(nums):
    return list(num_squared_iterator(nums))


if __name__ == '__main__':
    get_num_squared_list(TEST_DATA)
    get_num_squared_list_from_iterator(TEST_DATA)
```
I ran the test code many times, and each time (much to my surprise) the get_num_squared_list_from_iterator function actually ran (fractionally) faster than the get_num_squared_list function.
Here are the results from my first few runs:

1. running time for get_num_squared_list = 5.2928924560546875e-05
   running time for get_num_squared_list_from_iterator = 5.0067901611328125e-05
2. running time for get_num_squared_list = 5.3882598876953125e-05
   running time for get_num_squared_list_from_iterator = 4.982948303222656e-05
3. running time for get_num_squared_list = 5.1975250244140625e-05
   running time for get_num_squared_list_from_iterator = 4.76837158203125e-05
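Timing a single call that takes tens of microseconds with time.time() can easily be swamped by timer resolution and background noise. A steadier sketch uses the standard library's timeit.repeat, which runs each function many times per trial and lets you keep the fastest trial. The helper name best_time and the number/repeat values are arbitrary choices for illustration. The decorator is dropped here because its name, timeit, would clash with the timeit module.

```python
import timeit

TEST_DATA = range(100)

def num_squared_iterator(nums):
    for i in nums:
        yield i**2

def get_num_squared_list(nums):
    l = []
    for i in nums:
        l.append(i**2)
    return l

def get_num_squared_list_from_iterator(nums):
    return list(num_squared_iterator(nums))

def best_time(func, number=10_000, repeat=5):
    # Hypothetical helper: run func(TEST_DATA) `number` times per trial,
    # for `repeat` trials, and return the fastest trial as seconds per call
    # (the minimum is the trial least disturbed by other activity).
    timings = timeit.repeat(lambda: func(TEST_DATA), number=number, repeat=repeat)
    return min(timings) / number

if __name__ == '__main__':
    for f in (get_num_squared_list, get_num_squared_list_from_iterator):
        print(f'{f.__name__}: {best_time(f):.3e} s per call')
```

The absolute numbers will depend on the machine and the Python version, so it is worth comparing the two functions only within the same run.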
I am guessing that this is due to the cost of looking up and calling l.append on every iteration of the loop in the get_num_squared_list function.
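One way to probe this guess is to hoist the attribute lookup out of the loop, a well-known CPython micro-optimization, and see how much of the gap closes. The variant get_num_squared_list_local_append below is hypothetical and exists only for this experiment. If it is still slower than the generator version, the remaining overhead is plausibly the Python-level method call itself. When list() consumes a generator in CPython, the appending happens inside list()'s C implementation instead.

```python
import timeit

def get_num_squared_list(nums):
    l = []
    for i in nums:
        l.append(i**2)  # attribute lookup + method call on every iteration
    return l

def get_num_squared_list_local_append(nums):
    # Hypothetical variant: bind l.append to a local name once, so the loop
    # body only does a local-variable lookup before each call.
    l = []
    append = l.append
    for i in nums:
        append(i**2)
    return l

data = range(100)
assert get_num_squared_list(data) == get_num_squared_list_local_append(data)

for f in (get_num_squared_list, get_num_squared_list_local_append):
    # Fastest of 5 trials, each running f(data) 10,000 times
    t = min(timeit.repeat(lambda: f(data), number=10_000, repeat=5))
    print(f'{f.__name__}: {t / 10_000:.3e} s per call')
```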
I find this interesting because not only is the code clear and elegant, it also seems to be more performant.