Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pythonic vs Unpythonic

Tags:

python

runtime

Consider these two functions that does the following: Given a word, it produces a list where every position in the word is replaced by every character in the alphabet

alphabet = 'abcdefghijklmnopqrstuvwxyz'

Version 1 (Pythonic):

def replace1_word( word ):
    return [word[:w]+alphabet[i]+word[w+1:] 
            for w in range(len(word)) 
            for i in range(len(alphabet))]

Version 2 (Unpythonic):

def replace1_word2( word ):
    res=[]
    for w in range(len(word)):
        for i in range(len(alphabet)):
            res.append( word[:w] + alphabet[i] + word[w+1:] )
    return res

I used timeit module to run it 1000 times and measure the timings and the average runtime difference comes down to be between 0.028 milliseconds to 0.040 milliseconds.

My questions is which part/line of the code is costly in the second version and why? They both "seem to" work the same way and return the same result in list format.

like image 577
sPaz Avatar asked Dec 26 '22 19:12

sPaz


2 Answers

My questions is which part/line of the code is costly in the second version and why? They both "seem to" work the same way and return the same result in list format.

No they aren't. If in doubt, always profile it, it will give you a picture of the cost for each operation. Just looking at the below O/P does it now tell, what is costly in your second function?

>>> cProfile.run("replace1_word2('foo bar baz')")
         313 function calls in 0.000 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 <pyshell#216>:1(replace1_word2)
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
       12    0.000    0.000    0.000    0.000 {len}
      286    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
       12    0.000    0.000    0.000    0.000 {range}


>>> cProfile.run("replace1_word('foo bar baz')")
         27 function calls in 0.000 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 <pyshell#220>:1(replace1_word)
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
       12    0.000    0.000    0.000    0.000 {len}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
       12    0.000    0.000    0.000    0.000 {range}

Care to explain what replaces append (or how does the version 1 generate the list)

In Python, function calls have extra overhead. In the former case, the list.append is called multiple times where as in case of list comprehension, the list gets generated as a single expression. So in a sense, there is no equivalent notation for a list comprehension with loop structures. List comprehension is a powerful tool in Python rather than a decorated syntax for loops.

Epilogue

If you ask me to write a function to solve this problem, I would end up something as

>>> from itertools import product
>>> from string import ascii_lowercase
>>> def replace1_word4(word):
         words = ('{}'.join([word[:w], word[w+1:]]) for w in range(len(word)))
         return [word.format(replace)
                 for  word, replace in product(words, ascii_lowercase)]
like image 87
Abhijit Avatar answered Dec 28 '22 09:12

Abhijit


It probably has to do with how list comprehensions are evaluated vs for loops with list.append to a collector variable; if you change the second snippet to use yield and then wrap its result with list(), the performance gets closer to the first version:

def replace1_word3(word):
    for w in range(len(word)):
        for i in range(len(alphabet)):
            yield word[:w] + alphabet[i] + word[w+1:]

benchmarks:

In [18]: timeit replace1_word('foo bar baz ' * 100)
10 loops, best of 3: 38.7 ms per loop

In [19]: timeit replace1_word2('foo bar baz ' * 100)
10 loops, best of 3: 42.1 ms per loop

In [20]: timeit list(replace1_word3('foo bar baz ' * 100))
10 loops, best of 3: 39.7 ms per loop

The rest of the difference can probably be attributed to how the actualy list is constructed internally in a list comprehension vs the performance of yield => generator => list().

P.S. Abhijit's answer can probably explain in more technical terms why replace1_word is faster. In any case, it looks like that list.append is the culprit as I guessed.

like image 28
Erik Kaplun Avatar answered Dec 28 '22 09:12

Erik Kaplun