What is happening? Can somebody explain me what happens here, I changed in tight loop:
## j=i
## while j < ls - 1 and len(wordlist[j]) > lc: j+=1
j = next(j for j in range(i,ls) if len(wordlist[j]) <= lc)
The commented while version ran the whole program: 625 ms, the next generator version ran the whole program in time of 2.125 s.
What can be reason that this more pythonic version cause such a catastroph in performance?
EDIT: Maybe it is caused by use of psyco module? Surely at least the running time with Python 2.7 which has not psyco, was 2.141 for the next version, means almost same as Python 2.6 with psyco.
After removing the *.pyc files I got notthe code to slow down. Then when I removed import of psyco from library module also, I got 2.6 timing also for use without psyco, results for the non psyco version and psyco version (as now the library routine slows down also and it's timing is also relevant:)
not psyco:
psyco:
Running in WindowsXP AMD Sempron 3100+ system with 2GB RAM. Counting the loops and calls with two globals:
j=i
callcount += 1
while j < ls - 1 and len(wordlist[j]) > lc:
j+=1
loopcount += 1
Result for the test input with psyco:
Finished in 625 ms
Loopcount: 78317
Callcount: 47970
Ration: 1.633
So the loop is inside tight loop, but is on average executed only couple of times (notice that two increments of global counters did not slow down the code in psyco)
CONCLUSIONS: Despite the highly sensitive nature of the algorithm relative to vocabulary length, which caused me to pass some imposible words from consideration by this loop, later the basic cases of recursion are checked by dictionary lookup which is O(n), therefore the highly beneficial earlier optimization is become not very beneficial, even with longer input and moving the callcount counter in beginning of the function, showed that call count is not affected by the vocabulary length, but outer loop count is slichtly reduced (the code originally posted is in elif part of if statement).
Longer run times (29 372 solutions) with while loop and the whole loop removed (using i instead of j) (library preparation 312 ms):
(running time without loop, counters and psyco: 32,792 s, library 608 ms)
So without the extra counters the benefit of this loop using psyco is in the harder case: (4688-4594)*100/4688.0 % = 2 %
This inspired me to reverse another earlier optimization, which I had wondered about in DaniWeb. Earlier version of code run faster, when the smallest word size was global, not paramerter. According to documentation, local variable calls are faster, but apparantly the cost in making recursion heavier outweighted that. Now in the harder case this other reversal of optimization brought more expected performance behaviour in the case of no optimization of word lenght: the run time with psyco was 312 ms preparations, 4,469..4,484 s total running time. So this made code cleaner and brought more benefit in this case as the removed loop had. And putting the parameter to version with while loop, did not change the running time much (the variation became greater for library preparation code)
**What I learned from this: If you do n optimizations for speed
you must check the first n-1 optimizations after doing nth one**
I've found that using generators can often be slower than generating the whole list, which is a little counter-intuitive. I've managed to fix performance bottlenecks just by adding a []
pair.
For example compare these:
$ python -m timeit -n 1000 "' '.join(c for c in 'hello world')"
1000 loops, best of 3: 6.11 usec per loop
$ python -m timeit -n 1000 "' '.join([c for c in 'hello world'])"
1000 loops, best of 3: 3.79 usec per loop
It's almost twice as quick to generate the whole list first rather than use a generator even for such a simple case!
Edit: As Thomas Wouters points out in the comments the reason the generator is slower here is because it's such a simple case. For balance here is his test in which the generator is the clear winner:
$ python -m timeit -s "s = 'hello world' * 10000" -s "class C: pass" "for i in (C() for c in s): pass"
10 loops, best of 3: 33.6 msec per loop
$ python -m timeit -s "s = 'hello world' * 10000" -s "class C: pass" "for i in [C() for c in s]: pass"
10 loops, best of 3: 172 msec per loop
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With