Because an algorithm I want to implement uses indices 1..n, and because it's very error-prone to shift every index by one, I decided to get smart and inserted a dummy element at the beginning of every list, so I can use the original formulas from the paper.
For brevity, consider this toy example:
def calc(N):
    nums = [0] + range(1, N + 1)
    return sum(nums[1:])  # skip the dummy first element
However, I got worried that my results might be spurious, because I could access the 0-th element by accident somewhere and not be aware of it. So I got even smarter and used None instead of 0 as the first element, since any arithmetic operation on it would result in a runtime error:
def calc_safe(N):
    nums = [None] + range(1, N + 1)  # here we use "None" as the dummy
    return sum(nums[1:])
Surprisingly, this small change led to a huge performance penalty for PyPy (even with the current 5.8 version): the code became around 10 times slower! Here are the times on my machine:
                    pypy-5.8    cpython
calc(10**8)         0.5 sec     5.5 sec
calc_safe(10**8)    7.5 sec     5.5 sec
As a side note: CPython does not care whether None is used or not.
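For anyone who wants to reproduce the comparison, here is a minimal timing sketch (not the exact harness used for the numbers above). It assumes the calc and calc_safe definitions from above and is written for Python 2, matching the interpreters tested:

import time

def timed(fn, arg):
    # Return the wall-clock time of a single call.
    start = time.time()
    fn(arg)
    return time.time() - start

print "calc(10**8):      %.1f sec" % timed(calc, 10**8)
print "calc_safe(10**8): %.1f sec" % timed(calc_safe, 10**8)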
So my question is twofold:

1. Obviously, using None is not a good idea, but why?
2. Is there a way to keep the None-approach and keep the performance?

Edit: As Armin has explained, not all lists are equal, and we can see which strategy is used via:
import __pypy__
print __pypy__.strategy(nums)
In the first case it is IntegerListStrategy, and in the second ObjectListStrategy. The same would happen if we used a large integer value (like 2**100) instead of None.
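A small sketch that makes the difference visible for all three layouts (PyPy only, since the __pypy__ module is not available on CPython):

import __pypy__

ints_only = [0] + range(1, 11)
with_none = [None] + range(1, 11)
big_int   = [2**100] + range(1, 11)

print __pypy__.strategy(ints_only)   # IntegerListStrategy
print __pypy__.strategy(with_none)   # ObjectListStrategy
print __pypy__.strategy(big_int)     # ObjectListStrategy as well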
PyPy has a special case for lists containing only integers: it stores them like an array.array. If there is a None in the list, this optimization no longer works.
This could probably be fixed inside PyPy to allow None as a special case...
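Until then, one possible workaround (just a sketch, not an official PyPy fix) is to make the dummy element configurable: use None while testing, so accidental access to index 0 blows up, and use 0 in production runs so the list stays integer-only and keeps IntegerListStrategy:

DEBUG = False  # flip to True while testing for accidental nums[0] access

def make_nums(N):
    # None forces ObjectListStrategy (slow, but catches bugs);
    # 0 keeps IntegerListStrategy (fast).
    pad = None if DEBUG else 0
    return [pad] + range(1, N + 1)

def calc_checked(N):
    nums = make_nums(N)
    return sum(nums[1:])  # skip the dummy first element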