Consider the following minimal example:
#cython: language_level=3, boundscheck=False, wraparound=False, initializedcheck=False, cdivision=True
cimport cython
from libc.stdlib cimport malloc
def main(size_t ni, size_t nt, size_t nx):
cdef:
size_t i, j, t, x, y
double[:, :, ::1] a = <double[:ni, :ni, :nx]>malloc(ni * ni * nx * sizeof(double))
double[:, :, ::1] b = <double[:nt, :ni, :nx]>malloc(nt * ni * nx * sizeof(double))
size_t[:, :, ::1] best = <size_t[:nt, :ni, :nx]>malloc(nt * ni * nx * sizeof(size_t))
size_t mxi
double s, mxs
for t in range(nt):
for j in range(ni):
for y in range(nx): # this loops does nothing but is needed for the effect below.
mxs = -1e300
for i in range(ni):
for x in range(nx):
with cython.boundscheck(False): # Faster!?!?
s = b[t, i, x] + a[i, j, x]
if s >= mxs:
mxs = s
mxi = i
best[t + 1, j, y] = mxi
return best[0, 0, 0]
essentially summing two 2D arrays along some specific axes and finding the maximizing index along another axis.
When compiled with gcc -O3 and called with the arguments (1, 2000, 2000), adding the boundscheck=True results in a twice faster execution than when boundscheck=False.
Any hint of why this would be the case? (Well, I can probably guess this has again to do with GCC autovectorization...)
Thanks in advance.
(cross-posted from cython-users)
Boundscheck is a security check that you are accessing indices inside the bounds of the vectors. If you don't bother to do the check if the indices can go out of bounds then it is faster. It takes time to perform the check.
That is, if boundcheck is true, it will check to see if the index is inside the range of the vector before reading or writing to memory. And if not it will throw an error. If boundcheck is false, it will read or write to the pointer even if the index is out of bounds, given out false data by reading and writing to memory corrupting data by writing.
From documentation:
The array lookups are still slowed down by two factors:
1) Bounds checking is performed.
2) Negative indices are checked for and handled correctly.
The consequences of not bound checking being:
Now bounds checking is not performed (and, as a side-effect, if you ‘’do’’ happen to access out of bounds you will in the best case crash your program and in the worst case corrupt data).
Where this is particularly important is you can have None vectors. Here is the warning from the documentation:
Warning
Speed comes with some cost. Especially it can be dangerous to set typed objects (like f, g and h in our sample code) to None. Setting such objects to None is entirely legal, but all you can do with them is check whether they are None. All other use (attribute lookup or indexing) can potentially segfault or corrupt data (rather than raising exceptions as they would in Python).
The actual rules are a bit more complicated but the main message is clear: Do not use typed objects without knowing that they are not set to None.
http://docs.cython.org/src/userguide/numpy_tutorial.html
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With