cython boundscheck=True faster than boundscheck=False

Question

Consider the following minimal example:

#cython: language_level=3, boundscheck=False, wraparound=False, initializedcheck=False, cdivision=True
cimport cython
from libc.stdlib cimport malloc

def main(size_t ni, size_t nt, size_t nx):
    cdef:
        size_t i, j, t, x, y
        double[:, :, ::1] a = <double[:ni, :ni, :nx]>malloc(ni * ni * nx * sizeof(double))
        double[:, :, ::1] b = <double[:nt, :ni, :nx]>malloc(nt * ni * nx * sizeof(double))
        size_t[:, :, ::1] best = <size_t[:nt, :ni, :nx]>malloc(nt * ni * nx * sizeof(size_t))
        size_t mxi
        double s, mxs
    for t in range(nt):
        for j in range(ni):
            for y in range(nx): # this loops does nothing but is needed for the effect below.
                mxs = -1e300
                for i in range(ni):
                    for x in range(nx):
                        with cython.boundscheck(False): # Faster!?!?
                            s = b[t, i, x] + a[i, j, x]
                        if s >= mxs:
                            mxs = s
                            mxi = i
                best[t + 1, j, y] = mxi
    return best[0, 0, 0]

essentially summing two 2D arrays along some specific axes and finding the maximizing index along another axis.

When compiled with gcc -O3 and called with the arguments (1, 2000, 2000), adding the boundscheck=True results in a twice faster execution than when boundscheck=False.

Any hint of why this would be the case? (Well, I can probably guess this has again to do with GCC autovectorization...)

Thanks in advance.

(cross-posted from cython-users)

patapouf_ai · Accepted Answer

Boundscheck is a security check that you are accessing indices inside the bounds of the vectors. If you don't bother to do the check if the indices can go out of bounds then it is faster. It takes time to perform the check.

That is, if boundcheck is true, it will check to see if the index is inside the range of the vector before reading or writing to memory. And if not it will throw an error. If boundcheck is false, it will read or write to the pointer even if the index is out of bounds, given out false data by reading and writing to memory corrupting data by writing.

From documentation:

The array lookups are still slowed down by two factors:

1) Bounds checking is performed.

2) Negative indices are checked for and handled correctly.

The consequences of not bound checking being:

Now bounds checking is not performed (and, as a side-effect, if you ‘’do’’ happen to access out of bounds you will in the best case crash your program and in the worst case corrupt data).

Where this is particularly important is you can have None vectors. Here is the warning from the documentation:

Warning

Speed comes with some cost. Especially it can be dangerous to set typed objects (like f, g and h in our sample code) to None. Setting such objects to None is entirely legal, but all you can do with them is check whether they are None. All other use (attribute lookup or indexing) can potentially segfault or corrupt data (rather than raising exceptions as they would in Python).

The actual rules are a bit more complicated but the main message is clear: Do not use typed objects without knowing that they are not set to None.

http://docs.cython.org/src/userguide/numpy_tutorial.html

cython boundscheck=True faster than boundscheck=False

Tags:

cython

auto-vectorization

antony

1 Answers

patapouf_ai

Recent Activity

Donate For Us

cython boundscheck=True faster than boundscheck=False

Tags:

cython

auto-vectorization

antony

1 Answers

patapouf_ai

Related questions

Recent Activity

Donate For Us