
Numba jit warnings interpretation in python

I have defined the following recursive array generator and am using Numba jit to try to accelerate the processing (based on this SO answer):

@jit("float32[:](float32,float32,intp)", nopython=False, nogil=True)
def calc_func(a, b, n):
    res = np.empty(n, dtype="float32")
    res[0] = 0
    for i in range(1, n):
        res[i] = a * res[i - 1] + (1 - a) * (b ** (i - 1))
    return res
a = calc_func(0.988, 0.9988, 5000)

I am getting a bunch of warnings/errors that I do not quite understand. I would appreciate help in explaining them and in making them disappear, in order to (I'm assuming) speed up the calculation even more.

Here they are below:

NumbaWarning: Compilation is falling back to object mode WITH looplifting enabled because Function "calc_func" failed type inference due to: Invalid use of Function(<built-in function empty>) with argument(s) of type(s): (int64, dtype=Literal[str](float32)) * parameterized

In definition 0: All templates rejected with literals.

In definition 1: All templates rejected without literals. This error is usually caused by passing an argument of a type that is unsupported by the named function.

[1] During: resolving callee type: Function(<built-in function empty>)

[2] During: typing of call at res = np.empty(n, dtype="float32")

File "thenameofmyscript.py", line 71:

def calc_func(a, b, n):
    res = np.empty(n, dtype="float32")
    ^

@jit("float32:", nopython=False, nogil=True)

thenameofmyscript.py:69: NumbaWarning: Compilation is falling back to object mode WITHOUT looplifting enabled because Function "calc_func" failed type inference due to: cannot determine Numba type of <class 'numba.dispatcher.LiftedLoop'>

File "thenameofmyscript.py", line 73:

def calc_func(a, b, n):
        <source elided>
        res[0] = 0
        for i in range(1, n):
        ^

@jit("float32:", nopython=False, nogil=True)

H:\projects\decay-optimizer\venv\lib\site-packages\numba\compiler.py:742: NumbaWarning: Function "calc_func" was compiled in object mode without forceobj=True, but has lifted loops.

File "thenameofmyscript.py", line 70:

@jit("float32[:](float32,float32,intp)", nopython=False, nogil=True)
    def calc_func(a, b, n):
    ^

self.func_ir.loc))

H:\projects\decay-optimizer\venv\lib\site-packages\numba\compiler.py:751: NumbaDeprecationWarning: Fall-back from the nopython compilation path to the object mode compilation path has been detected, this is deprecated behaviour.

File "thenameofmyscript.py", line 70:

@jit("float32[:](float32,float32,intp)", nopython=False, nogil=True)
    def calc_func(a, b, n):
    ^

warnings.warn(errors.NumbaDeprecationWarning(msg, self.func_ir.loc))

thenameofmyscript.py:69: NumbaWarning: Code running in object mode won't allow parallel execution despite nogil=True.

@jit("float32[:](float32,float32,intp)", nopython=False, nogil=True)

Asked Jul 31 '19 by Chapo



1 Answer

1. Optimize the function (algebraic simplification)

Modern CPUs are quite fast at additions, subtractions and multiplications. Expensive operations like exponentiation should be avoided where possible.

Example

In this example I replaced the costly exponentiation with a simple running multiplication. Simplifications like this can lead to quite large speedups, but they may also change the result slightly, because the floating-point rounding differs.

First, here is your implementation rewritten without any signatures (and in float64). I will come back to the question of signatures later, using another simple example.

import numpy as np
import numba as nb

# @nb.njit() is shorthand for @nb.jit(nopython=True)
@nb.njit()
def calc_func_opt_1(a, b, n):
    # assumes n >= 3, as in the original call
    res = np.empty(n, dtype=np.float64)
    fact = b
    res[0] = 0.
    res[1] = a * res[0] + (1. - a) * 1.
    res[2] = a * res[1] + (1. - a) * fact
    for i in range(3, n):
        fact *= b
        res[i] = a * res[i - 1] + (1. - a) * fact
    return res

It is also a good idea to use scalars where possible, so the previous value does not have to be read back from the array on every iteration.

@nb.njit()
def calc_func_opt_2(a, b, n):
    # same recurrence, but the previous value is kept in the scalar fact_2
    # instead of being read back from the res array
    res = np.empty(n, dtype=np.float64)
    fact_1 = b
    fact_2 = 0.
    res[0] = fact_2
    fact_2 = a * fact_2 + (1. - a) * 1.
    res[1] = fact_2
    fact_2 = a * fact_2 + (1. - a) * fact_1
    res[2] = fact_2
    for i in range(3, n):
        fact_1 *= b
        fact_2 = a * fact_2 + (1. - a) * fact_1
        res[i] = fact_2
    return res
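
Before looking at the timings, a quick sanity check that the rewritten versions still agree with the original is worthwhile. A small drift against calc_func is expected, because the original accumulates in float32 while the optimized versions use float64. A minimal check could look like this:

ref = calc_func(0.988, 0.9988, 5000)          # original (float32)
opt_1 = calc_func_opt_1(0.988, 0.9988, 5000)  # float64, running product
opt_2 = calc_func_opt_2(0.988, 0.9988, 5000)  # float64, scalar accumulator

print(np.abs(opt_1 - ref).max())    # small difference, dominated by float32 rounding
print(np.abs(opt_2 - opt_1).max())  # expected to be 0.0 - identical float64 arithmetic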

Timings

%timeit a = calc_func(0.988, 0.9988, 5000)
222 µs ± 2.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit a = calc_func_opt_1(0.988, 0.9988, 5000)
22.7 µs ± 45.5 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit a = calc_func_opt_2(0.988, 0.9988, 5000)
15.3 µs ± 35.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

2. Are signatures recommended?

In ahead-of-time (AOT) compilation mode signatures are necessary (a short sketch of that is shown below), but not in the usual JIT mode. The example above is not SIMD-vectorizable, so you won't see much of a positive or negative effect from a sub-optimal declaration of the inputs and outputs.
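
As an aside, this is roughly what the AOT path looks like. This is only a minimal sketch assuming numba.pycc (Numba's ahead-of-time compiler at the time of writing); the module and function names are made up for illustration:

from numba.pycc import CC

cc = CC('aot_example')  # name of the compiled extension module (arbitrary)

# In AOT mode the signature is mandatory: there is no call-time type inference
@cc.export('calc_square', 'f8(f8)')
def calc_square(x):
    return x ** 2

if __name__ == '__main__':
    cc.compile()  # builds the aot_example extension module

Back in JIT mode, let's look at another example, where the declaration of the inputs and outputs does matter.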

#Numba is able to SIMD-vectorize this loop if
#a, b and res are contiguous arrays
@nb.njit(fastmath=True)
def some_function_1(a, b):
    res = np.empty_like(a)
    for i in range(a.shape[0]):
        res[i] = a[i]**2 + b[i]**2
    return res

@nb.njit("float64[:](float64[:],float64[:])", fastmath=True)
def some_function_2(a, b):
    res = np.empty_like(a)
    for i in range(a.shape[0]):
        res[i] = a[i]**2 + b[i]**2
    return res

a=np.random.rand(10_000)
b=np.random.rand(10_000)

#Example for non-contiguous input
#a = np.random.rand(10_000)[0::2]
#b = np.random.rand(10_000)[0::2]

%timeit res=some_function_1(a,b)
5.59 µs ± 36.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit res=some_function_2(a,b)
9.36 µs ± 47.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Why is the version with signatures slower?

Let's take a closer look at the signatures.

some_function_1.nopython_signatures
#[(array(float64, 1d, C), array(float64, 1d, C)) -> array(float64, 1d, C)]
#i.e. the automatically inferred, C-contiguous signature
#"float64[::1](float64[::1],float64[::1])"
some_function_2.nopython_signatures
#[(array(float64, 1d, A), array(float64, 1d, A)) -> array(float64, 1d, A)]
#i.e. the declared any-layout signature
#"float64[:](float64[:],float64[:])"

If the memory layout is unknown at compile time, it is often impossible to SIMD-vectorize the algorithm. Of course you can explicitly declare C-contiguous arrays, but then the function won't work anymore for non-contiguous inputs, which is normally not intended.
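
To make that trade-off concrete, here is a minimal sketch of declaring C-contiguous arrays explicitly and what happens for strided (non-contiguous) views; some_function_3 is just an illustrative name:

import numpy as np
import numba as nb

# "float64[::1]" declares C-contiguous 1d arrays, so SIMD-vectorization is possible,
# but only contiguous inputs are accepted
@nb.njit("float64[::1](float64[::1],float64[::1])", fastmath=True)
def some_function_3(a, b):
    res = np.empty_like(a)
    for i in range(a.shape[0]):
        res[i] = a[i]**2 + b[i]**2
    return res

a = np.random.rand(10_000)
b = np.random.rand(10_000)

some_function_3(a, b)                  # works: contiguous inputs
try:
    some_function_3(a[::2], b[::2])    # strided views no longer match the signature
except TypeError as e:
    print("rejected non-contiguous input:", e)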

Answered Oct 19 '22 by max9111