
Numba - nopython mode slower than object mode?

Tags:

python

numba

This is my first time using Numba. I read that nopython mode should produce faster code, but this:

import numpy as np
import numba as nb
from numba import jit, float64, int64

@jit(float64[:](int64[:], int64, float64), nopython=True)
def epsilon_bound(l, k, delta):
    return l/k+np.sqrt(np.log(1/delta))*np.sqrt(1/(2*k))

@jit(float64(float64, int64, float64), nopython=False)
def sim_bin(epsilon, sims, delta):
    k=10000
    s = np.random.binomial(k, epsilon, size=(sims,))
    print(nb.typeof(s))
    bound = epsilon_bound(s, k, delta)
    violations = np.greater(epsilon, bound)
    return np.sum(violations)/float(sims)

%%time
a = sim_bin(0.1, 1_000_000, 0.1)

is running much faster:

array(int64, 1d, C)
CPU times: user 66.7 ms, sys: 0 ns, total: 66.7 ms
Wall time: 65.7 ms

than this:

@jit(float64[:](int64[:], int64, float64), nopython=True)
def epsilon_bound(l, k, delta):
    return l/k+np.sqrt(np.log(1/delta))*np.sqrt(1/(2*k))

@jit(float64(float64, int64, float64), nopython=True)
def sim_bin(epsilon, sims, delta):
    k=10000
    s = np.random.binomial(k, epsilon, size=(sims,))
    #print(nb.typeof(s))
    bound = epsilon_bound(s, k, delta)
    violations = np.greater(epsilon, bound)
    return np.sum(violations)/float(sims)

CPU times: user 4.94 s, sys: 8.02 ms, total: 4.95 s
Wall time: 4.93 s

Running sim_bin.inspect_types() shows that the first option is using all pyobjects while the second option is correctly inferring all the types. According to the documentation (http://numba.pydata.org/numba-doc/0.31.0/glossary.html#term-nopython-mode) nopython mode should produce faster code. Does anyone know what is going on? There must be a good reason but I am new at using Numba. Is it because I am mostly using vectorized numpy functions?

Thanks!!

Luk17 asked Oct 20 '25 02:10


1 Answer

One major bottleneck of the function (at least with Numba 0.31 on Windows 10) seems to be the np.random.binomial call. When I test it:

import numpy as np
from numba import jit

@jit(nopython=True)
def nbbinom():
    return np.random.binomial(10000, 0.1, size=(1000000,))

nbbinom() # warmup
%timeit nbbinom()
# 1 loop, best of 3: 2.45 s per loop
%timeit np.random.binomial(10000, 0.1, size=(1000000,))
# 10 loops, best of 3: 23.1 ms per loop

However, this may depend on the Numba version. Numba (unfortunately) often suffers from performance regressions that (fortunately) are fixed quickly. It probably only needs an issue on their bug tracker (if it's not already fixed).

But even then, your code consists mostly of vectorized operations, and you won't get much of a speed boost from Numba for those. As a rule of thumb: if you can already do it with NumPy without Python loops, you don't need Numba. There are exceptions; for example, I found that Numba is definitely faster for small arrays, where the NumPy ufuncs have significant overhead.
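As an illustration of the rule of thumb, here is a minimal sketch of the kind of code where Numba typically helps: an explicit loop that fuses the bound computation and the counting into one pass, next to the vectorized NumPy version from the question. The function names violation_rate_loop and violation_rate_numpy are hypothetical, the samples are assumed to be drawn by NumPy outside the jitted code, and the try/except fallback is only there so the sketch still runs where Numba is not installed.

```python
import math
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: without Numba the loop simply runs as plain (slow) Python.
    def njit(func):
        return func

@njit
def violation_rate_loop(s, k, epsilon, delta):
    # Constant part of the bound, computed once outside the loop.
    slack = math.sqrt(math.log(1.0 / delta)) * math.sqrt(1.0 / (2.0 * k))
    violations = 0
    for i in range(s.shape[0]):
        # Same comparison as np.greater(epsilon, bound), one element at a time.
        if epsilon > s[i] / k + slack:
            violations += 1
    return violations / s.shape[0]

def violation_rate_numpy(s, k, epsilon, delta):
    # Vectorized version: allocates temporary arrays for bound and the mask.
    bound = s / k + np.sqrt(np.log(1 / delta)) * np.sqrt(1 / (2 * k))
    return np.sum(np.greater(epsilon, bound)) / float(s.shape[0])

# Draw the samples with NumPy, outside any jitted function.
rng_samples = np.random.binomial(10000, 0.1, size=(10_000,))
print(violation_rate_loop(rng_samples, 10000, 0.1, 0.1))
print(violation_rate_numpy(rng_samples, 10000, 0.1, 0.1))
```

Both functions should return the same rate; whether the loop version is actually faster depends on the array size and the Numba version, so it is worth timing both on your own setup.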

Another thing is that creating the random array takes much longer (in both NumPy and Numba) than the actual operation:

import numpy as np
from numba import njit

@njit
def epsilon_bound1(l, k, delta):
    return l/k+np.sqrt(np.log(1/delta))*np.sqrt(1/(2*k))

def epsilon_bound2(l, k, delta):
    return l/k+np.sqrt(np.log(1/delta))*np.sqrt(1/(2*k))

def sim_bin(s, k, epsilon, sims, delta, func):
    bound = func(s, k, delta)
    violations = np.greater(epsilon, bound)
    return np.sum(violations)/float(sims)

epsilon = 0.1
sims = 1000000
delta = 0.1
k=10000
s = np.random.binomial(k, epsilon, size=(sims,))
%timeit np.random.binomial(k, epsilon, size=(sims,))
# 1 loop, best of 3: 232 ms per loop

sim_bin(s, k, 0.1, 1000000, 0.1, epsilon_bound1)  # warmup
%timeit sim_bin(s, k, 0.1, 1000000, 0.1, epsilon_bound1)
# 10 loops, best of 3: 28.5 ms per loop
%timeit sim_bin(s, k, 0.1, 1000000, 0.1, epsilon_bound2)
# 10 loops, best of 3: 37.6 ms per loop

So when you benchmark sim_bin, you are essentially only benchmarking the call to np.random.binomial: either the relatively fast NumPy one or the (currently) quite slow Numba implementation.
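To check this outside IPython (where %timeit is not available), one could time the sampling and the bound computation separately. The following is a rough sketch using only NumPy and time.perf_counter; time_call, draw_samples and violation_rate are hypothetical helper names, and the sample count is reduced to keep the run short.

```python
import time
import numpy as np

def time_call(func, *args, repeats=3):
    """Return (best wall time in seconds, last result) over `repeats` calls."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args)
        best = min(best, time.perf_counter() - start)
    return best, result

def draw_samples(k, epsilon, sims):
    # Stage 1: random number generation only.
    return np.random.binomial(k, epsilon, size=(sims,))

def violation_rate(s, k, epsilon, delta):
    # Stage 2: bound computation and counting on already-drawn samples.
    bound = s / k + np.sqrt(np.log(1 / delta)) * np.sqrt(1 / (2 * k))
    return np.sum(np.greater(epsilon, bound)) / float(s.shape[0])

k, epsilon, sims, delta = 10000, 0.1, 100_000, 0.1
t_draw, s = time_call(draw_samples, k, epsilon, sims)
t_rate, rate = time_call(violation_rate, s, k, epsilon, delta)
print(f"sampling: {t_draw * 1e3:.2f} ms, bound + count: {t_rate * 1e3:.2f} ms, rate: {rate:.4f}")
```

Comparing the two printed timings shows which stage dominates on a given machine; the same split can be applied to a jitted version to see whether the slowdown comes from the random number generation.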

MSeifert answered Oct 21 '25 16:10