Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

CUDA GPU processing: TypeError: compile_kernel() got an unexpected keyword argument 'boundscheck'

Today I started working with CUDA and GPU processing. I found this tutorial: https://www.geeksforgeeks.org/running-python-script-on-gpu/

Unfortunately my first attempt to run gpu code failed:

from numba import jit, cuda 
import numpy as np 
# to measure exec time 
from timeit import default_timer as timer 

# normal function to run on cpu 
def func(a):                                 
    for i in range(10000000): 
        a[i]+= 1    

# function optimized to run on gpu 
@jit(target ="cuda")                         
def func2(a): 
    for i in range(10000000): 
        a[i]+= 1
if __name__=="__main__": 
    n = 10000000                            
    a = np.ones(n, dtype = np.float64) 
    b = np.ones(n, dtype = np.float32) 

    start = timer() 
    func(a) 
    print("without GPU:", timer()-start)     

    start = timer() 
    func2(a) 
    print("with GPU:", timer()-start) 

Output:

/home/amu/anaconda3/bin/python /home/amu/PycharmProjects/gpu_processing_base/gpu_base_1.py
without GPU: 4.89985659904778
Traceback (most recent call last):
  File "/home/amu/PycharmProjects/gpu_processing_base/gpu_base_1.py", line 30, in <module>
    func2(a)
  File "/home/amu/anaconda3/lib/python3.7/site-packages/numba/cuda/dispatcher.py", line 40, in __call__
    return self.compiled(*args, **kws)
  File "/home/amu/anaconda3/lib/python3.7/site-packages/numba/cuda/compiler.py", line 758, in __call__
    kernel = self.specialize(*args)
  File "/home/amu/anaconda3/lib/python3.7/site-packages/numba/cuda/compiler.py", line 769, in specialize
    kernel = self.compile(argtypes)
  File "/home/amu/anaconda3/lib/python3.7/site-packages/numba/cuda/compiler.py", line 785, in compile
    **self.targetoptions)
  File "/home/amu/anaconda3/lib/python3.7/site-packages/numba/core/compiler_lock.py", line 32, in _acquire_compile_lock
    return func(*args, **kwargs)
TypeError: compile_kernel() got an unexpected keyword argument 'boundscheck'

Process finished with exit code 1

I have installed numba and cudatoolkit mentioned in the tutorial in an anaconda environment in pycharm.

like image 744
Artur Müller Romanov Avatar asked May 24 '20 07:05

Artur Müller Romanov


1 Answers

Adding an answer to get this off the unanswered queue.

The code in that example is broken. It isn't anything wrong with your numba or CUDA installations. There is no way that the code in your question (or the blog you copied it from) can emit the result the blog post claims.

There are many ways this could potentially be modified to work. One would be like this:

from numba import vectorize, jit, cuda 
import numpy as np 
# to measure exec time 
from timeit import default_timer as timer 

# normal function to run on cpu 
def func(a):                                 
    for i in range(10000000): 
        a[i]+= 1    

# function optimized to run on gpu 
@vectorize(['float64(float64)'], target ="cuda")                         
def func2(x): 
    return x+1

if __name__=="__main__": 
    n = 10000000                            
    a = np.ones(n, dtype = np.float64) 

    start = timer() 
    func(a) 
    print("without GPU:", timer()-start)     

    start = timer() 
    func2(a) 
    print("with GPU:", timer()-start) 

Here func2 becomes a ufunc which is compiled for the device. It will then be run over the whole input array on the GPU. Doing so does this:

$ python bogoexample.py 
without GPU: 4.314514834433794
with GPU: 0.21419800259172916

So it is faster, but keep in mind that the GPU time includes the time taken for compilation of the GPU ufunc

Another alternative would be to actually write a GPU kernel. Like this:

from numba import vectorize, jit, cuda 
import numpy as np 
# to measure exec time 
from timeit import default_timer as timer 

# normal function to run on cpu 
def func(a):                                 
    for i in range(10000000): 
        a[i]+= 1    

# function optimized to run on gpu 
@vectorize(['float64(float64)'], target ="cuda")                         
def func2(x): 
    return x+1

# kernel to run on gpu
@cuda.jit
def func3(a, N):
    tid = cuda.grid(1)
    if tid < N:
        a[tid] += 1


if __name__=="__main__": 
    n = 10000000                            
    a = np.ones(n, dtype = np.float64) 

    for i in range(0,5):
         start = timer() 
         func(a) 
         print(i, " without GPU:", timer()-start)     

    for i in range(0,5):
         start = timer() 
         func2(a) 
         print(i, " with GPU ufunc:", timer()-start) 

    threadsperblock = 1024
    blockspergrid = (a.size + (threadsperblock - 1)) // threadsperblock
    for i in range(0,5):
         start = timer() 
         func3[blockspergrid, threadsperblock](a, n) 
         print(i, " with GPU kernel:", timer()-start) 

which runs like this:

$ python bogoexample.py 
0  without GPU: 4.885275377891958
1  without GPU: 4.748716968111694
2  without GPU: 4.902181145735085
3  without GPU: 4.889955999329686
4  without GPU: 4.881594380363822
0  with GPU ufunc: 0.16726416163146496
1  with GPU ufunc: 0.03758022002875805
2  with GPU ufunc: 0.03580896370112896
3  with GPU ufunc: 0.03530424740165472
4  with GPU ufunc: 0.03579768259078264
0  with GPU kernel: 0.1421878095716238
1  with GPU kernel: 0.04386183246970177
2  with GPU kernel: 0.029975440353155136
3  with GPU kernel: 0.029602501541376114
4  with GPU kernel: 0.029780613258481026

Here you can see that the kernel runs slightly faster than the ufunc, and that caching (and this is caching of the JIT compiled functions, not memoization of the calls) significantly speeds up the call on the GPU.

like image 128
2 revs Avatar answered Oct 17 '22 05:10

2 revs