I want to integrate Apache Spark with GPUs, but Spark runs on Java while GPUs are programmed with CUDA/OpenCL. How can the two be combined?
It depends on what you want to do. If you want to distribute your computation across GPUs using Spark, you don't necessarily have to use Java. You could use Python (PySpark) with Numba, which has a CUDA module.
For example, you can apply this code if you want your worker nodes to run an operation (here gpu_function) on every partition of your RDD.
    rdd = rdd.mapPartitions(gpu_function)
with:
    def gpu_function(x):
        # x is an iterator over the elements of one partition
        ...
        input = f(x)
        output = ...
        gpu_cuda[grid_size, block_size](input, output)
        return output
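and:
    from numba import cuda

    @cuda.jit("(float32[:],float32[:])")
    def gpu_cuda(input, output):
        i = cuda.grid(1)  # global index of this thread
        if i < input.size:  # skip threads past the end of the array
            # g is a placeholder for your per-element device function
            output[i] = g(input[i])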
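To make the pattern above concrete, here is a minimal sketch of a complete partition function. It is not taken from the slides: the names launch_config, _get_kernel and double_partition are hypothetical, the "double every value" operation stands in for your own g, and 128 threads per block is an arbitrary assumption. Numba and NumPy are imported lazily inside the functions so that they are resolved on the worker rather than on the driver.

```python
import math

_kernel = None  # built on first use, then reused by later calls in the same process


def launch_config(n_items, threads_per_block=128):
    """Return (blocks_per_grid, threads_per_block) so every item gets a thread."""
    blocks_per_grid = max(1, math.ceil(n_items / threads_per_block))
    return blocks_per_grid, threads_per_block


def _get_kernel():
    """Build the CUDA kernel lazily (needs Numba and a GPU on the worker)."""
    global _kernel
    if _kernel is None:
        from numba import cuda

        @cuda.jit
        def double_kernel(inp, out):
            i = cuda.grid(1)       # global index of this thread
            if i < inp.size:       # guard threads past the end of the array
                out[i] = 2.0 * inp[i]

        _kernel = double_kernel
    return _kernel


def double_partition(iterator):
    """Partition function for rdd.mapPartitions: doubles each value on the GPU."""
    import numpy as np

    # mapPartitions hands over an iterator, so gather it into one array first
    inp = np.fromiter(iterator, dtype=np.float32)
    out = np.empty_like(inp)
    if inp.size:
        blocks, threads = launch_config(inp.size)
        # Numba copies host arrays to the device and back around the launch
        _get_kernel()[blocks, threads](inp, out)
    return out.tolist()            # mapPartitions expects an iterable of results
```

Assuming rdd is an existing RDD of numbers, it would be used as `rdd.mapPartitions(double_partition)`. Gathering a whole partition into one array keeps the number of kernel launches low, at the cost of holding the partition in memory.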
I advise you to take a look at these slides: https://fr.slideshare.net/continuumio/gpu-computing-with-apache-spark-and-python, specifically slide 34.
You only need Numba and the CUDA driver installed on every worker node.
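One way to check that requirement is a small helper like the one below. This is only a sketch; the function name numba_cuda_available is hypothetical, and it relies on Numba's `cuda.is_available()` to report whether a usable GPU was found.

```python
def numba_cuda_available():
    """Return True if Numba imports and reports a usable CUDA GPU, else False."""
    try:
        from numba import cuda  # imported here so a missing install is caught
        return bool(cuda.is_available())
    except Exception:
        # Numba not installed, or the CUDA driver could not be initialised
        return False


if __name__ == "__main__":
    print("Numba CUDA available:", numba_cuda_available())
```

Running it through Spark, for instance with `sc.parallelize(range(8), 8).mapPartitions(lambda _: [numba_cuda_available()]).collect()` where sc is an existing SparkContext, reports the result for the executors that happened to receive a task, so it is a spot check rather than a guarantee for every node.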