How to have Apache Spark running on GPU?

I want to integrate Apache Spark with GPUs, but Spark runs on the JVM while GPUs are programmed with CUDA/OpenCL, so how do we bridge the two?

asked Feb 19 '17 by saurabh kulkarni

1 Answer

It depends on what you want to do. If you want to distribute your computation across GPUs using Spark, you don't necessarily have to use Java. You could use Python (PySpark) with Numba, which has a CUDA module.
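If it helps, here is a minimal standalone sketch of what Numba's CUDA module looks like, before adding Spark into the mix (it assumes numba is installed and a CUDA-capable GPU is present; the kernel and names are illustrative):

import numpy as np
from numba import cuda

@cuda.jit
def double_kernel(arr):
    i = cuda.grid(1)           # absolute index of this thread across the grid
    if i < arr.size:           # guard: the grid may be larger than the array
        arr[i] *= 2.0

data = np.arange(1024, dtype=np.float32)
threads = 128
blocks = (data.size + threads - 1) // threads   # enough blocks to cover the data
d_arr = cuda.to_device(data)                    # copy to the GPU
double_kernel[blocks, threads](d_arr)           # launch one thread per element
data = d_arr.copy_to_host()                     # copy the result back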

For example, you can apply this code if you want your worker nodes to run an operation (here gpu_function) on every partition of your RDD:

rdd = rdd.mapPartitions(gpu_function)

with:

def gpu_function(partition):
    ...
    input = f(partition)   # f() turns the partition's rows into an array the kernel can read
    output = ...           # pre-allocate an output array of matching size
    gpu_cuda[grid_size, block_size](input, output)   # launch the kernel
    return output          # mapPartitions expects an iterable back

and:

from numba import cuda

@cuda.jit("(float32[:], float32[:])")
def gpu_cuda(input, output):
    i = cuda.grid(1)              # absolute index of this thread
    if i < input.size:            # guard: the grid can be larger than the data
        output[i] = g(input[i])   # kernels return nothing; write results in place

I advise you to take a look at this SlideShare deck: https://fr.slideshare.net/continuumio/gpu-computing-with-apache-spark-and-python, specifically slide 34.

You only need Numba and the CUDA driver installed on every worker node.
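Tying the pieces together, here is a self-contained sketch of the full pipeline (the kernel body, app name, and block size are illustrative assumptions, not from the answer; it needs pyspark and numba plus a CUDA GPU on each worker):

import numpy as np
from numba import cuda
from pyspark import SparkContext

@cuda.jit("(float32[:], float32[:])")
def gpu_cuda(inp, out):
    i = cuda.grid(1)
    if i < inp.size:
        out[i] = inp[i] * inp[i]     # stand-in for g(); squares each element

def gpu_function(partition):
    inp = np.fromiter(partition, dtype=np.float32)  # materialize the partition
    if inp.size == 0:                # an empty partition would give a zero-sized grid
        return iter([])
    out = np.zeros_like(inp)
    block_size = 128
    grid_size = (inp.size + block_size - 1) // block_size
    gpu_cuda[grid_size, block_size](inp, out)       # one thread per element
    return iter(out.tolist())        # hand an iterator back to Spark

sc = SparkContext(appName="spark-gpu-demo")
rdd = sc.parallelize(range(1_000_000)).map(float)
result = rdd.mapPartitions(gpu_function)
print(result.take(5))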

answered Sep 28 '22 by Adrien Forbu