I am porting some physics simulation code from C++ to CUDA.
The fundamental algorithm can be understood as applying an operator to each element of a vector. In pseudocode, a simulation might include the following kernel:
apply(Operator o, Vector v){
...
}
For instance:
apply(add_three_operator, some_vector)
would add three to each element in the vector.
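(For context, the elementwise pattern itself maps directly onto a CUDA kernel. Here is a minimal sketch with the operator hard-coded rather than abstracted; apply_kernel, d_v, and n are names I'm making up for illustration:)

__global__ void apply_kernel(double* v, int n) {
    // one thread per element
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        v[i] = v[i] + 3.0;  // hard-coded "add three"
    }
}

// launched as, e.g.: apply_kernel<<<(n + 255) / 256, 256>>>(d_v, n);

The hard part is recovering the operator abstraction described below.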
In my C++ code, I have an abstract base class Operator with many different concrete implementations. The important part of the interface looks like this:

class Operator {
    virtual double operate(double x) = 0;
    Operator compose(Operator lo, Operator ro);
    ...
};
The implementation for AddOperator might look like this:
class AddOperator : public Operator {
private:
    double to_add;
public:
    AddOperator(double to_add) : to_add(to_add) {}
    double operate(double x) override {
        return x + to_add;
    }
};
The Operator class has methods for scaling and composing concrete implementations of Operator. This abstraction allows me to compose "leaf" operators into more general transformations.
For instance:
apply(compose(add_three_operator, square_operator), some_vector);
would add three then square each element of the vector.
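That is, compose(f, g) should produce an operator h with h(x) == g(f(x)). A plain C++ illustration of the intended semantics (free functions with hypothetical names, just for clarity):

double add_three(double x) { return x + 3.0; }
double square(double x)    { return x * x; }

// compose(add_three, square) should behave like:
double composed(double x) { return square(add_three(x)); }  // (x + 3)^2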
The problem is that CUDA doesn't support virtual method calls inside kernels: an object of a class with virtual functions can't be passed to a __global__ function, because its vtable lives in host memory. My current thought is to use templates. Kernel calls would then look something like:
apply<Composition<AddOperator,SquareOperator>>
(compose(add_three_operator, square_operator), some_vector);
Any suggestions?
Something like this perhaps...
template <class Op1, class Op2>
class Composition { ... };

template <class Op1, class Op2>
Composition<Op1, Op2> compose(Op1& op1, Op2& op2) { ... }

template <class C>
void apply(C& c, VecType& vec) { ... }
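Fleshing that sketch out, here's a minimal compilable version of the idea. The names and simplifications are mine: double-valued operators as in the question, and the operator is passed to the kernel by value, since kernel arguments are copied to the device and a reference to host memory won't work there.

#include <cstdio>

struct AddOperator {
    double to_add;
    AddOperator(double to_add) : to_add(to_add) {}
    __host__ __device__ double operate(double x) const { return x + to_add; }
};

struct SquareOperator {
    __host__ __device__ double operate(double x) const { return x * x; }
};

// Applies op1 first, then op2. No virtual functions involved: the
// full composition is a single concrete type, so the device code
// is resolved at compile time.
template <class Op1, class Op2>
struct Composition {
    Op1 op1;
    Op2 op2;
    Composition(Op1 op1, Op2 op2) : op1(op1), op2(op2) {}
    __host__ __device__ double operate(double x) const {
        return op2.operate(op1.operate(x));
    }
};

template <class Op1, class Op2>
Composition<Op1, Op2> compose(Op1 op1, Op2 op2) {
    return Composition<Op1, Op2>(op1, op2);
}

// The operator is passed by value: everything it holds must be
// plain data that survives the host-to-device copy.
template <class Op>
__global__ void apply_kernel(Op op, double* v, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] = op.operate(v[i]);
}

int main() {
    const int n = 4;
    double h_v[n] = {0.0, 1.0, 2.0, 3.0};

    double* d_v;
    cudaMalloc(&d_v, n * sizeof(double));
    cudaMemcpy(d_v, h_v, n * sizeof(double), cudaMemcpyHostToDevice);

    // (x + 3)^2 for each element; the composed type is deduced.
    apply_kernel<<<1, 256>>>(compose(AddOperator(3.0), SquareOperator()), d_v, n);

    cudaMemcpy(h_v, d_v, n * sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(d_v);

    for (int i = 0; i < n; ++i) printf("%f\n", h_v[i]);  // 9, 16, 25, 36
    return 0;
}

Since compose returns a concrete Composition<Op1, Op2>, it nests: compose(compose(a, b), c) is again a single concrete type, so arbitrary trees of leaf operators still resolve entirely through static dispatch.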