Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Thrust Complex Transform of 3 different size vectors

Tags:

cuda

thrust

Hello I have this loop in C+, and I was trying to convert it to thrust but without getting the same results... Any ideas? thank you

C++ Code

for (i=0;i<n;i++) 
    for (j=0;j<n;j++) 
      values[i]=values[i]+(binv[i*n+j]*d[j]);

Thrust Code

thrust::fill(values.begin(), values.end(), 0);
thrust::transform(make_zip_iterator(make_tuple(
                thrust::make_permutation_iterator(values.begin(), thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexDivFunctor(n))),
                binv.begin(),
                thrust::make_permutation_iterator(d.begin(), thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexModFunctor(n))))),
                make_zip_iterator(make_tuple(
                thrust::make_permutation_iterator(values.begin(), thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexDivFunctor(n))) + n,
                binv.end(),
                thrust::make_permutation_iterator(d.begin(), thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexModFunctor(n))) + n)),
                thrust::make_permutation_iterator(values.begin(), thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexDivFunctor(n))),
                function1()
                );

Thrust Functions

struct IndexDivFunctor: thrust::unary_function<int, int>
{
  int n;

  IndexDivFunctor(int n_) : n(n_) {}

  __host__ __device__
  int operator()(int idx)
  {
    return idx / n;
  }
};

struct IndexModFunctor: thrust::unary_function<int, int>
{
  int n;

  IndexModFunctor(int n_) : n(n_) {}

  __host__ __device__
  int operator()(int idx)
  {
    return idx % n;
  }
};


struct function1
{
  template <typename Tuple>
  __host__ __device__
  double operator()(Tuple v)
  {
    return thrust::get<0>(v) + thrust::get<1>(v) * thrust::get<2>(v);
  }
};
like image 459
Yannis Assael Avatar asked Oct 10 '22 21:10

Yannis Assael


1 Answers

To begin with, some general comments. Your loop

for (i=0;i<n;i++) 
    for (j=0;j<n;j++) 
      v[i]=v[i]+(B[i*n+j]*d[j]);

is the equivalent of the standard BLAS gemv operation

enter image description here

where the matrix is stored in row major order. The optimal way to do this on the device would be using CUBLAS, not something constructed out of thrust primitives.

Having said that, there is absolutely no way the thrust code you posted is ever going to do what your serial code does. The errors you are seeing are not as a result of floating point associativity. Fundamentally thrust::transform applies the functor supplied to every element of the input iterator and stores the result on the output iterator. To yield the same result as the loop you posted, the thrust::transform call would need to perform (n*n) operations of the fmad functor you posted. Clearly it does not. Further, there is no guarantee that thrust::transform would perform the summation/reduction operation in a fashion that would be safe from memory races.

The correct solution is probably going to be something like:

  1. Use thrust::transform to compute the (n*n) products of the elements of B and d
  2. Use thrust::reduce_by_key to reduce the products into partial sums, yielding Bd
  3. Use thrust::transform to add the resulting matrix-vector product to v to yield the final result.

In code, firstly define a functor like this:

struct functor
{
  template <typename Tuple>
  __host__ __device__
  double operator()(Tuple v)
  {
    return thrust::get<0>(v) * thrust::get<1>(v);
  }
};

Then do the following to compute the matrix-vector multiplication

  typedef thrust::device_vector<int> iVec;
  typedef thrust::device_vector<double> dVec;

  typedef thrust::counting_iterator<int> countIt;
  typedef thrust::transform_iterator<IndexDivFunctor, countIt> columnIt;
  typedef thrust::transform_iterator<IndexModFunctor, countIt> rowIt;

  // Assuming the following allocations on the device
  dVec B(n*n), v(n), d(n);

  // transformation iterators mapping to vector rows and columns
  columnIt cv_begin = thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexDivFunctor(n));
  columnIt cv_end   = cv_begin + (n*n);

  rowIt rv_begin = thrust::make_transform_iterator(thrust::make_counting_iterator(0), IndexModFunctor(n));
  rowIt rv_end   = rv_begin + (n*n);

  dVec temp(n*n);
  thrust::transform(make_zip_iterator(
                      make_tuple(
                        B.begin(),
                        thrust::make_permutation_iterator(d.begin(),rv_begin) ) ),
                    make_zip_iterator(
                      make_tuple(
                        B.end(),
                        thrust::make_permutation_iterator(d.end(),rv_end) ) ),
                    temp.begin(),
                    functor());

  iVec outkey(n);
  dVec Bd(n);
  thrust::reduce_by_key(cv_begin, cv_end, temp.begin(), outkey.begin(), Bd.begin());
  thrust::transform(v.begin(), v.end(), Bd.begin(), v.begin(), thrust::plus<double>());

Of course, this is a terribly inefficient way to do the computation compared to using a purpose designed matrix-vector multiplication code like dgemv from CUBLAS.

like image 162
talonmies Avatar answered Oct 14 '22 01:10

talonmies