
cublasSetVector() vs cudaMemcpy()

Tags: cuda, cublas

I am wondering if there is a difference between:

// cumalloc.cu - Create a vector on the device
#include <stdio.h>
#include <cuda_runtime.h>
#include <cublas_v2.h>

HOST float * cudamath_vector(const float * h_vector, const int m)
{
  float *d_vector = NULL;
  cudaError_t cudaStatus;

  // Allocate room for m floats on the device
  cudaStatus = cudaMalloc((void **)&d_vector, sizeof(float) * m);

  // Report any allocation failure, not only cudaErrorMemoryAllocation
  if(cudaStatus != cudaSuccess) {
    printf("ERROR: cumalloc.cu, cudamath_vector() : %s\n", cudaGetErrorString(cudaStatus));
    return NULL;
  }

  // Copy the m floats from host to device, using either of:

  /*    THIS: */ cublasSetVector(m, sizeof(*d_vector), h_vector, 1, d_vector, 1);

  /* OR THAT: */ cudaMemcpy(d_vector, h_vector, sizeof(float) * m, cudaMemcpyHostToDevice);

  return d_vector;
}

cublasSetVector() has two arguments incx and incy and the documentation says:

The storage spacing between consecutive elements is given by incx for the source vector x and for the destination vector y.

In the NVIDIA forum someone said:

iona_me: "incx and incy are strides measured in floats."

So does this mean that for incx = incy = 1 all elements of a float[] will be sizeof(float)-aligned and for incx = incy = 2 there would be a sizeof(float)-padding between each element?

  • Apart from those two parameters and the cublasHandle, does cublasSetVector() do anything that cudaMemcpy() doesn't?
  • Would it be safe to pass a vector/matrix that was not created with the respective cublas*() function to other CUBLAS functions that manipulate it?
Stefan Falk asked Jun 09 '14 13:06



1 Answer

A post by Massimiliano Fatica in a thread on the NVIDIA forum confirms what I said in my comment above (more precisely, my comment came from recalling the post I linked to). In particular:

cublasSetVector, cublasGetVector, cublasSetMatrix, cublasGetMatrix are thin wrappers around cudaMemcpy and cudaMemcpy2D. Therefore, no significant performance differences are expected between the two sets of copy functions.

Accordingly, you can safely pass any device array allocated with cudaMalloc to cublasSetVector.
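To make the "thin wrapper" idea concrete, here is a minimal host-only sketch in plain C. It is not the cuBLAS source: pitched_copy() and set_vector_model() are hypothetical names, and the mapping of a strided vector copy onto a pitched 2D copy (one element per "row") is my assumption about how such a wrapper could work. pitched_copy() only mirrors the argument order of cudaMemcpy2D (dst, dpitch, src, spitch, width, height) and works on ordinary host memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical host stand-in for a pitched 2D copy: copies `height` rows of
 * `width` bytes each; consecutive rows start `spitch` bytes apart in the
 * source and `dpitch` bytes apart in the destination. */
static void pitched_copy(void *dst, size_t dpitch,
                         const void *src, size_t spitch,
                         size_t width, size_t height)
{
    assert(width <= dpitch && width <= spitch);
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    for (size_t row = 0; row < height; ++row) {
        memcpy(d + row * dpitch, s + row * spitch, width);
    }
}

/* Assumed mapping, for illustration only: a unit-stride vector copy is one
 * flat copy; a strided one treats every element as a row of elem_size bytes
 * with pitches incx * elem_size and incy * elem_size. */
static void set_vector_model(int n, size_t elem_size,
                             const void *x, int incx,
                             void *y, int incy)
{
    assert(n >= 0 && incx > 0 && incy > 0);
    if (incx == 1 && incy == 1) {
        memcpy(y, x, (size_t)n * elem_size);
    } else {
        pitched_copy(y, (size_t)incy * elem_size,
                     x, (size_t)incx * elem_size,
                     elem_size, (size_t)n);
    }
}
```

Under this model the incx = incy = 1 case moves exactly the same bytes as a flat copy of sizeof(float) * m bytes, which is why the two calls in the question should be interchangeable.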

Concerning the strides, perhaps there is a misprint in the guide (as of CUDA 6.0), which says that

The storage spacing between consecutive elements is given by incx for the source vector x and for the destination vector y.

but should probably read

The storage spacing between consecutive elements is given by incx for the source vector x and incy for the destination vector y.
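As for what the strides mean in practice, the following host-only sketch models the documented semantics in plain C. strided_copy() is a hypothetical helper, not a cuBLAS function, and it assumes the strides are counted in elements (as the forum quote in the question says), with incx applied to the source and incy to the destination.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical host-only model of a strided vector copy.
 * Copies n elements of elem_size bytes each: element i is read from
 * byte offset i * incx * elem_size of x and written to byte offset
 * i * incy * elem_size of y. Bytes in between are left untouched. */
static void strided_copy(int n, size_t elem_size,
                         const void *x, int incx,
                         void *y, int incy)
{
    assert(n >= 0 && incx > 0 && incy > 0);
    const unsigned char *src = (const unsigned char *)x;
    unsigned char *dst = (unsigned char *)y;
    for (int i = 0; i < n; ++i) {
        memcpy(dst + (size_t)i * (size_t)incy * elem_size,
               src + (size_t)i * (size_t)incx * elem_size,
               elem_size);
    }
}
```

Copying four floats with incx = 1 and incy = 2 into an eight-float destination fills slots 0, 2, 4 and 6 and leaves the odd slots as they were. So incx = incy = 1 means tightly packed elements, while a stride of 2 leaves a gap of one element between consecutive elements on that side.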

Vitality answered Sep 20 '22 21:09
