I am wondering if there is a difference between:
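```c
// cumalloc.cu - Allocate a vector on the device and copy host data into it
#include <stdio.h>
#include <cuda_runtime.h>
#include <cublas_v2.h>

HOST float * cudamath_vector(const float * h_vector, const int m)
{
    float *d_vector = NULL;
    cudaError_t cudaStatus;
    cublasStatus_t cublasStatus;

    cudaStatus = cudaMalloc((void **)&d_vector, sizeof(float) * m);
    if (cudaStatus != cudaSuccess) { /* report any allocation failure, not only cudaErrorMemoryAllocation */
        printf("ERROR: cumalloc.cu, cudamath_vector() : %s\n", cudaGetErrorString(cudaStatus));
        return NULL;
    }

    /* THIS: */ cublasStatus = cublasSetVector(m, sizeof(*d_vector), h_vector, 1, d_vector, 1);
    /* OR THAT: */ cudaStatus = cudaMemcpy(d_vector, h_vector, sizeof(float) * m, cudaMemcpyHostToDevice);

    if (cublasStatus != CUBLAS_STATUS_SUCCESS || cudaStatus != cudaSuccess) {
        printf("ERROR: cumalloc.cu, cudamath_vector() : host-to-device copy failed\n");
        cudaFree(d_vector); /* do not leak the allocation on failure */
        return NULL;
    }
    return d_vector;
}
```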
cublasSetVector() has two arguments, incx and incy, and the documentation says:
The storage spacing between consecutive elements is given by incx for the source vector x and for the destination vector y.
In the NVIDIA forum someone said:
iona_me: "incx and incy are strides measured in floats."
So does this mean that for incx = incy = 1 the elements of a float[] are copied contiguously (one sizeof(float) apart), and for incx = incy = 2 there is a sizeof(float) gap between consecutive elements?
Also, does cublasSetVector() do anything that cudaMemcpy() doesn't, and is it safe to pass memory allocated with cudaMalloc(), rather than through a cublas*() function, to other CUBLAS functions that manipulate it?

A comment by Massimiliano Fatica in a thread on the NVIDIA forum confirms what I said in my earlier comment (more precisely, that comment came from my recollection of having read the post I linked to). In particular:
cublasSetVector, cublasGetVector, cublasSetMatrix, cublasGetMatrix are thin wrappers around cudaMemcpy and cudaMemcpy2D. Therefore, no significant performance differences are expected between the two sets of copy functions.
Accordingly, you can safely pass any array created by cudaMalloc as input to cublasSetVector.
Concerning the strides, perhaps there is a misprint in the guide (as of CUDA 6.0), which says that
The storage spacing between consecutive elements is given by incx for the source vector x and for the destination vector y.
but perhaps should be read as
The storage spacing between consecutive elements is given by incx for the source vector x and incy for the destination vector y.