Why does the transposed matrix look different when converted to a pycuda.gpuarray?
Can you reproduce this? What could cause this? Am I using the wrong approach?
Example code
from pycuda import gpuarray
import pycuda.autoinit
import numpy

data = numpy.random.randn(2, 4).astype(numpy.float32)
data_gpu = gpuarray.to_gpu(data.T)

print("data\n", data)
print("data_gpu.get()\n", data_gpu.get())
print("data.T\n", data.T)
Output
data
[[ 0.70442784 0.08845157 -0.84840715 -1.81618035]
[ 0.55292499 0.54911566 0.54672164 0.05098847]]
data_gpu.get()
[[ 0.70442784 0.08845157]
[-0.84840715 -1.81618035]
[ 0.55292499 0.54911566]
[ 0.54672164 0.05098847]]
data.T
[[ 0.70442784 0.55292499]
[ 0.08845157 0.54911566]
[-0.84840715 0.54672164]
[-1.81618035 0.05098847]]
The basic reason is that numpy's transpose only creates a view, which has no effect on the underlying array storage, and it is that storage which PyCUDA accesses directly when copying to device memory. The solution is to use the copy method when doing the transpose, which creates an array with the data in transposed order in host memory, and then copy that to the device:
data_gpu = gpuarray.to_gpu(data.T.copy())
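For illustration, here is a minimal sketch of the full round trip (assuming PyCUDA and a CUDA device are available; the variable name host_transposed is just for clarity):
from pycuda import gpuarray
import pycuda.autoinit
import numpy

data = numpy.random.randn(2, 4).astype(numpy.float32)

# data.T is only a view; .copy() materialises the transposed layout
# in host memory before the transfer to the device.
host_transposed = data.T.copy()          # C-contiguous array of shape (4, 2)
data_gpu = gpuarray.to_gpu(host_transposed)

print(numpy.allclose(data_gpu.get(), data.T))   # expected: True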
In numpy, data.T doesn't do anything to the underlying 1D array. It simply swaps the strides to obtain the transpose, which makes it a constant-time and constant-memory operation.
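You can see this view behaviour directly in NumPy without any GPU involved; a small sketch:
import numpy

data = numpy.random.randn(2, 4).astype(numpy.float32)

print(data.shape, data.strides)       # (2, 4) (16, 4)
print(data.T.shape, data.T.strides)   # (4, 2) (4, 16) -- same buffer, swapped strides
print(data.T.base is data)            # True: the transpose is a view on data
print(data.T.flags['C_CONTIGUOUS'])   # False: not laid out row-major in memory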
It would appear that gpuarray.to_gpu() isn't respecting the strides and simply copies the underlying 1D array. That would produce exactly the behaviour you're observing.
In my view there is nothing wrong with your code. Rather, I would consider this a bug in pycuda.
I've googled around and found a thread that discusses this issue in detail.
As a workaround, you could try passing numpy.ascontiguousarray(data.T) to gpuarray.to_gpu(). This will, of course, create a second copy of the data in host RAM.
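A short sketch of that workaround (again assuming PyCUDA and a CUDA device are available):
from pycuda import gpuarray
import pycuda.autoinit
import numpy

data = numpy.random.randn(2, 4).astype(numpy.float32)

# ascontiguousarray copies the strided view into a fresh C-contiguous
# host buffer, which to_gpu() can then transfer element for element.
data_gpu = gpuarray.to_gpu(numpy.ascontiguousarray(data.T))

print(data_gpu.get())   # now matches data.T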