Pycuda messing up numpy matrix transpose

Tags: numpy, pycuda

Why does the transposed matrix look different when converted to a pycuda.gpuarray?

Can you reproduce this? What could cause this? Am I using the wrong approach?

Example code

from pycuda import gpuarray
import pycuda.autoinit  # initialises a CUDA context
import numpy

data = numpy.random.randn(2, 4).astype(numpy.float32)
data_gpu = gpuarray.to_gpu(data.T)  # copy the transposed matrix to the GPU

print("data\n", data, sep="")
print("data_gpu.get()\n", data_gpu.get(), sep="")
print("data.T\n", data.T, sep="")

Output

data
[[ 0.70442784  0.08845157 -0.84840715 -1.81618035]
 [ 0.55292499  0.54911566  0.54672164  0.05098847]]
data_gpu.get()
[[ 0.70442784  0.08845157]
 [-0.84840715 -1.81618035]
 [ 0.55292499  0.54911566]
 [ 0.54672164  0.05098847]]
data.T
[[ 0.70442784  0.55292499]
 [ 0.08845157  0.54911566]
 [-0.84840715  0.54672164]
 [-1.81618035  0.05098847]]
asked Aug 01 '11 by Framester


2 Answers

The basic reason is that numpy's transpose only creates a view, which has no effect on the underlying array storage, and it is that storage which PyCUDA accesses directly when copying to device memory. The solution is to call the copy method on the transpose, which creates an array whose host memory holds the data in transposed order, and then copy that to the device:

data_gpu = gpuarray.to_gpu(data.T.copy())
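For completeness, a quick sanity check (a sketch reusing the question's names) that the copied transpose round-trips through the GPU intact:

from pycuda import gpuarray
import pycuda.autoinit
import numpy

data = numpy.random.randn(2, 4).astype(numpy.float32)

# .copy() materialises the transposed view into a fresh C-contiguous
# array, so the device receives the data in the intended order.
data_gpu = gpuarray.to_gpu(data.T.copy())
assert numpy.allclose(data_gpu.get(), data.T)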
answered Oct 14 '22 by talonmies


In numpy, data.T doesn't do anything to the underlying 1D array. It simply manipulates the strides to obtain the transpose. This makes it a constant-time and constant-memory operation.
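You can see this directly on the host (a minimal sketch; numpy.shares_memory needs NumPy >= 1.11):

import numpy

data = numpy.random.randn(2, 4).astype(numpy.float32)

# A C-contiguous (2, 4) float32 array has strides (16, 4);
# its transpose reuses the same buffer with strides (4, 16).
print(data.strides)
print(data.T.strides)
print(data.T.base is data)                # True: the transpose is a view
print(numpy.shares_memory(data, data.T))  # True: no data was copied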

It would appear that gpuarray.to_gpu() doesn't respect the strides and simply copies the underlying 1D array. That would produce exactly the behaviour you're observing.
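Indeed, reinterpreting the flat buffer with the transposed shape, i.e. ignoring the strides, reproduces the kind of output shown in the question (a sketch with the question's names):

import numpy

data = numpy.random.randn(2, 4).astype(numpy.float32)

# Read the raw 1D buffer in memory order and give it the transposed
# shape without actually transposing: this mimics the bad device copy.
naive = data.ravel().reshape(data.T.shape)
print(naive)                             # matches the bad data_gpu.get()
print(numpy.array_equal(naive, data.T))  # False: strides were ignored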

In my view there is nothing wrong with your code. Rather, I would consider this a bug in pycuda.

I've googled around and found a thread that discusses this issue in detail.

As a workaround, you could try passing numpy.ascontiguousarray(data.T) to gpuarray.to_gpu(). This will, of course, create a second copy of the data in the host RAM.
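Sketched out (same caveat about the extra host-side copy):

import numpy
import pycuda.autoinit
from pycuda import gpuarray

data = numpy.random.randn(2, 4).astype(numpy.float32)

# ascontiguousarray materialises the strided view into a new
# C-contiguous host array, which to_gpu then copies correctly.
data_gpu = gpuarray.to_gpu(numpy.ascontiguousarray(data.T))
assert numpy.allclose(data_gpu.get(), data.T)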

answered Oct 14 '22 by NPE