Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it beneficial to transpose an array in order to use column-wise operations?

Assume that we are working with a language which stores arrays in column-major order. Assume also that we have a function which uses 2-D array as an argument, and returns it. I'm wondering can you claim that it is (or isn't) in general beneficial to transpose this array when calling the function in order to work with column-wise operations instead of row-wise operations, or does the transposing negate the the benefits of column-wise operations?

As an example, in R I have a object of class ts named y which has dimension n x p, i.e I have p times series of length n.

I need to make some computations with y in Fortran, where I have two loops with following kind of structure:

do i = 1, n
  do j= 1, p
   !just an example, some row-wise operations  on `y`
   x(i,j) = a*y(i,j) 
   D = ddot(m,y(i,1:p),1,b,1) 
   ! ...
  end do
end do

As Fortran (as does R) uses column-wise storage, it would be better to make the computations with p x n array instead. So instead of

out<-.Fortran("something",y=array(y,dim(y)),x=array(0,dim(y)))
ynew<-out$out$y
x<-out$out$x

I could use

out<-.Fortran("something2",y=t(array(y,dim(y))),x=array(0,dim(y)[2:1]))
ynew<-t(out$out$y)
x<-t(out$out$x)

where Fortran subroutine something2 would be something like

do i = 1, n
  do j= 1, p
   !just an example, some column-wise operations  on `y`
   x(j,i) = a*y(j,i) 
   D = ddot(m,y(1:p,i),1,b,1) 
   ! ...
  end do
end do

Does the choice of approach always depend on the dimensions n and p or is it possible to say one approach is better in terms of computation speed and/or memory requirements? In my application n is usually much larger than p, which is 1 to 10 in most cases.

like image 211
Jouni Helske Avatar asked Oct 22 '22 16:10

Jouni Helske


1 Answers

more of a comment, buy i wanted to put a bit of code: under old school f77 you would essentially be forced to use the second approach as

y(1:p,i)

is simply a pointer to y(1,i), with the following p values contiguous in memory.

the first construct

y(i,1:p)

is a list of values interspaced in memory, so it seems to require making a copy of the data to pass to the subroutine. I say it seems because i haven't the foggiest idea how a modern optimizing compiler deals with these things. I tend to think at best its a wash at worst this could really hurt. Imagine an array so large you need to page swap to access the whole vector.

In the end the only way to answer this is to test it yourself

----------edit did a little testng and confirmed my hunch: passing rows y(i,1:p) does cost you vs passing columns y(1:p,i). I used a subroutine that does practically nothing to see the difference. My guess with any real subroutine the hit is negligable.

Btw (and maybe this helps understand what goes on) passing every other value in a column

y(1:p:2,i) takes longer (orders of magnitude) than passing the whole column, while passing every other value in a row cuts the time in half vs. passing a whole row.

(using gfortran 12..)

like image 76
agentp Avatar answered Oct 24 '22 17:10

agentp