Assume that we are working with a language which stores arrays in column-major order. Assume also that we have a function which uses 2-D array as an argument, and returns it. I'm wondering can you claim that it is (or isn't) in general beneficial to transpose this array when calling the function in order to work with column-wise operations instead of row-wise operations, or does the transposing negate the the benefits of column-wise operations?
As an example, in R I have a object of class ts named y
which has dimension n x p
, i.e I have p
times series of length n
.
I need to make some computations with y
in Fortran, where I have two loops with following kind of structure:
do i = 1, n
do j= 1, p
!just an example, some row-wise operations on `y`
x(i,j) = a*y(i,j)
D = ddot(m,y(i,1:p),1,b,1)
! ...
end do
end do
As Fortran (as does R) uses column-wise storage, it would be better to make the computations with p x n
array instead. So instead of
out<-.Fortran("something",y=array(y,dim(y)),x=array(0,dim(y)))
ynew<-out$out$y
x<-out$out$x
I could use
out<-.Fortran("something2",y=t(array(y,dim(y))),x=array(0,dim(y)[2:1]))
ynew<-t(out$out$y)
x<-t(out$out$x)
where Fortran subroutine something2
would be something like
do i = 1, n
do j= 1, p
!just an example, some column-wise operations on `y`
x(j,i) = a*y(j,i)
D = ddot(m,y(1:p,i),1,b,1)
! ...
end do
end do
Does the choice of approach always depend on the dimensions n
and p
or is it possible to say one approach is better in terms of computation speed and/or memory requirements? In my application n
is usually much larger than p
, which is 1 to 10 in most cases.
more of a comment, buy i wanted to put a bit of code: under old school f77 you would essentially be forced to use the second approach as
y(1:p,i)
is simply a pointer to y(1,i), with the following p values contiguous in memory.
the first construct
y(i,1:p)
is a list of values interspaced in memory, so it seems to require making a copy of the data to pass to the subroutine. I say it seems because i haven't the foggiest idea how a modern optimizing compiler deals with these things. I tend to think at best its a wash at worst this could really hurt. Imagine an array so large you need to page swap to access the whole vector.
In the end the only way to answer this is to test it yourself
----------edit
did a little testng and confirmed my hunch: passing rows y(i,1:p)
does cost you vs passing columns y(1:p,i)
. I used a subroutine that does practically nothing to see the difference. My guess with any real subroutine the hit is negligable.
Btw (and maybe this helps understand what goes on) passing every other value in a column
y(1:p:2,i)
takes longer (orders of magnitude) than passing the whole column, while passing every other value in a row cuts the time in half vs. passing a whole row.
(using gfortran 12..)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With