I have a program that receives three-dimensional data as flat arrays in row-major (a.k.a. "C") order as input.
I need to pass these to a library that expects the same three-dimensional data in column-major (a.k.a. "Fortran") order.
Preprocessing the arrays outside of my program is not an option.
Transforming the data while copying works, but it is slow: there are quite a few arrays of several million elements each, and the allocation and copying are my major bottleneck. I would therefore like to do the transformation in place and see if that helps.
However, I have been unable to work out the maths behind this transformation, and my googling has been less than helpful.
Is there an efficient way to perform this transformation in-place?
An in-place transformation (if it is possible at all) would still have to move every element of these big arrays, so it would not be cache-friendly either.
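To make that concrete, here is a minimal sketch of what an in-place version could look like, assuming `double` elements and a shape of `n0 x n1 x n2` (the function names and the element type are my own assumptions, not something from the question). The element at row-major offset `(i*n1 + j)*n2 + k` has to end up at column-major offset `i + n0*(j + n1*k)`, and the sketch follows the cycles of that permutation. Note that it still needs one flag byte per element, so it is not strictly in-place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Column-major offset of the element currently stored at row-major offset p.
   Hypothetical helper; the shape is n0 x n1 x n2. */
static size_t col_major_offset(size_t p, size_t n0, size_t n1, size_t n2)
{
    size_t k = p % n2;            /* fastest-varying index in row-major order */
    size_t j = (p / n2) % n1;
    size_t i = p / (n1 * n2);     /* slowest-varying index in row-major order */
    return i + n0 * (j + n1 * k);
}

/* Reorder a from row-major to column-major by following permutation cycles.
   Returns 0 on success, -1 if the flag array cannot be allocated. */
int row_to_col_major_inplace(double *a, size_t n0, size_t n1, size_t n2)
{
    assert(a != NULL);
    size_t n = n0 * n1 * n2;
    if (n == 0)
        return 0;

    /* One flag per element marks slots that already hold their final value. */
    unsigned char *done = calloc(n, 1);
    if (done == NULL)
        return -1;

    for (size_t start = 0; start < n; ++start) {
        if (done[start])
            continue;
        size_t cur = start;
        double val = a[start];    /* value that still has to be placed */
        do {
            size_t next = col_major_offset(cur, n0, n1, n2);
            double tmp = a[next]; /* save the value we are about to overwrite */
            a[next] = val;
            val = tmp;
            done[next] = 1;
            cur = next;
        } while (cur != start);
    }

    free(done);
    return 0;
}
```

Every element is still read and written once, and each step of a cycle jumps to an unrelated address, which is why I would not expect this to beat a plain copy.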
Each allocation is done once per big array (followed by its long transformation), and if you have to deal with a stream of such arrays you can reuse old buffers to avoid repeated alloc/free calls.
I would simply recommend loading the data in the predictable, cache-friendly row-major order and relying on the store-buffer machinery to absorb the column-major store anti-pattern when writing to the second (allocated) array.
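A minimal sketch of that out-of-place copy, again assuming `double` elements and a shape of `n0 x n1 x n2` (the function name and element type are assumptions for illustration): the source is read strictly sequentially, and only the stores are strided.

```c
#include <assert.h>
#include <stddef.h>

/* Copy a row-major n0 x n1 x n2 array from src into dst in column-major order.
   dst must have room for n0*n1*n2 elements and must not overlap src. */
void row_to_col_major(const double *src, double *dst,
                      size_t n0, size_t n1, size_t n2)
{
    assert(src != dst);
    size_t p = 0;  /* running row-major offset: loads are sequential */
    for (size_t i = 0; i < n0; ++i)
        for (size_t j = 0; j < n1; ++j)
            for (size_t k = 0; k < n2; ++k)
                dst[i + n0 * (j + n1 * k)] = src[p++];  /* strided store */
}
```

When processing a stream of arrays, `dst` can be allocated once (sized for the largest array) and passed to every call, so the allocation cost is paid only once.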