I'm reading ascii and binary files that all specify 3 dimensional arrays in fortran order. I want to perform some arbitrary manipulations on these arrays, then export them to the same ascii or binary format.
I'm confused on the best ways to deal with these arrays in my library. My current design seems prone to error because I have to keep reshaping things from the default C order if any new array is created.
Current design:
I have a few functions that read these files and return numpy arrays. The read functions all behave in a similar way and essentially read in the data and return something like:
return array.reshape((i, j, k), order='F')
The way I understand it, I'm returning a view for fortran order onto the original array.
My code assumes all the arrays are in fortran order. This means any new operations that might create a new array I make sure to use reshape
to convert it back to fortran order.
This seems very error-prone because I have to pay close attention to any operation that creates a new array and make sure to reshape it into fortran order since the default is usually C order.
I later might have to export these arrays to binary or ascii again and need to maintain the fortran ordering. So, I use numpy.nditer
to write each element out in the fortran order.
Concerns:
The current approach seems very error-prone since I typically think in C order. I'm afraid that I'll always be getting bitten by missing calls to reshape
that forces things in C order.
The current approach seems messy because the indexes can be interpreted different ways and things can get confusing.
x[(1, 2, 3)]
for a fortran array means k = 1, j = 2, and i = 3 whereas x[(1, 2, 3)]
for a C order array means k = 3, j = 2, i = 1 correct?Question:
Is there a best practice for doing this type of thing? In an ideal world I'd like to read in the fortran ordered arrays, then forget about ordering until I export to a file. However, I'm afraid I'll keep misinterpreting the indexes, etc.
I've read through the only numpy documentation on this that I can find, http://docs.scipy.org/doc/numpy/reference/internals.html#multidimensional-array-indexing-order-issues. However, the concept still seems as clear as mud to me. Maybe I just need a different explanation of the numpy docs, http://docs.scipy.org/doc/numpy/reference/internals.html#multidimensional-array-indexing-order-issues.
For the 1,000,000,000 element arrays, the Fortran code (without the O2 flag) was only 3.7% faster than the NumPy code. The parallel Numba code really shines with the 8-cores of the AMD-FX870, which was about 4 times faster than MATLAB, and 3 times faster than Numpy.
Tools for reading / writing array data to disk and working with memory-mapped files. Linear algebra, random number generation, and Fourier transform capabilities.
Fortran order/ array is a special case in which all elements of an array are stored in column-major order. Sometimes we need to display array in fortran order, for this numpy has a function known as numpy. nditer().
The concept of vectorized operations on NumPy allows the use of more optimal and pre-compiled functions and mathematical operations on NumPy array objects and data sequences. The Output and Operations will speed up when compared to simple non-vectorized operations.
Numpy abstracts away the difference between Fortran ordering and C-ordering at the python level. (In fact, you can even have other orderings for >2d arrays with numpy. They're all treated the same at the python level.)
The only time you'll need to worry about C vs F ordering is when you're reading/writing to disk or passing the array to lower-level functions.
As an example, let's make a simple 3D array in both C order and Fortran order:
In [1]: import numpy as np
In [2]: c_order = np.arange(27).reshape(3,3,3)
In [3]: f_order = c_order.copy(order='F')
In [4]: c_order
Out[4]:
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]],
[[18, 19, 20],
[21, 22, 23],
[24, 25, 26]]])
In [5]: f_order
Out[5]:
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]],
[[18, 19, 20],
[21, 22, 23],
[24, 25, 26]]])
Notice that they both look identical (they are at the level we're interacting with them). How can you tell that they're in different orderings? First off, let's take a look at the flags (pay attention to C_CONTIGUOUS
vs F_CONTIGUOUS
):
In [6]: c_order.flags
Out[6]:
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : False
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
In [7]: f_order.flags
Out[7]:
C_CONTIGUOUS : False
F_CONTIGUOUS : True
OWNDATA : True
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
And if you don't trust the flags, you can effectively view the memory order by looking at arr.ravel(order='K')
. The order='K'
is important. Otherwise, when you call arr.ravel()
the output will be in C-order regardless of the memory layout of the array. order='K'
uses the memory layout.
In [8]: c_order.ravel(order='K')
Out[8]:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26])
In [9]: f_order.ravel(order='K')
Out[9]:
array([ 0, 9, 18, 3, 12, 21, 6, 15, 24, 1, 10, 19, 4, 13, 22, 7, 16,
25, 2, 11, 20, 5, 14, 23, 8, 17, 26])
The difference is actually represented (and stored) in the strides
of the array. Notice that c_order
's strides are (72, 24, 8)
, while f_order
's strides are (8, 24, 72)
.
Just to prove that the indexing works the same way:
In [10]: c_order[0,1,2]
Out[10]: 5
In [11]: f_order[0,1,2]
Out[11]: 5
The main place where you'll run into problems with this is when you're reading from or writing to disk. Many file formats expect a particular ordering. I'm guessing that you're working with seismic data formats, and most of them (e.g. Geoprobe .vol's, and I think Petrel's volume format as well) essentially write a binary header and then a Fortran-ordered 3D array to disk.
With that in mind, I'll use a small seismic cube (snippet of some data from my dissertation) as an example.
Both of these are binary arrays of uint8
s with a shape of 50x100x198. One is in C-order, while the other is in Fortran-order. c_order.dat f_order.dat
To read them in:
import numpy as np
shape = (50, 100, 198)
c_order = np.fromfile('c_order.dat', dtype=np.uint8).reshape(shape)
f_order = np.fromfile('f_order.dat', dtype=np.uint8).reshape(shape, order='F')
assert np.all(c_order == f_order)
Notice that the only difference is specifying the memory layout to reshape
. The memory layout of the two arrays is still different (reshape doesn't make a copy), but they're treated identically at the python level.
Just to prove that the files really are written in a different order:
In [1]: np.fromfile('c_order.dat', dtype=np.uint8)[:10]
Out[1]: array([132, 142, 107, 204, 37, 37, 217, 37, 82, 60], dtype=uint8)
In [2]: np.fromfile('f_order.dat', dtype=np.uint8)[:10]
Out[2]: array([132, 129, 140, 138, 110, 88, 110, 124, 142, 139], dtype=uint8)
Let's visualize the result:
def plot(data):
fig, axes = plt.subplots(ncols=3)
for i, ax in enumerate(axes):
slices = [slice(None), slice(None), slice(None)]
slices[i] = data.shape[i] // 2
ax.imshow(data[tuple(slices)].T, cmap='gray_r')
return fig
plot(c_order).suptitle('C-ordered array')
plot(f_order).suptitle('F-ordered array')
plt.show()
Notice that we indexed them the same way, and they're displayed identically.
First off, let's try reading in the Fortran-ordered file as if it were C-ordered and then take a look at the result (using the plot
function above):
wrong_order = np.fromfile('f_order.dat', dtype=np.uint8).reshape(shape)
plot(wrong_order)
Not so good!
You mentioned that you're having to use "reversed" indicies. This is probably because you fixed what happened in the figure above by doing something like (note the reversed shape!):
c_order = np.fromfile('c_order.dat', dtype=np.uint8).reshape([50,100,198])
rev_f_order = np.fromfile('f_order.dat', dtype=np.uint8).reshape([198,100,50])
Let's visualize what happens:
plot(c_order).suptitle('C-ordered array')
plot(rev_f_order).suptitle('Incorrectly read Fortran-ordered array')
Note that the image on the far right (the timeslice) of the first plot matches a transposed version of the image on the far left of the second.
Similarly, print rev_f_order[1,2,3]
and print c_order[3,2,1]
both yield 140
, while indexing them the same way gives a different result.
Basically, this is where your reversed indices come from. Numpy thinks it's a C-ordered array with a different shape. Notice if we look at the flags, they're both C-contiguous in memory:
In [24]: rev_f_order.flags
Out[24]:
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : False
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
In [25]: c_order.flags
Out[25]:
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : False
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
This is because a fortran-ordered array is equivalent to a C-ordered array with the reverse shape.
There's an additional wrinkle when writing a numpy array to disk in Fortran-order.
Unless you specify otherwise, the array will be written in C-order regardless of its memory-layout! (There's a clear note about this in the documentation for ndarray.tofile
, but it's a common gotcha. The opposite behavior would be incorrect, though, i.m.o.)
Therefore, regardless of the memory layout of an array, to write it to disk in Fortran order, you need to do:
arr.ravel(order='F').tofile('output.dat')
If you're writing it as ascii, the same applies. Use ravel(order='F')
and then write out the 1-dimensional result.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With