Best practices with reading and operating on fortran ordered arrays with numpy

Tags:

I'm reading ascii and binary files that all specify 3 dimensional arrays in fortran order. I want to perform some arbitrary manipulations on these arrays, then export them to the same ascii or binary format.

I'm confused on the best ways to deal with these arrays in my library. My current design seems prone to error because I have to keep reshaping things from the default C order if any new array is created.

Current design:

I have a few functions that read these files and return numpy arrays. The read functions all behave in a similar way and essentially read in the data and return something like:

return array.reshape((i, j, k), order='F')

The way I understand it, I'm returning a view for fortran order onto the original array.

My code assumes all the arrays are in fortran order. This means any new operations that might create a new array I make sure to use reshape to convert it back to fortran order.

This seems very error-prone because I have to pay close attention to any operation that creates a new array and make sure to reshape it into fortran order since the default is usually C order.

I later might have to export these arrays to binary or ascii again and need to maintain the fortran ordering. So, I use numpy.nditer to write each element out in the fortran order.

Concerns:

The current approach seems very error-prone since I typically think in C order. I'm afraid that I'll always be getting bitten by missing calls to reshape that forces things in C order.
- I'd like to not have to worry about the ordering of the array elements except when reading the input files or writing the data to the output files.
The current approach seems messy because the indexes can be interpreted different ways and things can get confusing.
- When dealing with fortran arrays the tuple ordering for indexes is backwards, right?
- So, x[(1, 2, 3)] for a fortran array means k = 1, j = 2, and i = 3 whereas x[(1, 2, 3)] for a C order array means k = 3, j = 2, i = 1 correct?
- This means that me and users of my library must always think of indexes in (k, j, i) order, not what we are C/Python programmers typically think in, (i, j, k).

Question:

Is there a best practice for doing this type of thing? In an ideal world I'd like to read in the fortran ordered arrays, then forget about ordering until I export to a file. However, I'm afraid I'll keep misinterpreting the indexes, etc.

I've read through the only numpy documentation on this that I can find, http://docs.scipy.org/doc/numpy/reference/internals.html#multidimensional-array-indexing-order-issues. However, the concept still seems as clear as mud to me. Maybe I just need a different explanation of the numpy docs, http://docs.scipy.org/doc/numpy/reference/internals.html#multidimensional-array-indexing-order-issues.

409

asked Mar 13 '14 17:03

durden2.0

1 Answers

Numpy abstracts away the difference between Fortran ordering and C-ordering at the python level. (In fact, you can even have other orderings for >2d arrays with numpy. They're all treated the same at the python level.)

The only time you'll need to worry about C vs F ordering is when you're reading/writing to disk or passing the array to lower-level functions.

A Simple Example

As an example, let's make a simple 3D array in both C order and Fortran order:

In [1]: import numpy as np

In [2]: c_order = np.arange(27).reshape(3,3,3)

In [3]: f_order = c_order.copy(order='F')

In [4]: c_order
Out[4]: 
array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]],

       [[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]]])

In [5]: f_order
Out[5]: 
array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]],

       [[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]]])

Notice that they both look identical (they are at the level we're interacting with them). How can you tell that they're in different orderings? First off, let's take a look at the flags (pay attention to C_CONTIGUOUS vs F_CONTIGUOUS):

In [6]: c_order.flags
Out[6]: 
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

In [7]: f_order.flags
Out[7]: 
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

And if you don't trust the flags, you can effectively view the memory order by looking at arr.ravel(order='K'). The order='K' is important. Otherwise, when you call arr.ravel() the output will be in C-order regardless of the memory layout of the array. order='K' uses the memory layout.

In [8]: c_order.ravel(order='K')
Out[8]: 
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26])

In [9]: f_order.ravel(order='K')
Out[9]: 
array([ 0,  9, 18,  3, 12, 21,  6, 15, 24,  1, 10, 19,  4, 13, 22,  7, 16,
       25,  2, 11, 20,  5, 14, 23,  8, 17, 26])

The difference is actually represented (and stored) in the strides of the array. Notice that c_order's strides are (72, 24, 8), while f_order's strides are (8, 24, 72).

Just to prove that the indexing works the same way:

In [10]: c_order[0,1,2]
Out[10]: 5

In [11]: f_order[0,1,2]
Out[11]: 5

Reading and Writing

The main place where you'll run into problems with this is when you're reading from or writing to disk. Many file formats expect a particular ordering. I'm guessing that you're working with seismic data formats, and most of them (e.g. Geoprobe .vol's, and I think Petrel's volume format as well) essentially write a binary header and then a Fortran-ordered 3D array to disk.

With that in mind, I'll use a small seismic cube (snippet of some data from my dissertation) as an example.

Both of these are binary arrays of uint8s with a shape of 50x100x198. One is in C-order, while the other is in Fortran-order. c_order.dat f_order.dat

To read them in:

import numpy as np
shape = (50, 100, 198)

c_order = np.fromfile('c_order.dat', dtype=np.uint8).reshape(shape)
f_order = np.fromfile('f_order.dat', dtype=np.uint8).reshape(shape, order='F')

assert np.all(c_order == f_order)

Notice that the only difference is specifying the memory layout to reshape. The memory layout of the two arrays is still different (reshape doesn't make a copy), but they're treated identically at the python level.

Just to prove that the files really are written in a different order:

In [1]: np.fromfile('c_order.dat', dtype=np.uint8)[:10]
Out[1]: array([132, 142, 107, 204,  37,  37, 217,  37,  82,  60], dtype=uint8)

In [2]: np.fromfile('f_order.dat', dtype=np.uint8)[:10]
Out[2]: array([132, 129, 140, 138, 110,  88, 110, 124, 142, 139], dtype=uint8)

Let's visualize the result:

def plot(data):
    fig, axes = plt.subplots(ncols=3)
    for i, ax in enumerate(axes):
        slices = [slice(None), slice(None), slice(None)]
        slices[i] = data.shape[i] // 2
        ax.imshow(data[tuple(slices)].T, cmap='gray_r')
    return fig

plot(c_order).suptitle('C-ordered array')
plot(f_order).suptitle('F-ordered array')
plt.show()

enter image description here

Notice that we indexed them the same way, and they're displayed identically.

Common Mistakes with IO

First off, let's try reading in the Fortran-ordered file as if it were C-ordered and then take a look at the result (using the plot function above):

wrong_order = np.fromfile('f_order.dat', dtype=np.uint8).reshape(shape)
plot(wrong_order)

enter image description here

Not so good!

You mentioned that you're having to use "reversed" indicies. This is probably because you fixed what happened in the figure above by doing something like (note the reversed shape!):

c_order = np.fromfile('c_order.dat', dtype=np.uint8).reshape([50,100,198])
rev_f_order = np.fromfile('f_order.dat', dtype=np.uint8).reshape([198,100,50])

Let's visualize what happens:

plot(c_order).suptitle('C-ordered array')
plot(rev_f_order).suptitle('Incorrectly read Fortran-ordered array')

enter image description here

Note that the image on the far right (the timeslice) of the first plot matches a transposed version of the image on the far left of the second.

Similarly, print rev_f_order[1,2,3] and print c_order[3,2,1] both yield 140, while indexing them the same way gives a different result.

Basically, this is where your reversed indices come from. Numpy thinks it's a C-ordered array with a different shape. Notice if we look at the flags, they're both C-contiguous in memory:

In [24]: rev_f_order.flags
Out[24]: 
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

In [25]: c_order.flags
Out[25]: 
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

This is because a fortran-ordered array is equivalent to a C-ordered array with the reverse shape.

Writing to Disk in Fortran-Order

There's an additional wrinkle when writing a numpy array to disk in Fortran-order.

Unless you specify otherwise, the array will be written in C-order regardless of its memory-layout! (There's a clear note about this in the documentation for ndarray.tofile, but it's a common gotcha. The opposite behavior would be incorrect, though, i.m.o.)

Therefore, regardless of the memory layout of an array, to write it to disk in Fortran order, you need to do:

arr.ravel(order='F').tofile('output.dat')

If you're writing it as ascii, the same applies. Use ravel(order='F') and then write out the 1-dimensional result.

194

answered Sep 20 '22 17:09

Joe Kington

Related questions
                            
                                Coerce in django forms
                            
                                How to get a response of multiple objects using rest_framework and Django
                            
                                How to use two different functions within crosstab/pivot_table in pandas?
                            
                                Avoid "exception ignored" in python enhanced generator
                            
                                Equivalent of Python's KeyError exception in JavaScript?
                            
                                Python Optional Argument Pair
                            
                                How to give arguments after Here Document?
                            
                                How to judge (or how to write) a python function with no side effects?
                            
                                Detect keypress when Gtk.Entry is in focus
                            
                                yield - statement or expression?
                            
                                First call to pytz.timezone is slow in virtualenv
                            
                                How to set background color of a column in a matplotlib table
                            
                                Error when writing python pandas dataframe to csv file
                            
                                Read multiple excel file with different sheets names in pandas
                            
                                How to load and play a video in pygame
                            
                                Python and order of methods in multiple inheritance
                            
                                Django: class-view lacking attribute COOKIES
                            
                                time.sleep(x) not working as it should? [duplicate]
                            
                                scatter plots with string arrays in matplotlib
                            
                                select multiple columns in pandas data frame with column index as sequential number

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Best practices with reading and operating on fortran ordered arrays with numpy

Tags:

python

numpy

durden2.0

People also ask

1 Answers

A Simple Example

Reading and Writing

Common Mistakes with IO

Writing to Disk in Fortran-Order

Joe Kington

Recent Activity

Donate For Us