Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Numpy array to ctypes with FORTRAN ordering

Is there a performant way to convert a numpy array to a FORTRAN ordered ctypes array, ideally without a copy being required, and without triggering problems related to strides?

import numpy as np

# Sample data
n = 10000
A = np.zeros((n,n), dtype=np.int8)
A[0,1] = 1

def slow_conversion(A):
    return np.ctypeslib.as_ctypes(np.ascontiguousarray(A.T))

assert slow_conversion(A)[1][0] == 1

Performance analysis of just the call to as_ctypes:

%%timeit
np.ctypeslib.as_ctypes(A)

3.35 µs ± 10.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Performance analysis of the supplied (slow) conversion

%%timeit
slow_conversion(A)

206 ms ± 10.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The ideal outcome would be to get performance similar to that of just the as_ctypes call.

like image 801
Andrew Walker Avatar asked May 18 '26 12:05

Andrew Walker


1 Answers

By default, Numpy creates C-ordered arrays (because it is written in C), that is, row-major arrays. Doing a transposition with A.T creates a view of the array where the strides are reversed (ie. no copy). That being said, np.ascontiguousarray does a copy because the array is now not contiguous anymore and copies are expensive. This is why slow_conversion is slow. Note that the contiguity can be tested with yourarray.flags['F_CONTIGUOUS'] and by checking yourarray.strides. Note also that yourarray.flags and yourarray.__array_interface__ provide information about whether the array has been copied and also information about strides.

np.asfortranarray returns an array laid out in Fortran order in memory regarding the documentation. It can perform a copy if needed. In fact, np.asfortranarray(A) does a copy while np.asfortranarray(A.T) does not. You can check the C code of the function for more information about this behaviour. Since both are seen as FORTRAN-contiguous, it is better to use np.asfortranarray(A.T) that do not make any copy in this case.

Regarding ctypes, it deals with C arrays which are stored with a row-major ordering as opposed to FORTRAN array that are stored with a column-major ordering. Furthermore, C arrays do not support strides natively as opposed to FORTRAN ones. This means a row is basically a memory view of contiguous data stored in memory. Since slow_conversion(A)[1][0] == 1 is required to be true, this means the returned object should have the first item of the second row is 1 and thus that the columns are mandatory stored contiguously in memory. The thing is that the initial array is not FORTRAN-contiguous but C-contiguous and thus a transposition is required. Transpositions are pretty expensive (although the Numpy implementation is suboptimal).

Assuming you do not want to pay the overhead of a copy/transposition, the problem need to be relaxed. There are few possible options:

  • Create FORTRAN ordered array directly with Numpy using for example np.zeros((n,n), dtype=np.int8, order='F'). This create a C array with transposed strides so to behave like a FORTRAN array where computations operating on columns are fast (remember that Numpy is written in C so row-major ordered array are the reference). With this, the first row in ctypes is actually a column. Note that you should be very careful when mixing C and FORTRAN ordered array for sake of performance as non-contiguous access are much slower.
  • Use a strided FORTRAN array. This solution basically means that basic column-based computations will be much slower and one need to write row-based computation that are quite unusual in FORTRAN. You can extract the pointer to the C-contiguous array with A.ctypes.data_as(POINTER(c_double)), the strides with A.strides and the shape with A.shape. That being said, this solution appears not to be really portable/standard. The standard way appears to use C binding in FORTRAN. I am not very familiar with this but you can find a complete example in this answer.

A last solution is to transpose data manually using a fast transposition algorithm. An in-place transposition is faster than an out-of-place transposition (typically 2 times) but this requires a squared array and it mutates the input array that should not be used later (unless it is fine to operate on a transposed array). A fast transposition cannot be written using Numpy directly. One solution is to do in in Numba (this is a sub-optimal out-of-place solution but pretty fast already), or to do it in C or directly in FORTRAN (using a wrapper function in all cases). This should be significantly faster than what Numpy does but still far slower than a basic ctypes wrapping.

like image 167
Jérôme Richard Avatar answered May 20 '26 00:05

Jérôme Richard



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!