
numpy array row major and column major

I'm having trouble understanding how numpy stores its data. Consider the following:

>>> import numpy as np
>>> a = np.ndarray(shape=(2,3), order='F')
>>> for i in xrange(6): a.itemset(i, i+1)
... 
>>> a
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])
>>> a.flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

This says that a is column-major (F_CONTIGUOUS), so internally a should look like the following:

[1, 4, 2, 5, 3, 6]

This is just what is stated in this glossary. What confuses me is that if I try to access the data of a in a linear fashion, I instead get:

>>> for i in xrange(6): print a.item(i)
... 
1.0
2.0
3.0
4.0
5.0
6.0

At this point I'm not sure what the F_CONTIGUOUS flag tells us, since it does not honor the ordering. Apparently everything in Python is row-major, and when we want to iterate in a linear fashion we can use the flat iterator.

The question is the following: given a list of numbers, say 1, 2, 3, 4, 5, 6, how can we create a numpy array of shape (2, 3) in column-major order? That is, how can I get a matrix that looks like this:

array([[ 1.,  3.,  5.],
       [ 2.,  4.,  6.]])

I would really like to be able to iterate linearly over the list and place its elements into the newly created ndarray. The reason is that I will be reading files of multidimensional arrays stored in column-major order.

asked Dec 03 '13 02:12 by jmlopez



2 Answers

NumPy stores data in row-major order by default.

>>> a = np.array([[1,2,3,4], [5,6,7,8]])
>>> a.shape
(2, 4)
>>> a.shape = 4,2
>>> a
array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])

If you change the shape, the order of the data does not change.
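One way to check this claim is the minimal Python 3 sketch below. It uses reshape rather than assigning to a.shape, and the variable name r is only illustrative. It compares the flattened element order before and after the reshape and asks NumPy whether the two arrays share memory.

```python
import numpy as np

a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])

# Reshaping a contiguous array gives a new view onto the same buffer.
r = a.reshape(4, 2)

# The elements come out in the same linear order under both shapes.
same_order = a.ravel().tolist() == r.ravel().tolist()

# No data was copied or moved: both arrays refer to the same memory.
shared = np.shares_memory(a, r)

print(same_order, shared)
```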

If you pass order='F' to reshape, you get what you want.

>>> b = np.arange(1, 7)
>>> b
array([1, 2, 3, 4, 5, 6])
>>> c = b.reshape(2,3,order='F')
>>> c
array([[1, 3, 5],
       [2, 4, 6]])
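For the case in the question, where values arrive one at a time from a column-major file, here is a minimal Python 3 sketch of two options. The list values is a hypothetical stand-in for the numbers read from the file. The second option relies on the transpose of a column-major array being a row-major view of the same memory, so a flat index into the transpose follows the storage order.

```python
import numpy as np

values = [1, 2, 3, 4, 5, 6]  # hypothetical stand-in for numbers read from a file

# Option 1: build a flat array, then interpret it in column-major order.
a = np.array(values, dtype=float).reshape((2, 3), order='F')

# Option 2: allocate a column-major array and fill it element by element.
# b.T is a row-major view of the same memory, so b.T.flat[i] addresses the
# i-th element in b's storage order.
b = np.empty((2, 3), dtype=float, order='F')
for i, v in enumerate(values):
    b.T.flat[i] = v

print(a)
print(np.array_equal(a, b))
```

Option 1 is usually simpler when the whole list is available up front. Option 2 matches the "iterate linearly and place them" workflow described in the question.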
answered Oct 07 '22 08:10 by Kill Console


Your question has been answered, but I thought I would add this to explain your observations regarding, "At this point I'm not sure what the F_CONTIGUOUS flag tells us since it does not honor the ordering."


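The item method doesn't directly access the data like you think it does: a flat index passed to item is interpreted in logical row-major (C) order, regardless of how the array is laid out in memory. To see the memory layout, you should access the data attribute, which exposes the array's underlying buffer.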
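The difference between logical order and storage order can be sketched in Python 3 as follows. The dtype is pinned to int64 here as an assumption, so that the strides are the same on every platform. The names logical and in_memory are illustrative.

```python
import numpy as np

f = np.array([[1, 2, 3],
              [4, 6, 7]], dtype=np.int64, order='F')

# Flat indices passed to item() walk the array in logical row-major order.
logical = [f.item(i) for i in range(f.size)]

# ravel(order='K') reads the elements in the order they sit in memory.
in_memory = f.ravel(order='K').tolist()

print(logical)    # [1, 2, 3, 4, 6, 7]
print(in_memory)  # [1, 4, 2, 6, 3, 7]
print(f.strides)  # (8, 16): 8 bytes to the next row, 16 to the next column
```

The strides are what F_CONTIGUOUS describes: moving down a column is the smallest step in memory, while indexing itself always uses the logical (row, column) coordinates.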

An example:

c = np.array([[1,2,3],
              [4,6,7]], order='C')

f = np.array([[1,2,3],
              [4,6,7]], order='F')

Observe

print c.flags.c_contiguous, f.flags.f_contiguous
# True True

and

print c.nbytes == len(c.data)
# True

Now let's print the contiguous data for both:

nelements = np.prod(c.shape)
bsize = c.dtype.itemsize # should be 8 bytes for 'int64'
for i in range(nelements):
    bnum = c.data[i*bsize : (i+1)*bsize] # The element as a byte string.
    print np.fromstring(bnum, dtype=c.dtype)[0], # Convert to number.

This prints:

1 2 3 4 6 7

which is what we expect since c is order 'C', i.e., its data is stored row-major contiguous.

On the other hand,

nelements = np.prod(f.shape)
bsize = f.dtype.itemsize # should be 8 bytes for 'int64'
for i in range(nelements):
    bnum = f.data[i*bsize : (i+1)*bsize] # The element as a byte string.
    print np.fromstring(bnum, dtype=f.dtype)[0], # Convert to number.

prints

1 4 2 6 3 7

which, again, is what we expect to see since f's data is stored column-major contiguous.
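The loops above are written for Python 2. Under Python 3 the data attribute is a multi-dimensional memoryview, which cannot be sliced byte by byte in this way. A rough Python 3 equivalent, assuming the arrays are contiguous, is sketched below. The helper name memory_order is hypothetical, and tobytes makes a copy of the buffer rather than reading it in place.

```python
import numpy as np

def memory_order(arr):
    """Return the elements of a contiguous array in storage order."""
    # order='A' keeps Fortran order for F-contiguous arrays, C order otherwise.
    raw = arr.tobytes(order='A')
    # Reinterpret the raw bytes as a flat array of the original dtype.
    return np.frombuffer(raw, dtype=arr.dtype).tolist()

c = np.array([[1, 2, 3],
              [4, 6, 7]], dtype=np.int64, order='C')
f = np.array([[1, 2, 3],
              [4, 6, 7]], dtype=np.int64, order='F')

print(memory_order(c))  # [1, 2, 3, 4, 6, 7]
print(memory_order(f))  # [1, 4, 2, 6, 3, 7]
```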

answered Oct 07 '22 07:10 by Matt Hancock