
Assigning field names to numpy array in Python 2.7.3

I am going nuts over this one; I am obviously missing the point, and the solution is probably too simple to see :(

I have an np.array with x columns, and I want to assign a field name to each column. So here is my code:

import numpy as np

data = np.array([[1,2,3], [4.0,5.0,6.0], [11,12,12.3]])
a = np.array(data, dtype={'names': ['1st', '2nd', '3rd'], 'formats': ['f8', 'f8', 'f8']})
print a['1st']

Why does this give

[[  1.    2.    3. ]
 [  4.    5.    6. ]
 [ 11.   12.   12.3]]

instead of [1, 2, 3]?

xtlc asked Oct 20 '22 03:10


1 Answer

In [1]: data = np.array([[1,2,3], [4.0,5.0,6.0], [11,12,12.3]])
In [2]: dt = np.dtype({'names': ['1st', '2nd', '3rd'], 'formats':['f8','f8', 'f8']})

Your attempt:

In [3]: np.array(data,dt)
Out[3]: 
array([[(1.0, 1.0, 1.0), (2.0, 2.0, 2.0), (3.0, 3.0, 3.0)],
       [(4.0, 4.0, 4.0), (5.0, 5.0, 5.0), (6.0, 6.0, 6.0)],
       [(11.0, 11.0, 11.0), (12.0, 12.0, 12.0), (12.3, 12.3, 12.3)]], 
      dtype=[('1st', '<f8'), ('2nd', '<f8'), ('3rd', '<f8')])

produces a (3,3) array, with the same values assigned to each field. data.astype(dt) does the same thing.

But view produces a (3,1) array in which each field contains the data for a column.

In [4]: data.view(dt)
Out[4]: 
array([[(1.0, 2.0, 3.0)],
       [(4.0, 5.0, 6.0)],
       [(11.0, 12.0, 12.3)]], 
      dtype=[('1st', '<f8'), ('2nd', '<f8'), ('3rd', '<f8')])

I should caution that view only works if all the fields have the same data type as the original. It uses the same data buffer, just interpreting the values differently.
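To make the shared buffer concrete, here is a minimal sketch (written in Python 3 syntax, and assuming a NumPy version that provides np.shares_memory). The names v and shares are mine, not from the question.

```python
import numpy as np

data = np.array([[1, 2, 3], [4.0, 5.0, 6.0], [11, 12, 12.3]])
dt = np.dtype({'names': ['1st', '2nd', '3rd'], 'formats': ['f8', 'f8', 'f8']})

v = data.view(dt)                    # reinterpret each row of 3 floats as one record
shares = np.shares_memory(data, v)   # the view did not copy anything

v['1st'][0, 0] = 100.0               # write through the view ...
print(v.shape, shares, data[0, 0])   # ... and the original array sees the change
```

So if you need to keep data untouched, take a copy before (or after) viewing it.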

You could reshape the result from (3,1) to (3,).

But since you want a['1st'] to be [1,2,3] - a row of data - we have to do some other manipulation.

In [16]: data.T.copy().view(dt)
Out[16]: 
array([[(1.0, 4.0, 11.0)],
       [(2.0, 5.0, 12.0)],
       [(3.0, 6.0, 12.3)]], 
      dtype=[('1st', '<f8'), ('2nd', '<f8'), ('3rd', '<f8')])
In [17]: _['1st']
Out[17]: 
array([[ 1.],
       [ 2.],
       [ 3.]])

I transpose, and then make a copy (rearranging the underlying data buffer). Now a view puts [1,2,3] in one field.
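The same steps as a self-contained sketch (Python 3 syntax), with the reshape from (3,1) to (3,) added at the end. The names t_contig and b are just for illustration.

```python
import numpy as np

data = np.array([[1, 2, 3], [4.0, 5.0, 6.0], [11, 12, 12.3]])
dt = np.dtype({'names': ['1st', '2nd', '3rd'], 'formats': ['f8', 'f8', 'f8']})

# data.T is only a strided view of the same buffer, so it is not C-contiguous
t_contig = data.T.flags['C_CONTIGUOUS']

# the copy lays the transposed values out row by row in a fresh buffer,
# so each group of 3 consecutive floats is now one column of the original
b = data.T.copy().view(dt).reshape(3)   # (3,1) -> (3,)

print(t_contig, b.shape)
print(b['1st'])
```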

Note that the display of the structured array uses () instead of [] for the 'rows'. This is a clue as to how it accepts input.

I can turn your data into a list of tuples with:

In [19]: [tuple(i) for i in data.T]
Out[19]: [(1.0, 4.0, 11.0), (2.0, 5.0, 12.0), (3.0, 6.0, 12.300000000000001)]

In [20]: np.array([tuple(i) for i in data.T],dt)
Out[20]: 
array([(1.0, 4.0, 11.0), (2.0, 5.0, 12.0), (3.0, 6.0, 12.3)], 
      dtype=[('1st', '<f8'), ('2nd', '<f8'), ('3rd', '<f8')])
In [21]: _['1st']
Out[21]: array([ 1.,  2.,  3.])

This is a (3,) array with 3 fields.

A list of tuples is the normal way of supplying data to np.array(...,dt). See the doc link in my comment.
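A small sketch of why the tuples matter (Python 3 syntax; the names from_lists and from_tuples are made up for this example). With nested lists every number is treated as its own element and copied into all fields, while each tuple is read as one record.

```python
import numpy as np

dt = np.dtype({'names': ['1st', '2nd', '3rd'], 'formats': ['f8', 'f8', 'f8']})

# nested lists: a 2-d array, each number broadcast to all three fields
from_lists = np.array([[1, 2, 3], [4, 5, 6]], dtype=dt)

# list of tuples: a 1-d array, each tuple supplies the three fields of one record
from_tuples = np.array([(1, 2, 3), (4, 5, 6)], dtype=dt)

print(from_lists.shape)
print(from_tuples.shape)
print(from_tuples['1st'])
```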

You can also create an empty array and fill it, row by row or field by field:

In [26]: A=np.zeros((3,),dt)
In [27]: for i in range(3):
   ....:     A[i]=data[:,i].copy()

Without the copy I get a ValueError: ndarray is not C-contiguous.

Fill field by field:

In [29]: for i in range(3):
   ....:     A[dt.names[i]]=data[i,:]

Usually a structured array has many rows, and a few fields. So filling by field is relatively fast. That's how recarray functions handle most copying tasks.
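Both fill strategies as one runnable sketch (Python 3 syntax). The second array B and the use of tuple(...) for the record assignment are my variation, not the loop from In [27]; a tuple is the same kind of input that np.array(..., dt) takes for one record.

```python
import numpy as np

data = np.array([[1, 2, 3], [4.0, 5.0, 6.0], [11, 12, 12.3]])
dt = np.dtype({'names': ['1st', '2nd', '3rd'], 'formats': ['f8', 'f8', 'f8']})

# field by field: one assignment per field, each filling that field in every record
A = np.zeros(3, dtype=dt)
for name, row in zip(dt.names, data):
    A[name] = row

# record by record: one assignment per record, using a tuple of the column values
B = np.zeros(3, dtype=dt)
for i in range(3):
    B[i] = tuple(data[:, i])

print(A['1st'])
print(A.tolist() == B.tolist())
```

With many records and few fields, the first loop runs only once per field, which is where the speed advantage comes from.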


fromiter can also be used:

In [31]: np.fromiter(data, dtype=dt)
Out[31]: 
array([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0), (11.0, 12.0, 12.3)], 
     dtype=[('1st', '<f8'), ('2nd', '<f8'), ('3rd', '<f8')])

But the error I get when using data.T without the copy is a strong indication that fromiter is doing the row-by-row iteration (as in my In [27]):

In [32]: np.fromiter(data.T, dtype=dt)
  ValueError: ndarray is not C-contiguous

zip(*data) is another way of reordering the input array (see @unutbu's answer in the comment link).

np.fromiter(zip(*data),dtype=dt)
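A sketch of the zip variant in Python 3 syntax, where zip returns a lazy iterator rather than a list; fromiter should consume it directly. The name C is only for illustration.

```python
import numpy as np

data = np.array([[1, 2, 3], [4.0, 5.0, 6.0], [11, 12, 12.3]])
dt = np.dtype({'names': ['1st', '2nd', '3rd'], 'formats': ['f8', 'f8', 'f8']})

# zip(*data) yields one tuple per column of data: (1, 4, 11), (2, 5, 12), (3, 6, 12.3);
# each tuple becomes one record, so row 0 of data ends up in field '1st'
C = np.fromiter(zip(*data), dtype=dt)

print(C.shape)
print(C['1st'])
```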

As pointed out in a comment, fromarrays works:

np.rec.fromarrays(data,dt)
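For completeness, a runnable sketch of the fromarrays call (Python 3 syntax). Passing list(data) instead of data is my choice to make the "one array per field" input explicit; the name rec is illustrative.

```python
import numpy as np

data = np.array([[1, 2, 3], [4.0, 5.0, 6.0], [11, 12, 12.3]])
dt = np.dtype({'names': ['1st', '2nd', '3rd'], 'formats': ['f8', 'f8', 'f8']})

# each row of data is handed over as the complete contents of one field
rec = np.rec.fromarrays(list(data), dtype=dt)

print(type(rec).__name__, rec.shape)
print(rec['1st'])
```

The result is a recarray, but since '1st' is not a valid Python identifier, the fields still have to be accessed with rec['1st'] rather than as attributes.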

This is an example of a rec function that uses the field-by-field copy method:

arrayList = [sb.asarray(x) for x in arrayList]
....
_array = recarray(shape, descr)
# populate the record array (makes a copy)
for i in range(len(arrayList)):
    _array[_names[i]] = arrayList[i]

Which in our case is:

In [8]: data1 = [np.asarray(i) for i in data]
In [9]: data1
Out[9]: [array([ 1.,  2.,  3.]), array([ 4.,  5.,  6.]), array([ 11. ,  12. ,  12.3])]
In [10]: for i in range(3):
   ....:     A[dt.names[i]] = data1[i]
hpaulj answered Nov 11 '22 15:11
