
How to add names to a numpy array without changing its dimension?

I have an existing two-column numpy array to which I need to add column names. Passing those in via dtype works in the toy example shown in Block 1 below. With my actual array, though, as shown in Block 2, the same approach is having an unexpected (to me!) side-effect of changing the array dimensions.

How can I convert my actual array, the one named Y in the second block below, to an array having named columns, like I did for array A in the first block?

Block 1: (Columns of A named without reshaping dimension)

import numpy as np
A = np.array(((1,2),(3,4),(50,100)))
A
# array([[  1,   2],
#        [  3,   4],
#        [ 50, 100]])
dt = {'names':['ID', 'Ring'], 'formats':[np.int32, np.int32]}
A.dtype=dt
A
# array([[(1, 2)],
#        [(3, 4)],
#        [(50, 100)]], 
#       dtype=[('ID', '<i4'), ('Ring', '<i4')])

Block 2: (Naming columns of my actual array, Y, reshapes its dimension)

import numpy as np
## Code to reproduce Y, the array I'm actually dealing with
RING = [3,2,2,1,1,1]
ID = [1,2,3,4,5,6]
X = np.array([ID, RING])
Y = X.T
Y
# array([[1, 3],
#        [2, 2],
#        [3, 2],
#        [4, 1],
#        [5, 1],
#        [6, 1]])

## My unsuccessful attempt to add names to the array's columns    
dt = {'names':['ID', 'Ring'], 'formats':[np.int32, np.int32]}
Y.dtype=dt
Y
# array([[(1, 2), (3, 2)],
#        [(3, 4), (2, 1)],
#        [(5, 6), (1, 1)]], 
#       dtype=[('ID', '<i4'), ('Ring', '<i4')])

## What I'd like instead of the results shown just above
# array([[(1, 3)],
#        [(2, 2)],
#        [(3, 2)],
#        [(4, 1)],
#        [(5, 1)],
#        [(6, 1)]],
#       dtype=[('ID', '<i4'), ('Ring', '<i4')])
Josh O'Brien asked Jun 11 '14 17:06


4 Answers

Another page, store-different-datatypes-in-one-numpy-array, includes a nice solution for adding names to an array so that they can be used as column names. Example:

r = np.rec.fromarrays([x1, x2, x3], names='a,b,c')
# x1, x2, x3 are flattened (1-D) arrays
# a, b, c are the field names
# np.rec is the public alias of the older np.core.records path
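Applied to the data from the question, that idea might look like the sketch below. The RING values are an assumption chosen to match the Y printed in the question, and the explicit int32 dtype is added here so the result does not depend on the platform's default integer size.

```python
import numpy as np

ID = [1, 2, 3, 4, 5, 6]
RING = [3, 2, 2, 1, 1, 1]  # assumed values, matching the Y printed in the question

# One 1-D sequence per field; the dtype supplies the field names and types.
dt = np.dtype([('ID', np.int32), ('Ring', np.int32)])
r = np.rec.fromarrays([ID, RING], dtype=dt)

ids = r['ID']              # field access by name
rings = r.Ring             # record arrays also allow attribute access
column = r.reshape(-1, 1)  # optional: a (6, 1) column of records
```

The result is one-dimensional with one record per row; the reshape is only needed if the (6, 1) layout from the question is really required.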
lX-Xl answered Nov 10 '22 08:11


First, because your question asks about giving names to arrays, I feel obligated to point out that using "structured arrays" just to attach names is probably not the best approach. We often want named rows or columns when working with tables; if that is the case here, I suggest you try something like pandas, which is awesome. If you simply want to organize some data in your code, a dictionary of arrays is often much better than a structured array. For example:

Y = {'ID':X[0], 'Ring':X[1]}

With that out of the way, if you want to use a structured array, here is the clearest way to do it in my opinion:

import numpy as np

RING = [3,2,2,1,1,1]  # matches the Y printed in the question
ID = [1,2,3,4,5,6]
X = np.array([ID, RING])

dt = {'names':['ID', 'Ring'], 'formats':[int, int]}
Y = np.zeros(len(RING), dtype=dt)
Y['ID'] = X[0]
Y['Ring'] = X[1]
Bi Rico answered Nov 10 '22 09:11


This happens because Y is not C_CONTIGUOUS (it is a transposed view of X). You can check this with Y.flags:

  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

You can call Y.copy() or Y.ravel() first:

dt = {'names':['ID', 'Ring'], 'formats':[np.int32, np.int32]}
print(Y.ravel().view(dt))  # the result shape is (6,)
print(Y.copy().view(dt))   # the result shape is (6, 1)
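A self-contained sketch of the same idea follows. It assumes the data is stored as int32 (forced explicitly here), so that one row of Y is exactly as wide as one two-field record; the RING values are assumed to match the Y printed in the question, and np.ascontiguousarray is used in place of Y.copy() to make the intent explicit.

```python
import numpy as np

ID = [1, 2, 3, 4, 5, 6]
RING = [3, 2, 2, 1, 1, 1]                 # assumed values, matching the printed Y
X = np.array([ID, RING], dtype=np.int32)  # explicit int32: each row of Y is 8 bytes
Y = X.T                                   # transposed view, not C-contiguous

dt = np.dtype({'names': ['ID', 'Ring'], 'formats': [np.int32, np.int32]})

is_c = Y.flags['C_CONTIGUOUS']  # False for the transposed view
Yc = np.ascontiguousarray(Y)    # C-ordered copy: each row is now adjacent in memory
S = Yc.view(dt)                 # each 8-byte row becomes one record, shape (6, 1)
flat = S.ravel()                # shape (6,) if a 1-D result is preferred
```

Because the view reinterprets memory rather than converting values, this only gives the intended result when the element size of the source array matches the field sizes in dt.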
HYRY answered Nov 10 '22 09:11


Are you completely sure about the outputs for A and Y? I get something different using Python 2.7.6 and numpy 1.8.1.

My initial output for A is the same as yours, as it should be. After running the following code for the first example

dt = {'names':['ID', 'Ring'], 'formats':[np.int32, np.int32]}
A.dtype=dt

the contents of array A are actually

array([[(1, 0), (3, 0)],
       [(2, 0), (2, 0)],
       [(3, 0), (2, 0)],
       [(4, 0), (1, 0)],
       [(5, 0), (1, 0)],
       [(6, 0), (1, 0)]],
      dtype=[('ID', '<i4'), ('Ring', '<i4')])

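This makes somewhat more sense to me than the output you added. Assigning to dtype does not convert any values; it reinterprets the bytes that are already in the array. The new definition says every element contains two 4-byte fields, and this output is consistent with the default integer type on my system being 64-bit: each 8-byte element becomes one record whose first field holds the lower 32 bits (the original value) and whose second field holds the upper 32 bits, which are 0 for numbers this small.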
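One plausible reading of the zeros is a difference in integer width between the two systems. The sketch below illustrates that under stated assumptions: the array is forced to little-endian 64-bit integers ('<i8') and the fields to little-endian 32-bit integers ('<i4'), so the behaviour does not depend on the machine it runs on.

```python
import numpy as np

# Assumption: a platform whose default integer is 64-bit, modelled with '<i8'.
A64 = np.array([[1, 2], [3, 4], [50, 100]], dtype='<i8')

dt = np.dtype([('ID', '<i4'), ('Ring', '<i4')])

# The record is 8 bytes, the same size as one int64, so the shape is unchanged
# and every element is split into its low and high 32-bit halves.
V = A64.view(dt)

low_halves = V['ID']     # the original values
high_halves = V['Ring']  # all zeros for small non-negative values
```

With 32-bit source integers, the same view would instead pair up neighbouring elements, which is the behaviour shown in the question.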

However, if you would like to make numpy group columns of your existing array so that every row contains only one element, but with each element having two fields, you could introduce a small code change.

Since numpy needs tuples in order to group elements into a more complex data-type, you can make this happen by creating a new array and turning every row of the existing array into a tuple. Here is a simple working example:

import numpy as np
A = np.array(((1,2),(3,4),(50,100)))
dt = np.dtype([('ID', np.int32), ('Ring', np.int32)])
B = np.array(list(map(tuple, A)), dtype=dt)

Using this short piece of code, array B becomes

array([(1, 2), (3, 4), (50, 100)], 
  dtype=[('ID', '<i4'), ('Ring', '<i4')])

To make B a 2D array, it is enough to write

B = B.reshape(len(B), 1) # reshape returns a new array; in this case, even B.size would work instead of len(B)

For the second example, a similar thing needs to be done to make Y a structured array:

Y = np.array(list(map(tuple, X.T)), dtype=dt)

After doing this for your second example, array Y looks like this

array([(1, 3), (2, 2), (3, 2), (4, 1), (5, 1), (6, 1)], 
  dtype=[('ID', '<i4'), ('Ring', '<i4')])

Note that this output is not the same as the one you expected, but it is simpler: instead of writing Y[0,0] to get the first element, you can just write Y[0]. To make this array 2D as well, use reshape, just as with B.
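As an alternative to building tuples by hand, newer NumPy releases (1.16 and later, to my knowledge) provide a helper for this conversion in numpy.lib.recfunctions. It is not part of the approach above; the sketch below simply shows it on the question's data, with RING values assumed to match the printed Y.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

ID = [1, 2, 3, 4, 5, 6]
RING = [3, 2, 2, 1, 1, 1]    # assumed values, matching the printed Y
Y = np.array([ID, RING]).T   # plain (6, 2) integer array

dt = np.dtype([('ID', np.int32), ('Ring', np.int32)])

# The last axis of Y (length 2) is mapped onto the two fields of dt,
# converting values as needed rather than reinterpreting raw bytes.
S = rfn.unstructured_to_structured(Y, dtype=dt)  # shape (6,)

S2 = S.reshape(-1, 1)  # the (6, 1) layout requested in the question
first = S[0]           # the first record, with ID 1 and Ring 3
```

Because values are converted field by field, this does not depend on the memory layout or the integer width of Y.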

hgazibara answered Nov 10 '22 10:11