
How to create numpy structured array with multiple fields of different shape?

I'm new to working with numpy arrays and I'm having trouble creating a structured array. I'd like to create something similar to a Matlab structure where the fields can be arrays of different shapes.

import numpy

a = numpy.array([1, 2, 3, 4, 5, 6])
b = numpy.array([7, 8, 9])
c = numpy.array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20])

# Doesn't do what I want
data = numpy.array([a, b, c], dtype=[('a', 'f8'), ('b', 'f8'), ('c', 'f8')])

I'd like data['a'] to return matrix a, data['b'] to return matrix b, etc. When reading in a Matlab structure, the data is saved in this format so I know it must be possible.

brad14 asked Sep 11 '12 20:09




1 Answer

I'm afraid it's not possible without twisting NumPy's arm a lot.

See, the idea behind NumPy is to provide homogeneous arrays, that is, arrays of elements that all have the same type. This type can be simple (int, float...) or more complicated ([('',int),('',float),('',"|S10")]), but in any case, all the elements have the same type. That permits some very efficient memory layout.

So, inherently, a structured array requires each field (each individual sub-block) to have the same size in every element of the array. Consider the following:

>>> import numpy as np
>>> np.zeros(3, dtype=[('a', (int, 3)), ('b', (float, 5))])

It defines an array with three elements; each element is composed of two sub-blocks, a and b; a is a block of three ints, b a block of five floats. But once you define the initial size of the blocks in the dtype, you're stuck with that (well, you can always switch, but that's another story).
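To make the fixed-size constraint concrete, here is a minimal sketch built on the same dtype. The variable names `x` and `mismatch_rejected` are only illustrative.

```python
import numpy as np

# Three elements; each holds a block of 3 ints ('a') and a block of 5 floats ('b').
x = np.zeros(3, dtype=[('a', (int, 3)), ('b', (float, 5))])

# Accessing a field stacks the per-element blocks along the first axis.
print(x['a'].shape)  # (3, 3)
print(x['b'].shape)  # (3, 5)

# A value that matches the declared block size is accepted.
x['a'][0] = [1, 2, 3]

# A value of a different length is rejected: the block size is fixed by the dtype.
mismatch_rejected = False
try:
    x['a'][1] = [1, 2, 3, 4]
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)  # True
```

Every element carries exactly the same layout, which is why fields of different lengths cannot be expressed this way.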

There's a workaround: using dtype=object. That way, you're constructing an array of heterogeneous items, like an array of lists of different sizes. But you lose a lot of NumPy's power that way. Still, an example:

>>> x = np.zeros(3, dtype=[('a', object), ('b', object)])
>>> x['a'][0] = [1, 2, 3, 4]
>>> x['b'][-1] = "ABCDEF"
>>> print(x)  # newer NumPy versions display the list as list([1, 2, 3, 4])
[([1, 2, 3, 4], 0) (0, 0) (0, 'ABCDEF')]

So, we just constructed an array of... objects. I put a list somewhere, a string elsewhere, and it works. You could follow the same example to build an array like you want:

blob = np.array([(a, b, c)], dtype=[('a', object), ('b', object), ('c', object)])

but then, you should really think twice about whether it's really a means to your end; another structure would probably be more efficient.
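The answer doesn't say which other structure it has in mind. As one assumption on my part, a plain Python dict of arrays gives the Matlab-struct-like access the question asks for, and each array keeps its own shape and numeric dtype. A rough sketch:

```python
import numpy as np

# Assumption: a plain dict stands in for the "other structure";
# the keys mirror the field names used in the question.
data = {
    'a': np.array([1, 2, 3, 4, 5, 6]),
    'b': np.array([7, 8, 9]),
    'c': np.array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]),
}

# Each entry is an ordinary numeric array with its own shape.
for name, values in data.items():
    print(name, values.shape, values.dtype.kind)

# Vectorized operations still work on each entry.
print(data['b'].sum())  # 24
```

Unlike the object-dtype array, nothing here is boxed as a generic Python object, so the usual NumPy operations apply to each entry directly.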

A side note: please pay attention to the [(a,b,c)] part of the expression above: notice the ()? You're basically telling NumPy to construct an array of 1 element, composed of three sub-elements (one for each of your a,b,c), each sub-element being an object. If you don't put the (), NumPy will whine a lot.

And a last comment: if you access your fields like blob['a'], you'll get an array of shape (1,) and dtype=object: just use blob['a'].item() to get back your original (6,) int array.
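Putting the pieces together, the following sketch builds the one-element object array from the question's a, b and c and unpacks a field again. The names `field` and `recovered` are just illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
b = np.array([7, 8, 9])
c = np.array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20])

# One element (note the tuple inside the list) with three object fields.
blob = np.array([(a, b, c)], dtype=[('a', object), ('b', object), ('c', object)])
print(blob.shape)  # (1,)

# Field access returns a one-element object array, not the stored array itself.
field = blob['a']
print(field.shape, field.dtype)  # (1,) object

# .item() unwraps the single entry and hands back the stored (6,) array.
recovered = field.item()
print(recovered.shape)  # (6,)
```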

Pierre GM answered Oct 21 '22 04:10