I'm new to working with numpy arrays and I'm having trouble creating a structured array. I'd like to create something similar to a Matlab structure where the fields can be arrays of different shapes.
import numpy
a = numpy.array([1, 2, 3, 4, 5, 6])
b = numpy.array([7, 8, 9])
c = numpy.array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20])
# Doesn't do what I want
data = numpy.array([a, b, c], dtype=[('a', 'f8'), ('b', 'f8'), ('c', 'f8')])
I'd like data['a'] to return matrix a, data['b'] to return matrix b, etc. When reading in a Matlab structure, the data is saved in this format, so I know it must be possible.
Having a data type (dtype) is one of the key features that distinguishes NumPy arrays from lists. In lists, the types of elements can be mixed. One index of a list can contain an integer, another can contain a string.
I'm afraid it's not possible without twisting NumPy's arm a lot.
See, the idea behind NumPy is to provide homogeneous arrays, that is, arrays of elements that all have the same type. This type can be simple (int, float...) or more complicated ([('',int),('',float),('',"|S10")]), but in any case, all the elements have the same type. That permits some very efficient memory layout.
So, inherently, a structured array requires the fields (the individual subblocks) to have the same size no matter the position. Examine the following:
>>> np.zeros(3,dtype=[('a',(int,3)),('b',(float,5))])
It defines an array with three elements; each element is composed of two sub-blocks, a and b; a is a block of three ints, b a block of five floats. But once you define the initial size of the blocks in the dtype, you're stuck with that (well, you can always switch, but that's another story).
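To make the fixed-size point concrete, here is a minimal sketch built on the same dtype as above. The variable names (rec, shape_a, shape_b, fits) are only illustrative; the idea is that every record carries the same block shapes, so a value of a different length does not fit.

```python
import numpy as np

# Three records; each holds a block of 3 ints ('a') and a block of 5 floats ('b').
rec = np.zeros(3, dtype=[('a', (int, 3)), ('b', (float, 5))])

# Accessing a field stacks the per-record blocks along the first axis.
shape_a = rec['a'].shape  # (3, 3): 3 records x 3 ints
shape_b = rec['b'].shape  # (3, 5): 3 records x 5 floats

# A value matching the declared block size is accepted.
rec['a'][0] = [1, 2, 3]

# A value of a different length is rejected: the block size is part of the dtype.
try:
    rec['a'][1] = [1, 2, 3, 4]
    fits = True
except ValueError:
    fits = False
```

This is why arrays of length 6, 3 and 11 cannot sit side by side as ordinary fields of one structured array.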
There's a workaround: using dtype=object. That way, you're constructing an array of heterogeneous items, like an array of lists of different sizes. But you lose a lot of NumPy power that way. Still, an example:
>>> x = np.zeros(3, dtype=[('a', object), ('b', object)])
>>> x['a'][0] = [1, 2, 3, 4]
>>> x['b'][-1] = "ABCDEF"
>>> print(x)
[([1, 2, 3, 4], 0) (0, 0) (0, 'ABCDEF')]
So, we just constructed an array of... objects. I put a list somewhere, a string elsewhere, and it works. You could follow the same example to build an array like you want:
blob = np.array([(a,b,c)],dtype=[('a',object),('b',object),('c',object)])
but then, you should really think twice about whether it is really a means to your end; another structure would probably be more efficient.
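The answer does not say which other structure it has in mind. One common choice, offered here purely as an assumption, is a plain Python dict that maps field names to ordinary NumPy arrays. A minimal sketch:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
b = np.array([7, 8, 9])
c = np.array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20])

# Assumption: a dict stands in for the Matlab-like structure.
# Each value stays a regular, homogeneous NumPy array with its own shape.
data = {'a': a, 'b': b, 'c': c}

# Lookup by name returns the original array, with full NumPy functionality.
total_b = data['b'].sum()
```

With this layout, data['a'] returns a directly and there is no object-dtype wrapper to unpack, at the cost of not having a single NumPy array that holds everything.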
A side note: please pay attention to the [(a,b,c)] part of the expression above: notice the ()? You're basically telling NumPy to construct an array of 1 element, composed of three sub-elements (one for each of your a, b, c), each sub-element being an object. If you don't put the (), NumPy will whine a lot.
And a last comment: if you access your fields like blob['a'], you'll get an array of shape (1,) and dtype=object: just use blob['a'].item() to get back your original (6,) int array.
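Putting the pieces together, a short sketch of the object-dtype approach using the arrays from the question might look like this (the names field and recovered are just for illustration):

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
b = np.array([7, 8, 9])
c = np.array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20])

# One record (note the tuple inside the list) with three object fields.
blob = np.array([(a, b, c)],
                dtype=[('a', object), ('b', object), ('c', object)])

# Field access gives a one-element object array wrapping the original array.
field = blob['a']

# .item() unwraps the single element and hands back the stored array.
recovered = blob['a'].item()
```

Here field has shape (1,) and dtype object, while recovered is the six-element array stored in the 'a' field.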