I have a numpy structured array with a mixed dtype (i.e., floats, ints, and strings). I want to select some of the columns of the array (all of which contain only floats) and then get the sum, by column, of the rows, as a standard numpy array. The initial array takes a form comparable to:
some_data = np.array([('foo', 3.5, 2.15), ('bar', 2.8, 5.3), ('baz', 1.2, 3.7)],
dtype=[('col1', '<U20'), ('A', '<f8'), ('B', '<f8')])
For this example, I'd like to take the sum of columns A and B, yielding np.array([7.5, 11.15]). With numpy ≤1.13, I could do that as follows:
get_cols = ['A', 'B']
desired_sum = np.sum(some_data[get_cols].view(('<f8', len(get_cols))), axis=0)
With the release of numpy 1.14, this method now fails with ValueError: Changing the dtype to a subarray type is only supported if the total itemsize is unchanged, which is a result of the changes made in numpy 1.14 to the handling of structured arrays. (User bbengfort commented about the FutureWarning given about this change in this answer.)
In light of these changes to structured arrays, how can I obtain the desired sum from the structured array subset?
In [165]: some_data = np.array([('foo', 3.5, 2.15), ('bar', 2.8, 5.3), ('baz', 1.2, 3.7)], dtype=[('col1', '<U20'), ('A', '<f8'), ('B', '<f8')])
...:
In [166]: get_cols = ['A','B']
In [167]: some_data[get_cols]
Out[167]:
array([( 3.5, 2.15), ( 2.8, 5.3 ), ( 1.2, 3.7 )],
dtype=[('A', '<f8'), ('B', '<f8')])
Simply reading the field values is fine. In 1.13 we get a warning when we apply view to the multi-field selection:
In [168]: some_data[get_cols].view(('<f8', len(get_cols)))
/usr/local/bin/ipython3:1: FutureWarning: Numpy has detected that you may be viewing or writing to an array returned by selecting multiple fields in a structured array.
This code may break in numpy 1.13 because this will return a view instead of a copy -- see release notes for details.
#!/usr/bin/python3
Out[168]:
array([[ 3.5 , 2.15],
[ 2.8 , 5.3 ],
[ 1.2 , 3.7 ]])
With the recommended copy, no warning:
In [169]: some_data[get_cols].copy().view(('<f8', len(get_cols)))
Out[169]:
array([[ 3.5 , 2.15],
[ 2.8 , 5.3 ],
[ 1.2 , 3.7 ]])
In [171]: np.sum(_, axis=0)
Out[171]: array([ 7.5 , 11.15])
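Whether the copy-then-view route keeps working may depend on the NumPy release, since what a multi-field selection returns has changed between versions. A minimal sketch that sidesteps view altogether is to pull each field out by name and stack the results. The names columns and desired_sum are just illustrative.

```python
import numpy as np

some_data = np.array(
    [('foo', 3.5, 2.15), ('bar', 2.8, 5.3), ('baz', 1.2, 3.7)],
    dtype=[('col1', '<U20'), ('A', '<f8'), ('B', '<f8')],
)
get_cols = ['A', 'B']

# Single-field indexing returns an ordinary 1-D float array for each name,
# so no dtype reinterpretation is involved.
columns = np.stack([some_data[name] for name in get_cols], axis=1)

# columns is a plain (n_rows, n_cols) float array; sum down the rows.
desired_sum = columns.sum(axis=0)
print(desired_sum)
```

This copies the selected fields, which is usually acceptable when only a few numeric columns are needed.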
In your original array, the dtype is
dtype([('col1', '<U20'), ('A', '<f8'), ('B', '<f8')])
An A,B slice would have the two f8 items interspersed with the U20 items. Changing the view dtype of such a mix is problematic. That's why working with a copy is more reliable.
Since U20 takes up 4*20 bytes, the total itemsize is 96, a multiple of 8. We can convert the whole thing to f8, reshape and 'throw away' the U20 columns:
In [183]: some_data.view('f8').reshape(3,-1)[:,-2:]
Out[183]:
array([[ 3.5 , 2.15],
[ 2.8 , 5.3 ],
[ 1.2 , 3.7 ]])
It's not very pretty and I don't recommend it, but it may give some insight into how structured data is arranged.
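To see the layout behind that arithmetic, the dtype itself can be inspected. This is only a sketch; the offsets dictionary is a name introduced for illustration.

```python
import numpy as np

some_data = np.array(
    [('foo', 3.5, 2.15), ('bar', 2.8, 5.3), ('baz', 1.2, 3.7)],
    dtype=[('col1', '<U20'), ('A', '<f8'), ('B', '<f8')],
)
dt = some_data.dtype

# Each record is 20 characters * 4 bytes + 2 floats * 8 bytes.
print(dt.itemsize)

# dt.fields maps each field name to a tuple that starts with (dtype, byte offset).
offsets = {name: dt.fields[name][1] for name in dt.names}
print(offsets)

# Number of f8-sized slots per record when the whole buffer is viewed as f8.
slots_per_record = dt.itemsize // np.dtype('f8').itemsize
print(slots_per_record)
```

The two float fields sit at the end of each record, which is why slicing the last two columns of the reshaped f8 view recovers them.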
view on a structured array is useful at times, but often a bit tricky to use correctly.
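If the installed NumPy is recent enough (1.16 or later, which is an assumption about your environment and not something from the original question), numpy.lib.recfunctions has a helper for exactly this conversion, so the view does not have to be written by hand. A sketch:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

some_data = np.array(
    [('foo', 3.5, 2.15), ('bar', 2.8, 5.3), ('baz', 1.2, 3.7)],
    dtype=[('col1', '<U20'), ('A', '<f8'), ('B', '<f8')],
)
get_cols = ['A', 'B']

# structured_to_unstructured turns the selected fields into a plain
# (n_rows, n_fields) array, whatever padding the selection carries.
plain = rfn.structured_to_unstructured(some_data[get_cols])

desired_sum = plain.sum(axis=0)
print(desired_sum)
```

On older releases this function is not available, and the field-by-field approach is the safer fallback.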
If the 2 numeric fields are usually used together, I'd recommend a compound dtype like:
In [184]: some_data = np.array([('foo', [3.5, 2.15]), ('bar', [2.8, 5.3]), ('baz', [1.2, 3.7])],
...: dtype=[('col1', '<U20'), ('AB', '<f8',(2,))])
...:
In [185]: some_data
Out[185]:
array([('foo', [ 3.5 , 2.15]), ('bar', [ 2.8 , 5.3 ]),
('baz', [ 1.2 , 3.7 ])],
dtype=[('col1', '<U20'), ('AB', '<f8', (2,))])
In [186]: some_data['AB']
Out[186]:
array([[ 3.5 , 2.15],
[ 2.8 , 5.3 ],
[ 1.2 , 3.7 ]])
genfromtxt accepts this style of dtype.
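As a rough illustration, loading comma-separated text straight into the compound dtype might look like the following. The sample lines and the names compound_dtype, loaded and ab_sum are hypothetical, and the encoding argument assumes NumPy 1.14 or later.

```python
import numpy as np

# Hypothetical sample input; a filename or an open file should work the same way.
lines = ["foo,3.5,2.15", "bar,2.8,5.3", "baz,1.2,3.7"]

# One string field plus one field holding a length-2 float subarray.
compound_dtype = [('col1', '<U20'), ('AB', '<f8', (2,))]

loaded = np.genfromtxt(lines, dtype=compound_dtype, delimiter=',', encoding='utf-8')

# The 'AB' field is already a plain 2-D float array, so the sum needs no view.
ab_sum = loaded['AB'].sum(axis=0)
print(loaded['AB'].shape)
print(ab_sum)
```

With the data stored this way, the column sum is a single field access followed by sum(axis=0).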