[An earlier version of this post had the inaccurate title "How to add one column to an array of numpy records?" The question asked in that earlier title has already been partially answered, but that answer is not quite what the body of the earlier version of this post was asking for. I've reworded the title, and edited the post substantially, to make the distinction clearer. I also explain why the answer mentioned earlier falls short of what I'm looking for.]
Suppose I have two numpy arrays x and y, each consisting of r "record" (aka "structured") arrays. Let the shape of x be (r, cx) and the shape of y be (r, cy). Let's also assume that there's no overlap between x.dtype.names and y.dtype.names.
For example, for r = 2, cx = 2, and cy = 1:
import numpy as np
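# list(...) is needed on Python 3, where zip returns an iterator
x = np.array(list(zip((1, 2), (3., 4.))), dtype=[('i', 'i4'), ('f', 'f4')])
# 'S10' is the same 10-byte string type as the older 'a10' alias
y = np.array(list(zip(('a', 'b'))), dtype=[('s', 'S10')])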
I would like to "horizontally" concatenate x and y to produce a new array of records z, having shape (r, cx + cy). This operation should not modify x or y at all.
In general, z = np.hstack((x, y)) won't do, because the dtypes of x and y won't necessarily match. E.g., continuing the example above:
z = np.hstack((x, y))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-7-def477e6c8bf> in <module>()
----> 1 z = np.hstack((x, y))
TypeError: invalid type promotion
Now, there is a function, numpy.lib.recfunctions.append_fields, that looks like it may do something close to what I'm looking for, but I have not been able to get anything out of it: everything I have tried with it either fails with an error, or produces something other than what I'm trying to get.
Can someone please show me explicitly the code (using numpy.lib.recfunctions.append_fields or otherwise [1]) that would generate, from the x and y defined in the example above, a new array of records, z, equivalent to the horizontal concatenation of x and y, and do so without modifying either x or y?
I assume that this will require only one or two lines of code. Of course, I am looking for code that does not require building z, record by record, by iterating over x and y. Also, the code may assume that x and y have the same number of records, and that there is no overlap between x.dtype.names and y.dtype.names. Other than this, the code I'm looking for should know nothing about x and y. Ideally, it should be agnostic also about the number of arrays to join. In other words, leaving out error checking, the code I'm looking for could be the body of a function hstack_rec so that the new array z would be the result of hstack_rec((x, y)).
[1] ...although I have to admit that, after my so-far perfect record of failure with numpy.lib.recfunctions.append_fields, I've become a bit curious about how this function could be used at all, irrespective of its relevance to this post's question.
I never use recarrays, and so someone else is going to come up with something slicker, but maybe merge_arrays would work?
>>> import numpy as np
>>> import numpy.lib.recfunctions as nlr
>>> x = np.array(list(zip((1, 2), (3., 4.))), dtype=[('i', 'i4'), ('f', 'f4')])
>>> y = np.array(list(zip(('a', 'b'))), dtype=[('s', 'S10')])
>>> x
array([(1, 3.0), (2, 4.0)],
dtype=[('i', '<i4'), ('f', '<f4')])
>>> y
array([('a',), ('b',)],
dtype=[('s', '|S10')])
>>> z = nlr.merge_arrays([x, y], flatten=True)
>>> z
array([(1, 3.0, 'a'), (2, 4.0, 'b')],
dtype=[('i', '<i4'), ('f', '<f4'), ('s', '|S10')])
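For what it's worth, here is a minimal sketch of how the merge_arrays call above could be wrapped into the hstack_rec function the question asks for. The name hstack_rec comes from the question; hstack_rec_manual is a hypothetical alternative I made up that builds the combined dtype by hand and copies one field at a time (a loop over fields, not over records). Both assume the inputs have the same number of records and no overlapping field names, as stated in the question, and neither does any error checking. The last line shows one way append_fields might be called for the same example; I pass usemask=False to get a plain ndarray back rather than a masked array.

```python
import numpy as np
import numpy.lib.recfunctions as nlr


def hstack_rec(arrays):
    """Join structured arrays field-wise; the inputs are not modified."""
    return nlr.merge_arrays(list(arrays), flatten=True)


def hstack_rec_manual(arrays):
    """Hypothetical alternative without recfunctions: build the combined
    dtype, then copy field by field (no per-record Python loop)."""
    arrays = [np.asarray(a) for a in arrays]
    # Assumes no field name appears in more than one input array.
    descr = [(name, a.dtype[name]) for a in arrays for name in a.dtype.names]
    out = np.empty(arrays[0].shape, dtype=descr)
    for a in arrays:
        for name in a.dtype.names:
            out[name] = a[name]
    return out


# Same data as in the question, written out so it runs on Python 2 and 3.
x = np.array([(1, 3.0), (2, 4.0)], dtype=[('i', 'i4'), ('f', 'f4')])
y = np.array([(b'a',), (b'b',)], dtype=[('s', 'S10')])

z = hstack_rec((x, y))
print(z.dtype.names)  # ('i', 'f', 's')

# append_fields takes the base array, the new field name(s), and the data.
w = nlr.append_fields(x, 's', y['s'], usemask=False)
```

Note that the result is a one-dimensional array of r records with cx + cy fields, not a two-dimensional array of shape (r, cx + cy); structured arrays keep their "columns" in the dtype rather than in the shape.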