
How to hstack arrays of numpy records?

[An earlier version of this post had the inaccurate title "How to add one column to an array of numpy records?" The question asked in that earlier title has already been partially answered, but that answer is not quite what the body of the earlier version of this post was asking for. I've reworded the title and edited the post substantially to make the distinction clearer. I also explain why the answer mentioned earlier falls short of what I'm looking for.]


Suppose I have two numpy arrays x and y, each consisting of r "record" (aka "structured") arrays. Let the shape of x be (r, cx) and the shape of y be (r, cy). Let's also assume that there's no overlap between x.dtype.names and y.dtype.names.

For example, for r = 2, cx = 2, and cy = 1:

import numpy as np
# list(zip(...)) keeps this working on Python 3, where zip returns an iterator
x = np.array(list(zip((1, 2), (3., 4.))), dtype=[('i', 'i4'), ('f', 'f4')])
y = np.array(list(zip(('a', 'b'))), dtype=[('s', 'S10')])

I would like to "horizontally" concatenate x and y to produce a new array of records z, having shape (r, cx + cy). This operation should not modify x or y at all.

In general, z = np.hstack((x, y)) won't do, because the dtypes of x and y won't necessarily match. E.g., continuing the example above:

z = np.hstack((x, y))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-def477e6c8bf> in <module>()
----> 1 z = np.hstack((x, y))
TypeError: invalid type promotion


Now, there is a function, numpy.lib.recfunctions.append_fields, that looks like it may do something close to what I'm looking for, but I have not been able to get anything out of it: everything I have tried with it either fails with an error, or produces something other than what I'm trying to get.

Can someone please show me explicitly the code (using numpy.lib.recfunctions.append_fields or otherwise¹) that would generate, from the x and y defined in the example above, a new array of records, z, equivalent to the horizontal concatenation of x and y, and do so without modifying either x or y?

I assume that this will require only one or two lines of code. Of course, I am looking for code that does not require building z, record by record, by iterating over x and y. Also, the code may assume that x and y have the same number of records, and that there is no overlap between x.dtype.names and y.dtype.names. Other than this, the code I'm looking for should know nothing about x and y. Ideally, it should be agnostic also about the number of arrays to join. IOW, leaving out error checking, the code I'm looking for could be the body of a function hstack_rec so that the new array z would be the result hstack_rec((x, y)).


¹ ...although I have to admit that, after my so-far perfect record of failure with numpy.lib.recfunctions.append_fields, I've become a bit curious about how this function could be used at all, irrespective of its relevance to this post's question.

kjo asked Feb 18 '13 19:02


1 Answer

I never use recarrays, and so someone else is going to come up with something slicker, but maybe merge_arrays would work?

>>> import numpy as np
>>> import numpy.lib.recfunctions as nlr
>>> x = np.array(list(zip((1, 2), (3., 4.))), dtype=[('i', 'i4'), ('f', 'f4')])
>>> y = np.array(list(zip(('a', 'b'))), dtype=[('s', 'S10')])
>>> x
array([(1, 3.0), (2, 4.0)], 
      dtype=[('i', '<i4'), ('f', '<f4')])
>>> y
array([('a',), ('b',)], 
      dtype=[('s', '|S10')])
>>> z = nlr.merge_arrays([x, y], flatten=True)
>>> z
array([(1, 3.0, 'a'), (2, 4.0, 'b')], 
      dtype=[('i', '<i4'), ('f', '<f4'), ('s', '|S10')])
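
If that does the job, wrapping it into the hstack_rec function the question asks for might look like the sketch below. This is only one way to do it: it assumes, as the question allows, that all arrays have the same number of records and no overlapping field names, and it does no error checking. The last lines also show one call to append_fields that seems to work for the single-field case; note that usemask=False is needed to get a plain array back rather than a masked one.

```python
import numpy as np
import numpy.lib.recfunctions as nlr


def hstack_rec(arrays):
    """Join structured arrays field by field into a new array.

    Sketch only: assumes equal lengths and non-overlapping field
    names, and performs no error checking. Inputs are not modified.
    """
    return nlr.merge_arrays(list(arrays), flatten=True)


# Same data as in the question, written with explicit tuples and
# bytes literals so it behaves the same on Python 2 and Python 3.
x = np.array([(1, 3.0), (2, 4.0)], dtype=[('i', 'i4'), ('f', 'f4')])
y = np.array([(b'a',), (b'b',)], dtype=[('s', 'S10')])

z = hstack_rec((x, y))
print(z.dtype.names)  # ('i', 'f', 's')

# append_fields adds one named field taken from a plain (unstructured)
# array; here that array is the single column y['s'].
z2 = nlr.append_fields(x, 's', y['s'], usemask=False)
print(z2.dtype.names)
```

Since merge_arrays accepts a sequence of any length, hstack_rec((x, y, w)) should work the same way for a third array w with its own field names.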
DSM answered Nov 01 '22 01:11