 

"Merging" numpy arrays together with a common dimension [duplicate]

I have two matrices, corresponding to data points (x,y1) and (x,y2):

   x  |  y1
------------
   0  |  0
   1  |  1
   2  |  2
   3  |  3
   4  |  4
   5  |  5

    x   |  y2
----------------
   0.5  |  0.5
   1.5  |  1.5
   2.5  |  2.5
   3.5  |  3.5
   4.5  |  4.5
   5.5  |  5.5

I'd like to create a new matrix that combines the x values into a single column, and has NaNs in the appropriate y1, y2 columns:

    x    |    y1    |   y2
-----------------------------
    0    |     0    |  NaN
    0.5  |    NaN   |  0.5
    1    |     1    |  NaN
    1.5  |    NaN   |  1.5
    ...  |    ...   |  ...
    5    |     5    |  NaN
    5.5  |    NaN   |  5.5 

Is there an easy way to do this? I'm new to Python and NumPy (coming from MATLAB) and I'm not sure how I would even begin with this. (For reference, my approach to this in MATLAB is simply using an outerjoin against two tables that are generated with array2table.)
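For the data shown here, where no x value appears in both matrices, a plain NumPy sketch (not a true join) is to pad each matrix with a NaN column, stack the rows, and sort by x. The names `m1`, `m2` and `stack_with_nans` are hypothetical. Unlike MATLAB's outerjoin, an x value present in both inputs would produce two rows here rather than one merged row.

```python
import numpy as np

# The two (x, y) matrices from the question (hypothetical names m1, m2)
m1 = np.column_stack([np.arange(6.0), np.arange(6.0)])              # x, y1
m2 = np.column_stack([np.arange(6.0) + 0.5, np.arange(6.0) + 0.5])  # x, y2

def stack_with_nans(m1, m2):
    """Hypothetical helper: return (x, y1, y2) rows sorted by x.

    Rows from m1 get NaN in the y2 column and rows from m2 get NaN in y1.
    Equal x values are NOT merged into a single row.
    """
    nan1 = np.full((m1.shape[0], 1), np.nan)
    nan2 = np.full((m2.shape[0], 1), np.nan)
    rows1 = np.hstack([m1, nan1])                     # x, y1, NaN
    rows2 = np.hstack([m2[:, :1], nan2, m2[:, 1:]])   # x, NaN, y2
    combined = np.vstack([rows1, rows2])
    # A stable sort on the x column keeps input order for equal x values
    return combined[np.argsort(combined[:, 0], kind="stable")]

result = stack_with_nans(m1, m2)
print(result)
```

The result is an ordinary (12, 3) float array, so NaN can be used directly for the missing entries.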

Dang Khoa asked Nov 18 '17 19:11


1 Answer

A structured array approach (incomplete):

Import the special recfunctions module:

In [441]: import numpy.lib.recfunctions as rf

Define two structured arrays:

In [442]: A = np.zeros((6,),[('x',int),('y',int)])

Oops, the `x` keys in `B` are float, so for consistency, let's make the `A` ones float as well. Don't mix floats and ints unnecessarily.

In [446]: A = np.zeros((6,),[('x',float),('y',int)])
In [447]: A['x']=np.arange(6)
In [448]: A['y']=np.arange(6)
In [449]: A
Out[449]: 
array([( 0., 0), ( 1., 1), ( 2., 2), ( 3., 3), ( 4., 4), ( 5., 5)],
      dtype=[('x', '<f8'), ('y', '<i4')])

In [450]: B = np.zeros((6,),[('x',float),('z',float)])
In [451]: B['x']=np.linspace(.5,5.5,6)
In [452]: B['z']=np.linspace(.5,5.5,6)
In [453]: B
Out[453]: 
array([( 0.5,  0.5), ( 1.5,  1.5), ( 2.5,  2.5), ( 3.5,  3.5),
       ( 4.5,  4.5), ( 5.5,  5.5)],
      dtype=[('x', '<f8'), ('z', '<f8')])
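If the data already sit in plain (n, 2) float arrays, as in the question, rf.unstructured_to_structured (available in NumPy 1.16 and later) can build A and B in one call instead of assigning each field separately. A sketch, assuming float data in both columns; `m1` and `m2` are hypothetical names for the question's matrices, and `y` is made float here rather than int:

```python
import numpy as np
import numpy.lib.recfunctions as rf

# Plain (n, 2) arrays like the question's matrices (hypothetical names)
m1 = np.column_stack([np.arange(6.0), np.arange(6.0)])
m2 = np.column_stack([np.linspace(.5, 5.5, 6), np.linspace(.5, 5.5, 6)])

# Name each column by viewing the rows as records of a structured dtype
A = rf.unstructured_to_structured(m1, dtype=np.dtype([('x', float), ('y', float)]))
B = rf.unstructured_to_structured(m2, dtype=np.dtype([('x', float), ('z', float)]))

print(A.dtype.names, B.dtype.names)   # ('x', 'y') ('x', 'z')
```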

Look at the docs of the rf.join_by function:

In [454]: rf.join_by?

Do an outer join:

In [457]: rf.join_by('x',A,B,'outer')
Out[457]: 
masked_array(data = [(0.0, 0, --) (0.5, --, 0.5) (1.0, 1, --) (1.5, --, 1.5) (2.0, 2, --)
 (2.5, --, 2.5) (3.0, 3, --) (3.5, --, 3.5) (4.0, 4, --) (4.5, --, 4.5)
 (5.0, 5, --) (5.5, --, 5.5)],
             mask = [(False, False,  True) (False,  True, False) (False, False,  True)
 (False,  True, False) (False, False,  True) (False,  True, False)
 (False, False,  True) (False,  True, False) (False, False,  True)
 (False,  True, False) (False, False,  True) (False,  True, False)],
       fill_value = (  1.00000000e+20, 999999,   1.00000000e+20),
            dtype = [('x', '<f8'), ('y', '<i4'), ('z', '<f8')])

The result is a masked array, with the missing values masked.

Same thing, but with masking turned off:

In [460]: rf.join_by('x',A,B,'outer',usemask=False)
Out[460]: 
array([( 0. ,      0,   1.00000000e+20), ( 0.5, 999999,   5.00000000e-01),
       ( 1. ,      1,   1.00000000e+20), ( 1.5, 999999,   1.50000000e+00),
       ( 2. ,      2,   1.00000000e+20), ( 2.5, 999999,   2.50000000e+00),
       ( 3. ,      3,   1.00000000e+20), ( 3.5, 999999,   3.50000000e+00),
       ( 4. ,      4,   1.00000000e+20), ( 4.5, 999999,   4.50000000e+00),
       ( 5. ,      5,   1.00000000e+20), ( 5.5, 999999,   5.50000000e+00)],
      dtype=[('x', '<f8'), ('y', '<i4'), ('z', '<f8')])

Now we see the fill values explicitly. There should be a way of replacing the 1e20 with np.nan. Replacing 999999 with nan is messier, since np.nan is a float value, not an integer, so the y field would first have to be converted to float.
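One way to get NaNs, sketched here starting from the masked result rather than the usemask=False one, is to convert each field to float and fill its masked entries with np.nan. MaskedArray.astype keeps the mask, so the integer y field can be handled the same way. Stacking the filled fields gives the plain 3-column matrix the question asks for; the names `joined` and `table` are illustrative.

```python
import numpy as np
import numpy.lib.recfunctions as rf

# Rebuild A and B as in the session above
A = np.zeros((6,), [('x', float), ('y', int)])
A['x'] = np.arange(6)
A['y'] = np.arange(6)
B = np.zeros((6,), [('x', float), ('z', float)])
B['x'] = np.linspace(.5, 5.5, 6)
B['z'] = np.linspace(.5, 5.5, 6)

joined = rf.join_by('x', A, B, 'outer')   # masked structured array

# Convert every field to float (so NaN is representable), fill the
# masked entries with NaN, and stack the fields as columns.
table = np.column_stack(
    [joined[name].astype(float).filled(np.nan) for name in joined.dtype.names]
)
print(table)
```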

Under the covers, join_by probably first creates a blank array with the joined dtype and then fills in the fields one by one.
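That description is a guess about join_by's internals, not a reading of its source. As an illustration of the same idea, here is a hypothetical hand-written outer join (`outer_join_on_x` is an invented name): it creates a blank array with the combined dtype, presets the value fields to NaN, and fills each field at the positions of its keys in the sorted union of x values. Unlike join_by, it makes every field float, so missing entries are NaN instead of fill values, and it assumes keys are unique within each input.

```python
import numpy as np

def outer_join_on_x(A, B):
    """Hypothetical outer join of structured arrays A ('x','y') and B ('x','z') on 'x'."""
    keys = np.union1d(A['x'], B['x'])          # sorted, de-duplicated keys
    out = np.empty(keys.size, dtype=[('x', float), ('y', float), ('z', float)])
    out['x'] = keys
    out['y'] = np.nan                          # blank value fields
    out['z'] = np.nan
    # Field access returns a view, so indexed assignment writes into out;
    # searchsorted gives each input key's position in the joined keys.
    out['y'][np.searchsorted(keys, A['x'])] = A['y']
    out['z'][np.searchsorted(keys, B['x'])] = B['z']
    return out

A = np.zeros((6,), [('x', float), ('y', int)])
A['x'] = np.arange(6)
A['y'] = np.arange(6)
B = np.zeros((6,), [('x', float), ('z', float)])
B['x'] = np.linspace(.5, 5.5, 6)
B['z'] = np.linspace(.5, 5.5, 6)

print(outer_join_on_x(A, B))
```

Because the keys come from np.union1d, an x value present in both inputs ends up as a single row with both y and z filled.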

hpaulj answered Oct 16 '22 02:10