I have two matrices, corresponding to data points (x, y1) and (x, y2):
   x  |  y1
------------
   0  |  0
   1  |  1
   2  |  2
   3  |  3
   4  |  4
   5  |  5
    x   |  y2
----------------
   0.5  |  0.5
   1.5  |  1.5
   2.5  |  2.5
   3.5  |  3.5
   4.5  |  4.5
   5.5  |  5.5
I'd like to create a new matrix that combines the x values into a single column, and has NaNs in the appropriate y1, y2 columns:
    x    |    y1    |   y2
-----------------------------
    0    |     0    |  NaN
    0.5  |    NaN   |  0.5
    1    |     1    |  NaN
    1.5  |    NaN   |  1.5
    ...  |    ...   |  ...
    5    |     5    |  NaN
    5.5  |    NaN   |  5.5 
Is there an easy way to do this? I'm new to Python and NumPy (coming from MATLAB) and I'm not sure how I would even begin with this. (For reference, my approach to this in MATLAB is simply using an outerjoin against two tables that are generated with array2table.)
Your array_2 has only one dimension; it needs the same number of dimensions as array_1. You can either reshape it with array_2.reshape(-1, 1) or add a new axis with array_2[:, np.newaxis] to make it two-dimensional before concatenation.
You can use the numpy.concatenate() function to join a sequence of two or more arrays into a single NumPy array.
With concatenate(), arrays are joined row-wise (axis=0) if they have the same number of columns, or column-wise (axis=1) if they have the same number of rows. Column-wise concatenation is done by passing axis=1 as an argument to the function.
To convert a NumPy array to a DataFrame you need to 1) have your NumPy array (e.g., np_array), and 2) pass it to the pd.DataFrame() constructor like this: df = pd.DataFrame(np_array, columns=['Column1', 'Column2']). The columns argument names each column; if you omit it, pandas uses integer labels instead.
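The reshape-then-concatenate step described above can be sketched roughly as follows. The names array_1 and array_2 are taken from the text; their contents here are made up for illustration.

```python
import numpy as np

# Hypothetical inputs: a 2-D (6, 2) array and a 1-D array of length 6.
array_1 = np.column_stack([np.arange(6.0), np.arange(6.0)])
array_2 = np.linspace(0.5, 5.5, 6)

# Two equivalent ways to turn the 1-D array into a (6, 1) column.
col_a = array_2.reshape(-1, 1)
col_b = array_2[:, np.newaxis]

# Column-wise concatenation: both inputs must have the same number of rows.
joined = np.concatenate([array_1, col_a], axis=1)
print(joined.shape)
```

Note that this only places columns side by side; it does not align rows on a shared x value, so on its own it is not an outer join.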
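Since the question mentions MATLAB's outerjoin on tables, the closest pandas analogue is probably an outer merge of two DataFrames. The following is a minimal sketch, assuming the two matrices from the question are available as 2-D NumPy arrays (called array_1 and array_2 here, which are illustrative names):

```python
import numpy as np
import pandas as pd

# The (x, y1) and (x, y2) matrices from the question, rebuilt for illustration.
array_1 = np.column_stack([np.arange(6.0), np.arange(6.0)])
array_2 = np.column_stack([np.linspace(0.5, 5.5, 6), np.linspace(0.5, 5.5, 6)])

df1 = pd.DataFrame(array_1, columns=['x', 'y1'])
df2 = pd.DataFrame(array_2, columns=['x', 'y2'])

# Outer merge on x: rows missing from one side get NaN in that side's column.
merged = pd.merge(df1, df2, on='x', how='outer')
merged = merged.sort_values('x').reset_index(drop=True)

# Back to a plain (12, 3) float array with columns x, y1, y2.
combined = merged.to_numpy()
print(merged)
```

The explicit sort_values call is there only to make the row order independent of how the merge orders its keys.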
A structured array approach (incomplete):
Import a special library of recfunctions:
In [441]: import numpy.lib.recfunctions as rf
Define two structured arrays
In [442]: A = np.zeros((6,),[('x',int),('y',int)])
Oops, the 'x' keys in B are float, so for consistency, let's make the A ones float as well.  Don't mix floats and ints unnecessarily.
In [446]: A = np.zeros((6,),[('x',float),('y',int)])
In [447]: A['x']=np.arange(6)
In [448]: A['y']=np.arange(6)
In [449]: A
Out[449]: 
array([( 0., 0), ( 1., 1), ( 2., 2), ( 3., 3), ( 4., 4), ( 5., 5)],
      dtype=[('x', '<f8'), ('y', '<i4')])
In [450]: B = np.zeros((6,),[('x',float),('z',float)])
In [451]: B['x']=np.linspace(.5,5.5,6)
In [452]: B['z']=np.linspace(.5,5.5,6)
In [453]: B
Out[453]: 
array([( 0.5,  0.5), ( 1.5,  1.5), ( 2.5,  2.5), ( 3.5,  3.5),
       ( 4.5,  4.5), ( 5.5,  5.5)],
      dtype=[('x', '<f8'), ('z', '<f8')])
Look at the docs of the rf.join_by function:
In [454]: rf.join_by?
Do an outer join:
In [457]: rf.join_by('x',A,B,'outer')
Out[457]: 
masked_array(data = [(0.0, 0, --) (0.5, --, 0.5) (1.0, 1, --) (1.5, --, 1.5) (2.0, 2, --)
 (2.5, --, 2.5) (3.0, 3, --) (3.5, --, 3.5) (4.0, 4, --) (4.5, --, 4.5)
 (5.0, 5, --) (5.5, --, 5.5)],
             mask = [(False, False,  True) (False,  True, False) (False, False,  True)
 (False,  True, False) (False, False,  True) (False,  True, False)
 (False, False,  True) (False,  True, False) (False, False,  True)
 (False,  True, False) (False, False,  True) (False,  True, False)],
       fill_value = (  1.00000000e+20, 999999,   1.00000000e+20),
            dtype = [('x', '<f8'), ('y', '<i4'), ('z', '<f8')])
The result is a masked array, with the missing values masked.
Same thing, but with masking turned off:
In [460]: rf.join_by('x',A,B,'outer',usemask=False)
Out[460]: 
array([( 0. ,      0,   1.00000000e+20), ( 0.5, 999999,   5.00000000e-01),
       ( 1. ,      1,   1.00000000e+20), ( 1.5, 999999,   1.50000000e+00),
       ( 2. ,      2,   1.00000000e+20), ( 2.5, 999999,   2.50000000e+00),
       ( 3. ,      3,   1.00000000e+20), ( 3.5, 999999,   3.50000000e+00),
       ( 4. ,      4,   1.00000000e+20), ( 4.5, 999999,   4.50000000e+00),
       ( 5. ,      5,   1.00000000e+20), ( 5.5, 999999,   5.50000000e+00)],
      dtype=[('x', '<f8'), ('y', '<i4'), ('z', '<f8')])
Now we see the fill values explicitly.  There must be a way of replacing the 1e20 with np.nan.  Replacing 999999 with nan is messier, since np.nan is a float value, not integer.
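One possible way to get NaNs instead of the fill values is to keep the masked result and fill each field with np.nan. This is a sketch rather than a tested recipe: it assumes the 'y' field is declared as float instead of int (so that it can hold NaN), which differs from the session above.

```python
import numpy as np
import numpy.lib.recfunctions as rf

# Same data as above, but with 'y' as float so it can hold NaN (an assumption).
A = np.zeros(6, dtype=[('x', float), ('y', float)])
A['x'] = np.arange(6)
A['y'] = np.arange(6)

B = np.zeros(6, dtype=[('x', float), ('z', float)])
B['x'] = np.linspace(0.5, 5.5, 6)
B['z'] = np.linspace(0.5, 5.5, 6)

# Outer join on 'x'; by default the result is a masked structured array.
joined = rf.join_by('x', A, B, jointype='outer')

# Replace the masked entries of each field with NaN and stack into a 2-D array.
result = np.column_stack([
    np.ma.filled(joined[name], np.nan) for name in ('x', 'y', 'z')
])
print(result)
```

Filling field by field avoids having to pick one fill value for the whole record.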
Under the covers, join_by is probably first creating a blank array with the join dtype, and then filling in the fields one by one.
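The "blank array, then fill fields" idea can also be done by hand. The sketch below is not how join_by is actually implemented; it is a simplified illustration that assumes the two inputs share no x values (as in the question). A real outer join would also have to merge rows with matching keys.

```python
import numpy as np

# Data from the question, as separate 1-D arrays (illustrative names).
x1, y1 = np.arange(6.0), np.arange(6.0)
x2, y2 = np.linspace(0.5, 5.5, 6), np.linspace(0.5, 5.5, 6)

n1, n2 = len(x1), len(x2)

# Blank array with the joined dtype; y1 and y2 start out as NaN everywhere.
out = np.empty(n1 + n2, dtype=[('x', float), ('y1', float), ('y2', float)])
out['y1'] = np.nan
out['y2'] = np.nan

# Fill in the fields block by block.
out['x'][:n1] = x1
out['y1'][:n1] = y1
out['x'][n1:] = x2
out['y2'][n1:] = y2

# Interleave the rows by sorting on the key field.
out.sort(order='x')

# Plain (12, 3) float array with columns x, y1, y2.
combined = np.column_stack([out['x'], out['y1'], out['y2']])
print(combined)
```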