"Merging" numpy arrays together with a common dimension [duplicate]

I have two matrices, corresponding to data points (x,y1) and (x,y2):

   x  |  y1
------------
   0  |  0
   1  |  1
   2  |  2
   3  |  3
   4  |  4
   5  |  5

    x   |  y2
----------------
   0.5  |  0.5
   1.5  |  1.5
   2.5  |  2.5
   3.5  |  3.5
   4.5  |  4.5
   5.5  |  5.5

I'd like to create a new matrix that combines the x values into a single column, and has NaNs in the appropriate y1, y2 columns:

    x    |    y1    |   y2
-----------------------------
    0    |     0    |  NaN
    0.5  |    NaN   |  0.5
    1    |     1    |  NaN
    1.5  |    NaN   |  1.5
    ...  |    ...   |  ...
    5    |     5    |  NaN
    5.5  |    NaN   |  5.5

Is there an easy way to do this? I'm new to Python and NumPy (coming from MATLAB) and I'm not sure how I would even begin with this. (For reference, my approach to this in MATLAB is simply using an outerjoin against two tables that are generated with array2table.)

asked Nov 18 '17 19:11

Dang Khoa

1 Answer

A structured array approach (incomplete):

Import NumPy and its recfunctions helper module:

In [440]: import numpy as np
In [441]: import numpy.lib.recfunctions as rf

Define two structured arrays:

In [442]: A = np.zeros((6,),[('x',int),('y',int)])

Oops, the 'x' keys in B are float, so for consistency, let's make the A ones float as well. Don't mix floats and ints unnecessarily.

In [446]: A = np.zeros((6,),[('x',float),('y',int)])
In [447]: A['x']=np.arange(6)
In [448]: A['y']=np.arange(6)
In [449]: A
Out[449]: 
array([( 0., 0), ( 1., 1), ( 2., 2), ( 3., 3), ( 4., 4), ( 5., 5)],
      dtype=[('x', '<f8'), ('y', '<i4')])

In [450]: B = np.zeros((6,),[('x',float),('z',float)])
In [451]: B['x']=np.linspace(.5,5.5,6)
In [452]: B['z']=np.linspace(.5,5.5,6)
In [453]: B
Out[453]: 
array([( 0.5,  0.5), ( 1.5,  1.5), ( 2.5,  2.5), ( 3.5,  3.5),
       ( 4.5,  4.5), ( 5.5,  5.5)],
      dtype=[('x', '<f8'), ('z', '<f8')])

Look at the docs of the rf.join_by function:

In [454]: rf.join_by?

Do an outer join:

In [457]: rf.join_by('x',A,B,'outer')
Out[457]: 
masked_array(data = [(0.0, 0, --) (0.5, --, 0.5) (1.0, 1, --) (1.5, --, 1.5) (2.0, 2, --)
 (2.5, --, 2.5) (3.0, 3, --) (3.5, --, 3.5) (4.0, 4, --) (4.5, --, 4.5)
 (5.0, 5, --) (5.5, --, 5.5)],
             mask = [(False, False,  True) (False,  True, False) (False, False,  True)
 (False,  True, False) (False, False,  True) (False,  True, False)
 (False, False,  True) (False,  True, False) (False, False,  True)
 (False,  True, False) (False, False,  True) (False,  True, False)],
       fill_value = (  1.00000000e+20, 999999,   1.00000000e+20),
            dtype = [('x', '<f8'), ('y', '<i4'), ('z', '<f8')])

The result is a masked array, with the missing values masked.

Same thing, but with masking turned off:

In [460]: rf.join_by('x',A,B,'outer',usemask=False)
Out[460]: 
array([( 0. ,      0,   1.00000000e+20), ( 0.5, 999999,   5.00000000e-01),
       ( 1. ,      1,   1.00000000e+20), ( 1.5, 999999,   1.50000000e+00),
       ( 2. ,      2,   1.00000000e+20), ( 2.5, 999999,   2.50000000e+00),
       ( 3. ,      3,   1.00000000e+20), ( 3.5, 999999,   3.50000000e+00),
       ( 4. ,      4,   1.00000000e+20), ( 4.5, 999999,   4.50000000e+00),
       ( 5. ,      5,   1.00000000e+20), ( 5.5, 999999,   5.50000000e+00)],
      dtype=[('x', '<f8'), ('y', '<i4'), ('z', '<f8')])

Now we see the fill values explicitly: 1e20 for the float field and 999999 for the integer field. There must be a way of replacing the 1e20 with np.nan. Replacing 999999 with nan is messier, since np.nan is a float value, not an integer.
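One possible way to get NaN instead of these fill values, sketched under the assumption that 'y' is stored as float rather than int (so that NaN is representable), is to keep the masked result and fill each field separately. The names joined and merged are only illustrative, and this is a sketch rather than a complete answer:

```python
import numpy as np
import numpy.lib.recfunctions as rf

# Same data as above, except 'y' is float (an assumption made here)
# so that missing entries can be represented as NaN.
A = np.zeros(6, dtype=[('x', float), ('y', float)])
A['x'] = np.arange(6)
A['y'] = np.arange(6)

B = np.zeros(6, dtype=[('x', float), ('z', float)])
B['x'] = np.linspace(0.5, 5.5, 6)
B['z'] = np.linspace(0.5, 5.5, 6)

# Outer join on 'x'; the default usemask=True returns a masked structured array.
joined = rf.join_by('x', A, B, 'outer')

# Replace the masked entries of each field with NaN, then stack the
# fields as the columns of an ordinary 2-D float array.
merged = np.column_stack(
    [joined[name].filled(np.nan) for name in joined.dtype.names]
)
print(merged)
```

Here merged has one row per distinct x value, with NaN wherever the corresponding input had no entry for that x.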

Under the covers, join_by is probably first creating a blank array with the join dtype, and then filling in the fields one by one.
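That idea can be sketched by hand without recfunctions. This is not join_by's actual implementation, only an illustration of "blank array, then fill field by field". It assumes the x values within each input are unique, and the variable names (x1, y1, x2, y2, out) are made up for this example:

```python
import numpy as np

# The two inputs from the question, as plain 1-D float arrays.
x1 = np.arange(6, dtype=float)
y1 = np.arange(6, dtype=float)
x2 = np.linspace(0.5, 5.5, 6)
y2 = np.linspace(0.5, 5.5, 6)

# Sorted union of the key values from both inputs.
x = np.union1d(x1, x2)

# Blank structured array with the joined dtype; y1 and y2 start out as NaN.
out = np.empty(x.shape, dtype=[('x', float), ('y1', float), ('y2', float)])
out['x'] = x
out['y1'] = np.nan
out['y2'] = np.nan

# Fill each field at the positions where its own keys appear in the union.
# searchsorted finds exact positions because every key is present in x.
out['y1'][np.searchsorted(x, x1)] = y1
out['y2'][np.searchsorted(x, x2)] = y2
print(out)
```

Because both value columns are float from the start, there is no integer fill value to deal with afterwards.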

answered Oct 16 '22 02:10

hpaulj

Tags: python, pandas, dataframe, numpy
