NumPy equivalent of merge

I'm moving some code from R to Python and am curious about how to merge data efficiently. I've found material on concatenate in NumPy (I'm using NumPy for my operations, so I'd like to stick with it), but it doesn't do what I expected.

Take two datasets

d1 = np.array([['1a2', '0'], ['2dd', '0'], ['z83', '1'], ['fz3', '0']])
ID      Label
1a2     0
2dd     0
z83     1
fz3     0

and

d2 = np.array([['1a2', '33.3', '22.2'], 
               ['43m', '66.6', '66.6'], 
               ['z83', '12.2', '22.1']])
ID     val1   val2
1a2    33.3   22.2
43m    66.6   66.6
z83    12.2   22.1

I want to merge these together so that the result is

d3

ID    Label    val1    val2
1a2   0        33.3    22.2
z83   1        12.2    22.1

That is, the rows that match on the ID column are identified and then concatenated together, and IDs that appear in only one of the arrays are dropped. This is relatively simple in R using merge, but in NumPy it's less obvious to me.

Is there a way to do this natively in NumPy that I am missing?

asked Mar 26 '18 at 15:03 by Jibril


People also ask

How do I merge data in NumPy?

Use numpy.concatenate() to join two or more arrays into a single array. It takes a sequence of arrays (plus optional arguments) and returns a new ndarray. It also accepts an axis argument, which defaults to 0 when not specified. Note that this joins arrays end to end; it does not match rows by key the way R's merge does.
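A minimal illustration with small made-up integer arrays: because axis defaults to 0, the inputs are stacked row after row, and they only need to agree in the number of columns.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])
c = np.array([[7, 8], [9, 10]])

# axis defaults to 0, so the rows of a, b and c are stacked in order
stacked = np.concatenate((a, b, c))
print(stacked.shape)  # (5, 2)
```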

How do I merge two NumPy arrays in Python?

Pass a sequence of the arrays you want to join to the concatenate() function, along with the axis. If axis is not passed explicitly, it defaults to 0.

How do I merge two columns in NumPy?

NumPy's concatenate function can join arrays either row-wise (axis=0) or column-wise (axis=1). It takes two or more arrays whose shapes match in every dimension except the one being joined, and by default it concatenates row-wise, i.e. axis=0. For example, concatenating two 3 x 3 arrays row-wise gives a 6 x 3 array (6 rows and 3 columns), while concatenating them column-wise gives a 3 x 6 array.
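A small sketch of the two directions, using two made-up 3 x 3 arrays built with np.arange:

```python
import numpy as np

x = np.arange(9).reshape(3, 3)
y = np.arange(9, 18).reshape(3, 3)

rows = np.concatenate((x, y), axis=0)  # stack vertically
cols = np.concatenate((x, y), axis=1)  # stack side by side

print(rows.shape)  # (6, 3)
print(cols.shape)  # (3, 6)
```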

What is the difference between append and concatenate in NumPy?

The NumPy append function adds values to the end of an existing array. It returns a copy of the array with the values appended along the specified axis; the original array is not modified, and if no axis is given both inputs are flattened first. concatenate, by contrast, joins a sequence of arrays along an existing axis, either row-wise or column-wise.
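A quick sketch of the difference, with small made-up arrays: without an axis, np.append flattens its inputs, while with axis=0 it gives the same result as np.concatenate.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])

# With axis=None (the default), np.append flattens both inputs first
flat = np.append(a, b)
print(flat)  # [1 2 3 4 5 6]

# With an explicit axis it behaves like np.concatenate((a, b), axis=0)
rows = np.append(a, b, axis=0)
print(rows.shape)  # (3, 2)
print(np.array_equal(rows, np.concatenate((a, b), axis=0)))  # True
```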


1 Answer

Here's one NumPy-based solution using masking:

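import numpy as np

def numpy_merge_bycol0(d1, d2):
    # Inner join on column 0. Rows are paired by position, so this assumes
    # IDs are unique and that matching IDs appear in the same order in d1 and d2.

    # Mask of matches in d1 against d2
    d1mask = np.isin(d1[:,0], d2[:,0])

    # Mask of matches in d2 against d1
    d2mask = np.isin(d2[:,0], d1[:,0])

    # Mask respective arrays and concatenate column-wise for final output
    return np.c_[d1[d1mask], d2[d2mask,1:]]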

Sample run -

In [43]: d1
Out[43]: 
array([['1a2', '0'],
       ['2dd', '0'],
       ['z83', '1'],
       ['fz3', '0']], dtype='|S3')

In [44]: d2
Out[44]: 
array([['1a2', '33.3', '22.2'],
       ['43m', '66.6', '66.6'],
       ['z83', '12.2', '22.1']], dtype='|S4')

In [45]: numpy_merge_bycol0(d1, d2)
Out[45]: 
array([['1a2', '0', '33.3', '22.2'],
       ['z83', '1', '12.2', '22.1']], dtype='|S4')
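The masking version lines rows up by position, so it relies on the IDs being unique and on matching IDs appearing in the same relative order in d1 and d2. A quick check with a reordered copy of d2 (d2_swapped, made up for this test) shows what happens when that assumption does not hold:

```python
import numpy as np

def numpy_merge_bycol0(d1, d2):
    # Same masking approach as above
    d1mask = np.isin(d1[:, 0], d2[:, 0])
    d2mask = np.isin(d2[:, 0], d1[:, 0])
    return np.c_[d1[d1mask], d2[d2mask, 1:]]

d1 = np.array([['1a2', '0'], ['2dd', '0'], ['z83', '1'], ['fz3', '0']])
# Hypothetical variant of d2 with the matching rows in the opposite order
d2_swapped = np.array([['z83', '12.2', '22.1'],
                       ['43m', '66.6', '66.6'],
                       ['1a2', '33.3', '22.2']])

out = numpy_merge_bycol0(d1, d2_swapped)
print(out)
# The row for '1a2' now carries z83's values, and vice versa
```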

We could also use broadcasting to find the matching (d1 row, d2 row) index pairs and then use integer indexing in place of masking, like so -

# Each row of idx is an (i, j) pair with d1[i,0] == d2[j,0]
idx = np.argwhere(d1[:,0,None] == d2[:,0])
out = np.c_[d1[idx[:,0]], d2[idx[:,1],1:]]
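For larger arrays, the broadcasting comparison builds a len(d1) x len(d2) boolean matrix. A sorting-based alternative, not part of the answer above, is np.intersect1d with return_indices=True (available since NumPy 1.15), which returns the positions of the shared IDs in both arrays. This is a minimal sketch assuming IDs are unique within each array (with duplicates, intersect1d uses the first occurrence); the helper name merge_on_first_col is hypothetical, and the output comes back in sorted-ID order rather than d1's order.

```python
import numpy as np

def merge_on_first_col(d1, d2):
    """Inner join of d1 and d2 on column 0, assuming IDs are unique in each array."""
    # i1 / i2: positions of the shared (sorted) IDs in d1 and d2
    _, i1, i2 = np.intersect1d(d1[:, 0], d2[:, 0], return_indices=True)
    # Rows are emitted in sorted-ID order, with d2's ID column dropped
    return np.c_[d1[i1], d2[i2, 1:]]

d1 = np.array([['1a2', '0'], ['2dd', '0'], ['z83', '1'], ['fz3', '0']])
# Matching rows deliberately in a different order than in d1
d2 = np.array([['z83', '12.2', '22.1'],
               ['43m', '66.6', '66.6'],
               ['1a2', '33.3', '22.2']])

print(merge_on_first_col(d1, d2))
```

Because rows are paired by ID rather than by position, the reordered d2 still lines up correctly here.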
answered Oct 05 '22 at 14:10 by Divakar