Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Merge two pandas dataframe based on conditional

The objective is to combine two df row wise, if a predetermine condition is met. Specifically, if the difference between the column is less or equal than a threshold, then, join the row of the df.

Given two df: df1 and df2, the following code partially achieve the objective.

import pandas as pd

df1 = pd.DataFrame ( {'time': [2, 3, 4, 24, 31]} )
df2 = pd.DataFrame (  {'time': [4.1, 24.7, 31.4, 5]} )
th = 0.9
all_comb=[]
for index, row in df1.iterrows ():
    for index2, row2 in df2.iterrows ():
        diff = abs ( row ['time'] - row2 ['time'] )
        if diff <= th:
            all_comb.append({'idx_1':index,'time_1':row ['time'], 'idx_2':index2,'time_2':row2 ['time']})
df_all = pd.DataFrame(all_comb)

outputted

       idx_1  time_1  idx_2  time_2
0      2       4      0     4.1
1      3      24      1    24.7
2      4      31      2    31.4

However, the above approach ignore certain information i.e., the value of 2 and 3 from the df1, and the value of 5 from df2.

The expected output should be something like

idx_1  time_1  idx_2  time_2

0      2       NA    NA
1      3       NA    NA    
2       4      0     4.1
3      24      1    24.7
4      31      2    31.4
NA     NA      3     5

Appreciate for any hint or any way that more compact and efficient than the proposed above.

like image 825
rpb Avatar asked Oct 27 '25 05:10

rpb


1 Answers

You can perform a cross merge and then subset all the rows at once based on your condition. Then we concat, adding back any rows that had no conditions met from both DataFrames.

import pandas as pd

df1 = df1.reset_index().add_suffix('_1')
df2 = df2.reset_index().add_suffix('_2')

m = df1.merge(df2, how='cross')

# Subset to all matches: |time_diff| <= thresh
th = 0.9
m = m[(m['time_1'] - m['time_2']).abs().le(th)]

# Add back rows with no matches
res = pd.concat([df1[~df1.index_1.isin(m.index_1)],
                 m,
                 df2[~df2.index_2.isin(m.index_2)]], ignore_index=True)

print(res)
   index_1  time_1  index_2  time_2
0      0.0     2.0      NaN     NaN
1      1.0     3.0      NaN     NaN
2      2.0     4.0      0.0     4.1
3      3.0    24.0      1.0    24.7
4      4.0    31.0      2.0    31.4
5      NaN     NaN      3.0     5.0
like image 173
ALollz Avatar answered Oct 29 '25 19:10

ALollz



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!