Is there a convenient way to merge two dataframes with respect to the distance between rows? For the following example, I want to get the color for df1 rows from the closest df2 rows. The distance should be computed as ((x1-x2)**0.5+(y1-y2)**0.5)**0.5
.
import pandas as pd
df1 = pd.DataFrame({'x': [50,16,72,61,95,47],'y': [14,22,11,45,58,56],'size':[1,4,3,7,6,5]})
df2 = pd.DataFrame({'x': [10,21,64,31,25,55],'y': [54,76,68,24,34,19],'color':['red','green','blue','white','brown','black']})
Inner joins The most common type of join is called an inner join. An inner join combines two DataFrames based on a join key and returns a new DataFrame that contains only those rows that have matching values in both of the original DataFrames.
The concat() function can be used to concatenate two Dataframes by adding the rows of one to the other. The merge() function is equivalent to the SQL JOIN clause. 'left', 'right' and 'inner' joins are all possible.
Pandas DataFrame merge() function is used to merge two DataFrame objects with a database-style join operation. The joining is performed on columns or indexes. If the joining is done on columns, indexes are ignored. This function returns a new DataFrame and the source DataFrame objects are unchanged.
To merge two Pandas DataFrame with common column, use the merge() function and set the ON parameter as the column name.
# function to compare one row of df1 with every row of df2
# note the use of abs() here, square root of negative numbers would be complex number,
# so the result of the computation would be NaN. abs() helps to avoids that
def compare(x, y):
df2['distance'] = (abs(x-df2['x'])**0.5 + abs(y-df2['y'])**0.5)**0.5
return df2.loc[df2['distance'].idxmin()]['color']
df1['color'] = df1.apply(lambda row: compare(row['x'], row['y']), axis=1)
print(df1)
x y size color
0 50 14 1 black
1 16 22 4 white
2 72 11 3 black
3 61 45 7 blue
4 95 58 6 blue
5 47 56 5 red
Something from numpy
broadcast
df1['color']=df2.color.iloc[np.argmin(np.sum(np.abs(df1[['x','y']].values-df2[['x','y']].values[:,None])**0.5,2),0)].values
df1
Out[79]:
x y size color
0 50 14 1 black
1 16 22 4 white
2 72 11 3 black
3 61 45 7 blue
4 95 58 6 blue
5 47 56 5 red
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With