merging pandas dataframes with respect to a function output

Tags: python, pandas, dataframe

Is there a convenient way to merge two dataframes based on the distance between their rows? In the example below, for each row of df1 I want the color of the closest row of df2. The distance should be computed as ((x1-x2)**0.5+(y1-y2)**0.5)**0.5.

import pandas as pd

df1 = pd.DataFrame({'x': [50,16,72,61,95,47],'y': [14,22,11,45,58,56],'size':[1,4,3,7,6,5]})
df2 = pd.DataFrame({'x': [10,21,64,31,25,55],'y': [54,76,68,24,34,19],'color':['red','green','blue','white','brown','black']})
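To make the distance concrete, here is a small sketch that evaluates it for a single pair of rows. It assumes the absolute differences are intended, as both answers below assume; distance is a hypothetical helper name, not part of pandas.

```python
def distance(x1, y1, x2, y2):
    # The question's formula with abs() added, so every square root is
    # taken of a non-negative number.
    return (abs(x1 - x2) ** 0.5 + abs(y1 - y2) ** 0.5) ** 0.5

# df1 row 0 (x=50, y=14) against the 'black' row of df2 (x=55, y=19)
d = distance(50, 14, 55, 19)
print(round(d, 4))               # 2.1147, i.e. (2 * 5**0.5) ** 0.5

# Without abs(), plain Python returns a complex number for a negative base
print(type((50 - 55) ** 0.5))    # <class 'complex'>
```

Inside pandas or NumPy the same negative base produces NaN instead of a complex number, which is why the answers take the absolute value first.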


asked Aug 29 '20 12:08

JBrons

2 Answers

# Compare one row of df1 (its x and y) against every row of df2.
# abs() is needed because a negative difference raised to the power 0.5
# has no real result, so NumPy would return NaN instead of a distance.
def compare(x, y):
    # Keep the distances in a local Series instead of adding a column to df2
    distance = (abs(x - df2['x'])**0.5 + abs(y - df2['y'])**0.5)**0.5
    # idxmin() returns the index label of the smallest distance
    return df2.loc[distance.idxmin(), 'color']

df1['color'] = df1.apply(lambda row: compare(row['x'], row['y']), axis=1)
print(df1)

    x   y  size  color
0  50  14     1  black
1  16  22     4  white
2  72  11     3  black
3  61  45     7   blue
4  95  58     6   blue
5  47  56     5    red
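Because the title asks about merging, the same lookup can also be written as an actual merge. The sketch below assumes pandas 1.2 or later (for how='cross'): it pairs every df1 row with every df2 row, computes the question's distance per pair, and keeps the closest pair for each df1 row. The suffix '_2' and the variable names are my own choices.

```python
import pandas as pd

df1 = pd.DataFrame({'x': [50, 16, 72, 61, 95, 47], 'y': [14, 22, 11, 45, 58, 56],
                    'size': [1, 4, 3, 7, 6, 5]})
df2 = pd.DataFrame({'x': [10, 21, 64, 31, 25, 55], 'y': [54, 76, 68, 24, 34, 19],
                    'color': ['red', 'green', 'blue', 'white', 'brown', 'black']})

# Cross join: every df1 row paired with every df2 row.
# reset_index() keeps the original df1 row label in a column named 'index';
# overlapping columns from df2 get the suffix '_2'.
pairs = df1.reset_index().merge(df2, how='cross', suffixes=('', '_2'))

# The question's distance, with abs() so the square roots stay real
pairs['distance'] = (
    (pairs['x'] - pairs['x_2']).abs() ** 0.5
    + (pairs['y'] - pairs['y_2']).abs() ** 0.5
) ** 0.5

# For each original df1 row, keep the pair with the smallest distance
best = pairs.loc[pairs.groupby('index')['distance'].idxmin()]

result = df1.copy()
# Assignment aligns on the df1 row labels stored in 'index'
result['color'] = best.set_index('index')['color']
print(result)
```

The cross join materializes len(df1) * len(df2) rows, so this is best suited to small frames; it does, however, keep the distance of every candidate pair available for inspection.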


answered Oct 20 '22 14:10

Rajesh

A vectorized alternative using NumPy broadcasting:

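import numpy as np

# df1 coordinates have shape (6, 2); df2's, with a new axis, (6, 1, 2).
# Their difference broadcasts to (6, 6, 2): df2 rows on axis 0, df1 rows on axis 1.
diff = np.abs(df1[['x', 'y']].values - df2[['x', 'y']].values[:, None])
# Sum the square roots over x and y; the outer **0.5 is skipped because it
# does not change which df2 row has the smallest value.
dist = np.sum(diff**0.5, axis=2)
df1['color'] = df2.color.iloc[np.argmin(dist, axis=0)].values
df1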
Out[79]: 
    x   y  size  color
0  50  14     1  black
1  16  22     4  white
2  72  11     3  black
3  61  45     7   blue
4  95  58     6   blue
5  47  56     5    red
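The broadcasting approach can also be wrapped in a reusable function. The sketch below uses a hypothetical helper, nearest_labels, that puts df1 rows on axis 0 (the one-liner in this answer puts df2 rows there and reduces over axis 0 instead). It assumes the question's distance extends to any list of coordinate columns by summing the square roots of the absolute differences over those columns.

```python
import numpy as np
import pandas as pd

def nearest_labels(left, right, cols, label):
    # Hypothetical helper: for each row of `left`, return the `label` value
    # of the closest row of `right`, using the question's distance over `cols`.
    a = left[cols].to_numpy(dtype=float)            # shape (n, k)
    b = right[cols].to_numpy(dtype=float)           # shape (m, k)
    diff = np.abs(a[:, None, :] - b[None, :, :])    # broadcasts to (n, m, k)
    dist = np.sqrt(np.sqrt(diff).sum(axis=2))       # shape (n, m)
    nearest = dist.argmin(axis=1)                   # position in `right` per left row
    return right[label].to_numpy()[nearest]

df1 = pd.DataFrame({'x': [50, 16, 72, 61, 95, 47], 'y': [14, 22, 11, 45, 58, 56],
                    'size': [1, 4, 3, 7, 6, 5]})
df2 = pd.DataFrame({'x': [10, 21, 64, 31, 25, 55], 'y': [54, 76, 68, 24, 34, 19],
                    'color': ['red', 'green', 'blue', 'white', 'brown', 'black']})

df1['color'] = nearest_labels(df1, df2, ['x', 'y'], 'color')
print(df1)
```

Like the one-liner, this builds an (n, m, k) intermediate array, so memory grows with len(df1) * len(df2).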

answered Oct 20 '22 16:10

BENY
