Pandas equivalent of SQL non-equi JOIN

Tags: python, merge, join, pandas, dataframe

I've got two DataFrames I'd like to merge together.

I'm merging on three columns; two of them are a simple equality join.

joined_df = pd.merge(df1, df2, how='left', on=['name', 'city'])

I also want the join to use a third column, but as a comparison rather than an equality, something like this:

joined_df = pd.merge(df1, df2, how='left',
on=['name', 'city', 'df1.year' >= 'df2.year_min'])

Not sure what the right syntax is here.

If it were SQL, it would be easy for me.

SELECT * FROM df1
JOIN df2 on (df1.name = df2.name and df1.city = df2.city and df1.year > df2.year_min)

Any assistance?

asked May 28 '18 21:05

firestreak

2 Answers

Pandas merge only supports equi-joins. You'll need to add a second step that filters the result, something like this:

joined_df = df1.merge(df2, how='left', on=['name', 'city'])
joined_df = joined_df[joined_df.year > joined_df.year_min]
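One caveat with the two-step approach: comparisons against NaN evaluate to False, so the filter also drops any df1 rows that had no match in df2, and the result behaves like an inner join. The sketch below shows that effect and one possible way to keep left-join semantics. The sample data and the `row_id` helper column are made up for illustration; only the column names `name`, `city`, `year` and `year_min` come from the question.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df1 = pd.DataFrame({
    "name": ["a", "a", "b"],
    "city": ["x", "x", "y"],
    "year": [2010, 2000, 2015],
})
df2 = pd.DataFrame({
    "name": ["a", "b"],
    "city": ["x", "y"],
    "year_min": [2005, 2020],
})

# Step 1: equi-join on the equality keys.
merged = df1.merge(df2, how="left", on=["name", "city"])

# Step 2: apply the non-equi condition as a filter.
# Rows failing the condition (or holding NaN) are dropped entirely,
# so this behaves like an inner join.
inner_like = merged[merged["year"] > merged["year_min"]]

# To keep every df1 row, tag rows with a temporary id (row_id is an
# illustrative name), find the matching pairs, then left-join them back.
left = df1.reset_index().rename(columns={"index": "row_id"})
matches = left.merge(df2, on=["name", "city"])
matches = matches[matches["year"] > matches["year_min"]]
left_like = (
    left.merge(matches[["row_id", "year_min"]], on="row_id", how="left")
        .drop(columns="row_id")
)

print(inner_like)
print(left_like)
```

With this data only the first df1 row satisfies `year > year_min`, so `inner_like` has one row, while `left_like` keeps all three rows and leaves `year_min` as NaN where nothing matched.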

answered Sep 26 '22 11:09

cs95

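You can use merge_asof. The default is a backward merge, which matches each df1 row to the nearest df2 row whose year_min is at or below year. Both frames must be sorted by their key column:

pd.merge_asof(df1.sort_values('year'), df2.sort_values('year_min'), left_on='year', right_on='year_min', by=['name', 'city'])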
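As a rough illustration of how the backward merge_asof behaves, here is a small sketch with made-up data (only the column names come from the question). It assumes `year` and `year_min` have the same integer dtype, which merge_asof requires for its keys.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df1 = pd.DataFrame({
    "name": ["a", "a", "b"],
    "city": ["x", "x", "y"],
    "year": [2010, 2000, 2015],
})
df2 = pd.DataFrame({
    "name": ["a", "a", "b"],
    "city": ["x", "x", "y"],
    "year_min": [1995, 2005, 2020],
})

# merge_asof needs both frames sorted by their key column.
# direction="backward" (the default) picks, within each (name, city)
# group, the last df2 row with year_min <= year.
result = pd.merge_asof(
    df1.sort_values("year"),
    df2.sort_values("year_min"),
    left_on="year",
    right_on="year_min",
    by=["name", "city"],
)

print(result)
```

Two differences from the SQL join are worth keeping in mind. merge_asof returns at most one df2 row per df1 row (the nearest one), not every row that satisfies the condition, and it always keeps all df1 rows, like a left join. The default comparison is `>=`; passing `allow_exact_matches=False` makes it a strict `>`.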

answered Sep 26 '22 11:09

BENY
