Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas equivalent of SQL non-equi JOIN

So I've got 2 data-frames I'd like to merge together.

I'm merging on 3 columns, 2 is an easy join.

joined_df = pd.merge(df1, df2, how='left', on=['name', 'city'])

I want this to be using a third column, but it's going to be a comparison, something like this:

joined_df = pd.merge(df1, df2, how='left',
on=['name', 'city', 'df1.year' >= 'df2.year_min'])

Not sure what the right syntax is here.

If it was SQL, it would be easy for me.

SELECT * FROM df1
JOIN df2 on (df1.name = df2.name and df1.year = df2.year and df1.year > df2.year_min)

Any assistance?

like image 771
firestreak Avatar asked May 28 '18 21:05

firestreak


People also ask

Can Pandas be used for SQL?

Pandasql is a python library that allows manipulation of a Pandas Dataframe using SQL. Under the hood, Pandasql creates an SQLite table from the Pandas Dataframe of interest and allow users to query from the SQLite table using SQL.

Which of the following SQL statements is equivalent to the merge operation in Pandas?

The merge function of Pandas combines dataframes based on values in common columns. The same operation is done by joins in SQL.

What is join in Pandas?

Inner Join in PandasIt returns a dataframe with only those rows that have common characteristics. An inner join requires each row in the two joined dataframes to have matching column values. This is similar to the intersection of two sets.


2 Answers

Pandas merge only supports equi-joins. You'll need to add a second step that filters the result, something like this:

joined_df = df1.merge(df2, how='left', on=['name', 'city'])
joined_df = joined_df[joined_df.year > joined_df.year_min]
like image 180
cs95 Avatar answered Sep 26 '22 11:09

cs95


You can using merge_asof, default is backward merge

pd.merge_asof(df1,df2, left_on='year',right_on='joined_df', by=['name', 'city'])
like image 42
BENY Avatar answered Sep 26 '22 11:09

BENY