I have two DataFrames that I'd like to merge.
I'm merging on three columns; two of them are simple equality joins.
joined_df = pd.merge(df1, df2, how='left', on=['name', 'city'])
I also want to join on a third column, but as a comparison rather than an equality, something like this:
joined_df = pd.merge(df1, df2, how='left',
on=['name', 'city', 'df1.year' >= 'df2.year_min'])
Not sure what the right syntax is here.
If it was SQL, it would be easy for me.
SELECT * FROM df1
JOIN df2 on (df1.name = df2.name and df1.city = df2.city and df1.year > df2.year_min)
Any assistance?
Pandasql is a Python library that lets you manipulate a pandas DataFrame using SQL. Under the hood, Pandasql creates an SQLite table from the DataFrame of interest and allows users to query that SQLite table using SQL.
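To make the SQLite route concrete, here is a minimal sketch that does by hand roughly what is described above, using only pandas and the standard-library sqlite3 module rather than Pandasql itself. The sample rows are made up for illustration, and the join uses the `>=` comparison from the question's pandas attempt.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data; column names follow the question.
df1 = pd.DataFrame({
    "name": ["ann", "ann", "bob"],
    "city": ["NYC", "NYC", "LA"],
    "year": [2010, 2020, 2015],
})
df2 = pd.DataFrame({
    "name": ["ann", "bob"],
    "city": ["NYC", "LA"],
    "year_min": [2015, 2020],
})

# Copy both frames into an in-memory SQLite database.
con = sqlite3.connect(":memory:")
df1.to_sql("df1", con, index=False)
df2.to_sql("df2", con, index=False)

# SQL can express the non-equality condition directly in the ON clause.
query = """
SELECT df1.name, df1.city, df1.year, df2.year_min
FROM df1
LEFT JOIN df2
  ON df1.name = df2.name
 AND df1.city = df2.city
 AND df1.year >= df2.year_min
ORDER BY df1.name, df1.year
"""
joined_df = pd.read_sql_query(query, con)
con.close()
```

Rows of df1 with no df2 row satisfying the whole condition come back with a missing year_min, which is the usual left-join behaviour.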
The merge function of Pandas combines dataframes based on values in common columns. The same operation is done by joins in SQL.
Inner join in pandas: it returns a DataFrame containing only the rows whose join-key values appear in both inputs. An inner join requires each row in the two joined DataFrames to have matching column values. This is similar to the intersection of two sets.
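A small sketch of that inner-join behaviour, with made-up rows and the column names from the question:

```python
import pandas as pd

# Hypothetical sample data; "cat" has no partner row in df2.
df1 = pd.DataFrame({
    "name": ["ann", "bob", "cat"],
    "city": ["NYC", "LA", "SF"],
    "year": [2020, 2015, 2018],
})
df2 = pd.DataFrame({
    "name": ["ann", "bob"],
    "city": ["NYC", "LA"],
    "year_min": [2015, 2020],
})

# Only (name, city) pairs present in both frames survive an inner join.
inner_df = pd.merge(df1, df2, how="inner", on=["name", "city"])
```

Here the "cat" row is dropped because df2 has no matching name and city.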
Pandas merge only supports equi-joins. You'll need to add a second step that filters the result, something like this:
joined_df = df1.merge(df2, how='left', on=['name', 'city'])
joined_df = joined_df[joined_df.year > joined_df.year_min]
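One thing to watch with merge-then-filter: the filter also removes df1 rows that had no match, or whose match failed the comparison, so the result behaves like an inner join on the full condition. The sketch below shows that, plus one possible way to keep every df1 row. The sample data and the `row_id` helper column are hypothetical, and it uses `>=` as in the question's pandas attempt.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df1 = pd.DataFrame({
    "name": ["ann", "ann", "bob", "cat"],
    "city": ["NYC", "NYC", "LA", "SF"],
    "year": [2010, 2020, 2015, 2018],
})
df2 = pd.DataFrame({
    "name": ["ann", "bob"],
    "city": ["NYC", "LA"],
    "year_min": [2015, 2020],
})

# Step 1 + 2 from the answer: equi-join, then filter on the comparison.
merged = df1.merge(df2, how="left", on=["name", "city"])
filtered = merged[merged["year"] >= merged["year_min"]]  # NaN compares False

# To keep all df1 rows, tag each row, find the valid pairs,
# then left-join those pairs back onto df1.
left = df1.assign(row_id=range(len(df1)))  # row_id is a made-up helper column
pairs = left.merge(df2, how="inner", on=["name", "city"])
matched = pairs.loc[pairs["year"] >= pairs["year_min"], ["row_id", "year_min"]]
result = left.merge(matched, how="left", on="row_id").drop(columns="row_id")
```

`filtered` keeps only the rows that satisfy the whole condition, while `result` keeps every df1 row and leaves year_min missing where nothing matched.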
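You can use merge_asof; its default direction is a backward merge, which matches each left row to the last right row whose key is less than or equal to the left key. Both frames must be sorted on their key columns:
pd.merge_asof(df1.sort_values('year'), df2.sort_values('year_min'), left_on='year', right_on='year_min', by=['name', 'city'])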
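Here is a runnable sketch of the merge_asof approach on made-up data, assuming year_min is the df2 column to compare against. Note that merge_asof returns at most one df2 row per df1 row (the nearest year_min at or below year), so it is not the same as a SQL join that returns every row satisfying the inequality.

```python
import pandas as pd

# Hypothetical sample data; "ann" has two candidate year_min values.
df1 = pd.DataFrame({
    "name": ["ann", "ann", "bob"],
    "city": ["NYC", "NYC", "LA"],
    "year": [2010, 2020, 2015],
})
df2 = pd.DataFrame({
    "name": ["ann", "ann", "bob"],
    "city": ["NYC", "NYC", "LA"],
    "year_min": [2005, 2015, 2020],
})

# merge_asof requires both inputs to be sorted on their key columns.
asof_df = pd.merge_asof(
    df1.sort_values("year"),
    df2.sort_values("year_min"),
    left_on="year",
    right_on="year_min",
    by=["name", "city"],  # exact-match columns
)
```

The output follows the sorted order of df1, and rows with no year_min at or below their year get a missing value.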