Can one perform a left join in pandas that selects only the first match on the right? Example:
left = pd.DataFrame() left['age'] = [11, 12] right = pd.DataFrame() right['age'] = [10, 11, 11] right['salary'] = [ 100, 150, 200 ] left.merge( right, how='left', on='age' )
Returns
age salary 0 11 150 1 11 200 2 12 NaN
But what I would like is to preserve the number of rows of left, by merely taking the first match. That is:
age salary 0 11 150 2 12 NaN
So I've been using
left.merge( right.drop_duplicates(['age']), how='left', on='age')
but I believe this makes a full copy of right. And it smells funny.
Is there a more elegant way?
Left Join in Pandas Left join, also known as Left Outer Join, returns a dataframe containing all the rows of the left dataframe. All the non-matching rows of the left dataframe contain NaN for the columns in the right dataframe.
left_on − Columns from the left DataFrame to use as keys. Can either be column names or arrays with length equal to the length of the DataFrame. right_on − Columns from the right DataFrame to use as keys. Can either be column names or arrays with length equal to the length of the DataFrame.
There are mainly five types of Joins in Pandas: Inner Join. Left Outer Join. Right Outer Join.
Yes, you can use groupby to remove your duplicate lines. Do everything you've done to define left and right. Now, I define a new dataframe on your last line:
left2=left.merge( right, how='left', on='age' ) df= left2.groupby(['age'])['salary'].first().reset_index() df
At first I used a .min(), which will give you the minimum salary at each age, as such:
df= left2.groupby(['age'])['salary'].min().reset_index()
But you were specifically asking about the first match. To do so you use the .first() option. Note: The .reset_index() at the end, just reformats the output of the groupby to be a dataframe again.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With