Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can one perform a left join in pandas that selects only the first match on the right?

Tags:

Can one perform a left join in pandas that selects only the first match on the right? Example:

left            = pd.DataFrame() left['age']     = [11, 12] right           = pd.DataFrame() right['age']    = [10, 11, 11] right['salary'] = [ 100, 150, 200 ] left.merge( right, how='left', on='age' ) 

Returns

   age  salary 0   11     150 1   11     200 2   12     NaN 

But what I would like is to preserve the number of rows of left, by merely taking the first match. That is:

   age  salary 0   11     150 2   12     NaN 

So I've been using

left.merge( right.drop_duplicates(['age']), how='left', on='age') 

but I believe this makes a full copy of right. And it smells funny.

Is there a more elegant way?

like image 717
Quant Avatar asked Oct 08 '14 14:10

Quant


People also ask

How does LEFT join work in pandas?

Left Join in Pandas Left join, also known as Left Outer Join, returns a dataframe containing all the rows of the left dataframe. All the non-matching rows of the left dataframe contain NaN for the columns in the right dataframe.

What is left on and right on in pandas merge?

left_on − Columns from the left DataFrame to use as keys. Can either be column names or arrays with length equal to the length of the DataFrame. right_on − Columns from the right DataFrame to use as keys. Can either be column names or arrays with length equal to the length of the DataFrame.

What kind of joins does pandas offer?

There are mainly five types of Joins in Pandas: Inner Join. Left Outer Join. Right Outer Join.


1 Answers

Yes, you can use groupby to remove your duplicate lines. Do everything you've done to define left and right. Now, I define a new dataframe on your last line:

left2=left.merge( right, how='left', on='age' ) df= left2.groupby(['age'])['salary'].first().reset_index() df 

At first I used a .min(), which will give you the minimum salary at each age, as such:

df= left2.groupby(['age'])['salary'].min().reset_index() 

But you were specifically asking about the first match. To do so you use the .first() option. Note: The .reset_index() at the end, just reformats the output of the groupby to be a dataframe again.

like image 135
samus Avatar answered Oct 18 '22 00:10

samus