
How to do left outer join exclusion in pandas

Tags:

python

pandas

I have two dataframes, A and B, and I want the rows that are in A but not in B (a left outer join that excludes the matches), like the diagram right below the top-left corner of this image.

The one below the top left

Dataframe A has columns ['a','b' + others] and B has columns ['a','b' + others]. There are no NaN values. I tried the following:

1.

dfm = dfA.merge(dfB, on=['a','b'])
dfe = dfA[(~dfA['a'].isin(dfm['a'])) | (~dfA['b'].isin(dfm['b']))]

2.

dfm = dfA.merge(dfB, on=['a','b'])
dfe = dfA[(~dfA['a'].isin(dfm['a'])) & (~dfA['b'].isin(dfm['b']))]

3.

dfe = dfA[(~dfA['a'].isin(dfB['a'])) | (~dfA['b'].isin(dfB['b']))]

4.

dfe = dfA[(~dfA['a'].isin(dfB['a'])) & (~dfA['b'].isin(dfB['b']))]

But when I compute len(dfm) and len(dfe), they don't add up to len(dfA) (the total is off by a few rows). I've tried this on dummy cases and #1 works there, so my dataset may have some peculiarity that I can't reproduce.

What's the right way to do this?

asked May 26 '18 13:05 by irene



1 Answer

Check out this link

import pandas as pd
df = pd.merge(dfA, dfB, on=['a','b'], how="outer", indicator=True)
df = df[df['_merge'] == 'left_only']
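A minimal sketch of the same idea on made-up data, in case the merged result should keep only dfA's columns. The toy frames and the helper name `left_anti_join` are assumptions for illustration, not part of the question. It merges against just the key columns of dfB, deduplicated, with how="left", so no row of dfA is repeated and no column from dfB is carried along:

```python
import pandas as pd

def left_anti_join(left, right, keys):
    """Return rows of `left` whose `keys` combination does not appear in `right`."""
    # Only the key columns of `right` matter; dropping duplicates keeps the
    # merge from repeating rows of `left`.
    right_keys = right[keys].drop_duplicates()
    merged = left.merge(right_keys, on=keys, how="left", indicator=True)
    # 'left_only' marks rows with no matching key combination in `right`.
    return merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")

# Toy data, made up for illustration.
dfA = pd.DataFrame({"a": [1, 1, 2, 3], "b": ["x", "y", "x", "z"], "val": [10, 20, 30, 40]})
dfB = pd.DataFrame({"a": [1, 2, 2], "b": ["x", "x", "x"], "other": [0.1, 0.2, 0.3]})

dfe = left_anti_join(dfA, dfB, ["a", "b"])
print(dfe)
```

Here only the rows of dfA with key pairs (1, 'y') and (3, 'z') remain, since (1, 'x') and (2, 'x') both occur in dfB.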

One-liner:

df = pd.merge(dfA, dfB, on=['a','b'], how="outer", indicator=True).query('_merge=="left_only"')
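For comparison with the attempts in the question, here is a sketch on made-up data of two ways the counts can disagree. First, testing 'a' and 'b' separately with isin is not a test on the (a, b) pair: a row of dfA can have its 'a' value and its 'b' value each present somewhere in dfB without that pair occurring in any single row. Second, duplicate key pairs in dfB make an inner merge longer than the number of matching dfA rows. The frames below are assumptions for illustration, and whether either effect applies to the asker's data is not known.

```python
import pandas as pd

# Toy data, made up for illustration.
dfA = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "y", "y"]})
dfB = pd.DataFrame({"a": [1, 2, 2], "b": ["x", "y", "y"]})

# Inner merge: the pair (2, 'y') occurs twice in dfB, so it is matched twice.
dfm = dfA.merge(dfB, on=["a", "b"])

# Column-wise test (attempt 3 in the question): (1, 'y') is not a pair in dfB,
# but 1 appears in dfB['a'] and 'y' appears in dfB['b'], so the row is dropped.
col_wise = dfA[(~dfA["a"].isin(dfB["a"])) | (~dfA["b"].isin(dfB["b"]))]

# Pair-wise test via the merge indicator keeps (1, 'y').
pair_wise = dfA.merge(
    dfB[["a", "b"]].drop_duplicates(), on=["a", "b"], how="left", indicator=True
).query('_merge == "left_only"')

print(len(dfA), len(dfm), len(col_wise), len(pair_wise))
```

In this toy case len(dfm) equals len(dfA) even though one row of dfA has no match, and the column-wise filter returns nothing, so neither count can be used to check the other.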
answered Sep 21 '22 19:09 by phi