
How to remove a pandas dataframe from another dataframe

How do I remove the rows of one pandas DataFrame from another, just like set subtraction:

a=[1,2,3,4,5]
b=[1,5]
a-b=[2,3,4]

Now we have two pandas DataFrames. How do we remove df2 from df1?

In [5]: df1=pd.DataFrame([[1,2],[3,4],[5,6]],columns=['a','b'])
In [6]: df1
Out[6]:
   a  b
0  1  2
1  3  4
2  5  6

In [9]: df2=pd.DataFrame([[1,2],[5,6]],columns=['a','b'])
In [10]: df2
Out[10]:
   a  b
0  1  2
1  5  6

Then we expect the result of df1-df2 to be:

In [14]: df
Out[14]:
   a  b
0  3  4

How to do it?

Thank you.

176coding asked May 19 '16 03:05


People also ask

Can I subtract one DataFrame from another?

DataFrame.subtract() performs element-wise subtraction of another object from a DataFrame. It is essentially the same as dataframe - other, but with support for substituting a fill value for missing data in one of the inputs.
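To make the difference from set-style row removal concrete, here is a minimal sketch that applies subtract() to the df1 and df2 from the question. The variable names diff and diff_filled are only illustrative. Under these assumptions, the result keeps all three rows and subtracts values that share an index label, so it does not drop the rows of df2.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=['a', 'b'])
df2 = pd.DataFrame([[1, 2], [5, 6]], columns=['a', 'b'])

# Element-wise subtraction, aligned on index and column labels.
# Index label 2 exists only in df1, so that row becomes NaN.
diff = df1.subtract(df2)

# fill_value stands in for the missing side before subtracting,
# so the row with index label 2 is computed as df1's values minus 0.
diff_filled = df1.subtract(df2, fill_value=0)

print(diff)
print(diff_filled)
```

Neither result is the single row (3, 4) that the question expects, which is why the answer relies on concat and drop_duplicates instead.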

How do I drop a DataFrame from a DataFrame in Python?

The DataFrame drop() method removes the specified rows or columns by label. With axis='columns', drop() removes the specified columns. With axis='index', drop() removes the specified rows.
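As a rough illustration of the two axes, the sketch below calls drop() on the df1 from the question. The names rows_dropped and cols_dropped are hypothetical. Note that drop() works on labels, so using it for this question would first require knowing the index labels of the rows that match df2.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=['a', 'b'])

# Remove the rows whose index labels are 0 and 2; only label 1 remains.
rows_dropped = df1.drop([0, 2], axis='index')

# Remove the column labelled 'b'; all three rows remain.
cols_dropped = df1.drop('b', axis='columns')

print(rows_dropped)
print(cols_dropped)
```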

How do you delete common rows in two DataFrames in pandas?

You can use pandas.concat to concatenate the two DataFrames row-wise, followed by drop_duplicates(keep=False) to remove every row that appears more than once.


1 Answer

Solution

Use pd.concat followed by drop_duplicates(keep=False)

pd.concat([df1, df2, df2]).drop_duplicates(keep=False) 

The result looks like:

   a  b
1  3  4

Explanation

pd.concat stacks the DataFrames by appending one right after the other. If there is any overlap, the drop_duplicates method will catch it. However, drop_duplicates by default keeps the first occurrence and removes every later one. In this case, we want every copy of a duplicated row removed, which is exactly what keep=False does.
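The difference between the default behaviour and keep=False can be seen in a small sketch built on the question's data. The names combined, kept_first and removed_all are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=['a', 'b'])
df2 = pd.DataFrame([[1, 2], [5, 6]], columns=['a', 'b'])

# Stack df2 underneath df1; the original index labels are preserved.
combined = pd.concat([df1, df2])

# Default keep='first': one copy of each duplicated row survives,
# so (1, 2) and (5, 6) are still present.
kept_first = combined.drop_duplicates()

# keep=False: every row that has a duplicate is removed entirely,
# leaving only (3, 4).
removed_all = combined.drop_duplicates(keep=False)

print(kept_first)
print(removed_all)
```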

A special note on the repeated df2: with only one df2, any row in df2 that is not in df1 won't be considered a duplicate and will remain. So the version with a single df2 only works when df2 is a subset of df1. If we concat df2 twice, however, every row of df2 is guaranteed to be a duplicate and will subsequently be removed.
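To illustrate why df2 is repeated, the sketch below uses a hypothetical df2 containing a row, (7, 8), that does not appear in df1. That row is an assumption made for the example and is not part of the original question.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=['a', 'b'])
# Hypothetical df2: the row (7, 8) does not occur in df1.
df2 = pd.DataFrame([[1, 2], [7, 8]], columns=['a', 'b'])

# With a single df2, (7, 8) appears only once, so it is not a
# duplicate and leaks into the result.
once = pd.concat([df1, df2]).drop_duplicates(keep=False)

# With df2 twice, every row of df2 appears at least twice, so all
# of them are removed, whether or not they occur in df1.
twice = pd.concat([df1, df2, df2]).drop_duplicates(keep=False)

print(once)
print(twice)
```

One caveat with this approach: keep=False also removes rows that are duplicated within df1 itself, so it behaves like set subtraction only when the rows of df1 are unique.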

piRSquared answered Oct 05 '22 22:10