How to remove a pandas dataframe from another dataframe

How can I remove one pandas DataFrame from another, the way set subtraction works?

    a=[1,2,3,4,5]
    b=[1,5]
    a-b=[2,3,4]

Now suppose we have two pandas DataFrames. How do I remove the rows of df2 from df1?

    In [5]: df1=pd.DataFrame([[1,2],[3,4],[5,6]],columns=['a','b'])
    In [6]: df1
    Out[6]:
       a  b
    0  1  2
    1  3  4
    2  5  6

    In [9]: df2=pd.DataFrame([[1,2],[5,6]],columns=['a','b'])
    In [10]: df2
    Out[10]:
       a  b
    0  1  2
    1  5  6

The expected result of df1-df2 is:

    In [14]: df
    Out[14]:
       a  b
    0  3  4

How can I do this?

Thank you.

275

asked May 19 '16 03:05

176coding

1 Answer

Solution

Use pd.concat followed by drop_duplicates(keep=False):

    pd.concat([df1, df2, df2]).drop_duplicates(keep=False)

The result looks like:

       a  b
    1  3  4
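For anyone who wants to run this end to end, here is a minimal self-contained sketch that uses the df1 and df2 from the question. The variable name result is just an illustrative choice.

```python
import pandas as pd

# The two frames from the question
df1 = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=['a', 'b'])
df2 = pd.DataFrame([[1, 2], [5, 6]], columns=['a', 'b'])

# Stack df1 with two copies of df2, then drop every row that occurs more than once
result = pd.concat([df1, df2, df2]).drop_duplicates(keep=False)
print(result)
```

Note that the surviving row keeps its original index label from df1 (1 here). Chaining .reset_index(drop=True) should renumber it from 0 if that matters.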

Explanation

pd.concat stacks the DataFrames by appending one after the other. Any row that appears in more than one of them then shows up as a duplicate, which the drop_duplicates method can detect. By default, however, drop_duplicates keeps the first occurrence and removes only the later ones. In this case we want every copy of a duplicated row removed, which is exactly what the keep=False parameter does.
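To see the difference the keep parameter makes, here is a small sketch on the same data. The names stacked, first_kept and none_kept are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=['a', 'b'])
df2 = pd.DataFrame([[1, 2], [5, 6]], columns=['a', 'b'])

# Rows of df1 followed by rows of df2
stacked = pd.concat([df1, df2])

# Default (keep='first'): one copy of each duplicated row survives
first_kept = stacked.drop_duplicates()

# keep=False: every row that has a duplicate is dropped entirely
none_kept = stacked.drop_duplicates(keep=False)

print(first_kept)
print(none_kept)
```

With the default, the rows (1, 2) and (5, 6) are still present once each, so the result is just df1 again. Only keep=False leaves the single row (3, 4).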

A note on why df2 is repeated: with only one copy of df2, any row of df2 that is not in df1 is not a duplicate and would remain in the result, so the single-copy version works only when every row of df2 also appears in df1. Concatenating df2 twice guarantees that each of its rows is a duplicate and is therefore removed.
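The sketch below illustrates that point with a hypothetical df2 that contains a row, (7, 8), which is not in df1. It also shows a different technique, a left merge with indicator=True, as a possible alternative. That alternative is not part of the answer above, and the variable names are assumptions for illustration.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=['a', 'b'])
# Hypothetical df2: the row (7, 8) does not occur in df1
df2 = pd.DataFrame([[1, 2], [7, 8]], columns=['a', 'b'])

# One copy of df2: (7, 8) appears only once, so it is not a duplicate and leaks through
once = pd.concat([df1, df2]).drop_duplicates(keep=False)

# Two copies of df2: (7, 8) is now duplicated and gets dropped
twice = pd.concat([df1, df2, df2]).drop_duplicates(keep=False)

# Alternative sketch (not from the answer): an anti-join via merge.
# indicator=True adds a '_merge' column saying where each row was found.
merged = df1.merge(df2, how='left', indicator=True)
anti = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')

print(once)
print(twice)
print(anti)
```

One caveat worth keeping in mind: because drop_duplicates(keep=False) removes every repeated row, the concat approach also drops rows that are duplicated within df1 itself, even if they never occur in df2. The merge-based variant keeps such rows, so which one fits depends on whether df1 can contain repeats.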

112

answered Oct 05 '22 22:10

piRSquared

Tags: python, pandas, dataframe, subtraction