Pandas: Find rows which don't exist in another DataFrame by multiple columns

Tags: python, join, pandas

This is the same as the question "python pandas: how to find rows in one dataframe but not in another?", but with multiple columns.

This is the setup:

    import pandas as pd

    df = pd.DataFrame(dict(
        col1=[0, 1, 1, 2],
        col2=['a', 'b', 'c', 'b'],
        extra_col=['this', 'is', 'just', 'something']
    ))

    other = pd.DataFrame(dict(
        col1=[1, 2],
        col2=['b', 'c']
    ))

Now I want to select the rows from df which don't exist in other. I want to do the selection by col1 and col2.

In SQL I would do:

    select * from df
    where not exists (
        select * from other o
        where df.col1 = o.col1 and
              df.col2 = o.col2
    )

In pandas I can do something like this, but it feels very ugly. Part of the ugliness could be avoided if df had an id column, but one is not always available.

    key_col = ['col1', 'col2']
    df_with_idx = df.reset_index()
    common = pd.merge(df_with_idx, other, on=key_col)['index']
    mask = df_with_idx['index'].isin(common)

    desired_result = df_with_idx[~mask].drop('index', axis=1)

So maybe there is some more elegant way?


asked Sep 18 '15 13:09

Pekka

2 Answers

Since pandas 0.17.0, merge accepts an indicator param which adds a _merge column telling you whether each row is present only in the left frame, only in the right frame, or in both:

    In [5]: merged = df.merge(other, how='left', indicator=True)
            merged

    Out[5]:
       col1 col2  extra_col     _merge
    0     0    a       this  left_only
    1     1    b         is       both
    2     1    c       just  left_only
    3     2    b  something  left_only

    In [6]: merged[merged['_merge'] == 'left_only']

    Out[6]:
       col1 col2  extra_col     _merge
    0     0    a       this  left_only
    2     1    c       just  left_only
    3     2    b  something  left_only

So you can now filter the merged df by selecting only the 'left_only' rows.
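For completeness, here is a minimal sketch of how the indicator approach could be wrapped up end to end. It is not part of the answer above; the names other_keys, mask and desired_result are just illustrative. It assumes you want the original index and columns of df back, and it de-duplicates the keys of other first, because duplicate keys on the right side would otherwise repeat rows of df in the merge.

```python
import pandas as pd

df = pd.DataFrame(dict(
    col1=[0, 1, 1, 2],
    col2=['a', 'b', 'c', 'b'],
    extra_col=['this', 'is', 'just', 'something'],
))
other = pd.DataFrame(dict(col1=[1, 2], col2=['b', 'c']))

key_col = ['col1', 'col2']

# Keep only the key columns of `other` and de-duplicate them, so the left
# merge cannot multiply rows of `df` or pull in unrelated columns.
other_keys = other[key_col].drop_duplicates()

merged = df.merge(other_keys, on=key_col, how='left', indicator=True)

# With unique keys on the right, a left merge yields one output row per row
# of `df`, in the same order, so the mask can be applied positionally.
mask = (merged['_merge'] == 'left_only').to_numpy()

# Index the original frame: this keeps df's index and avoids the extra
# `_merge` column.
desired_result = df[mask]
print(desired_result)
```

Indexing df with the mask, rather than filtering merged, is a design choice: merged gets a fresh 0..n-1 index, which only happens to match df here because df uses the default index.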

answered Oct 13 '22 00:10

EdChum

Interesting

    cols = ['col1', 'col2']
    # get copies where the indices are the columns of interest
    df2 = df.set_index(cols)
    other2 = other.set_index(cols)
    # look for index overlap; ~ negates the mask
    df[~df2.index.isin(other2.index)]

Returns:

       col1 col2  extra_col
    0     0    a       this
    2     1    c       just
    3     2    b  something

Seems a little bit more elegant...
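If this comes up often, the same idea could be packaged as a small function. The sketch below is only one possible way to do it: rows_not_in is a hypothetical helper name, not a pandas function, and it builds the key indexes with pd.MultiIndex.from_frame (available in reasonably recent pandas versions) instead of set_index, so no re-indexed copies of the frames are needed.

```python
import pandas as pd


def rows_not_in(df, other, cols):
    """Hypothetical helper: rows of `df` whose values in `cols` do not
    appear as a combination in `other`."""
    # Build one MultiIndex of key tuples per frame from the key columns only.
    left_keys = pd.MultiIndex.from_frame(df[cols])
    right_keys = pd.MultiIndex.from_frame(other[cols])
    # isin returns a boolean array aligned with the rows of `df`;
    # ~ negates it so only the non-matching rows are kept.
    return df[~left_keys.isin(right_keys)]


df = pd.DataFrame(dict(
    col1=[0, 1, 1, 2],
    col2=['a', 'b', 'c', 'b'],
    extra_col=['this', 'is', 'just', 'something'],
))
other = pd.DataFrame(dict(col1=[1, 2], col2=['b', 'c']))

result = rows_not_in(df, other, ['col1', 'col2'])
print(result)
```

Because the mask is applied to df directly, the original index and all extra columns are preserved.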

answered Oct 12 '22 22:10

greg_data

