How to filter out rows of one python pandas dataframe from another dataframe by comparing columns?

Question

I'm trying to exclude rows from one dataframe that also occur in another dataframe:

import pandas

df = pandas.DataFrame({'A': ['Chr1', 'Chr1', 'Chr1', 'Chr1', 'Chr1', 'Chr1', 'Chr2', 'Chr2'], 'B': [10, 20, 30, 40, 50, 60, 15, 20]})

errors = pandas.DataFrame({'A': ['Chr1', 'Chr1'], 'B': [20,50]})

The result should be df without the rows that also appear in errors:

df:
'A'    'B'
Chr1    10
Chr1    30
Chr1    40
Chr1    60
Chr2    15
Chr2    20

It doesn't seem to work with df.merge, and I don't want to iterate over all rows, since the dataframes get pretty large.

Best,

David

Ankush Shah · Accepted Answer

Add an extra column to errors

errors['temp'] = 1

Merge the two dataframes

merged_df = pandas.merge(df,errors,how='outer')

Now keep only the rows where 'temp' is NaN, i.e. the rows that had no match in errors

merged_df = merged_df[ merged_df['temp'] != 1 ]
del merged_df['temp']

print(merged_df)

      A   B
 0  Chr1  10
 2  Chr1  30
 3  Chr1  40
 5  Chr1  60
 6  Chr2  15
 7  Chr2  20
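
The temporary-column trick above can also be written with the indicator option of merge, which adds a '_merge' column recording where each row came from. The following is only a sketch of that variant, not part of the original answer. It assumes a pandas version that supports indicator=True and drop(columns=...), and the names merged and filtered are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['Chr1', 'Chr1', 'Chr1', 'Chr1', 'Chr1', 'Chr1', 'Chr2', 'Chr2'],
    'B': [10, 20, 30, 40, 50, 60, 15, 20],
})
errors = pd.DataFrame({'A': ['Chr1', 'Chr1'], 'B': [20, 50]})

# Left merge on both columns; '_merge' is 'left_only' for rows found
# only in df and 'both' for rows that also occur in errors.
merged = df.merge(errors, on=['A', 'B'], how='left', indicator=True)

# Keep the rows without a match in errors, then drop the helper column.
filtered = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')

print(filtered)
```

A left merge keeps every row of df and does not add rows that exist only in errors, so errors itself is left unmodified.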

furas · Answer

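For two columns you can do the following. Note that each column is checked on its own, so this gives the expected result only when no row of df combines an 'A' value from one errors row with a 'B' value from another:

print(df[ ~df['A'].isin(errors['A']) | ~df['B'].isin(errors['B']) ])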
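
The expression above tests 'A' and 'B' separately. If the (A, B) pairs should be compared as a unit, one possible approach is to build a MultiIndex from the key columns of each frame and use its isin method. This is a sketch under that assumption, not part of the original answer. The names keys, in_errors and filtered are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['Chr1', 'Chr1', 'Chr1', 'Chr1', 'Chr1', 'Chr1', 'Chr2', 'Chr2'],
    'B': [10, 20, 30, 40, 50, 60, 15, 20],
})
errors = pd.DataFrame({'A': ['Chr1', 'Chr1'], 'B': [20, 50]})

keys = ['A', 'B']

# Turn the key columns into a MultiIndex so each row is one (A, B) tuple,
# then test which tuples of df also occur among the tuples of errors.
in_errors = df.set_index(keys).index.isin(errors.set_index(keys).index)

# Boolean mask selection keeps df's original index labels.
filtered = df[~in_errors]

print(filtered)
```

Because no merge is involved, the rows keep their original order and index.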

Tags: python, merge, pandas, filter

Asked by: David Ries
