I'm trying to exclude rows from one dataframe that also occur in another dataframe:
import pandas
df = pandas.DataFrame({'A': ['Chr1', 'Chr1', 'Chr1','Chr1', 'Chr1', 'Chr1','Chr2','Chr2'], 'B': [10,20,30,40,50,60,15,20]})
errors = pandas.DataFrame({'A': ['Chr1', 'Chr1'], 'B': [20,50]})
As a result, the rows in df that also appear in errors should be left out:
df:
'A'    'B'
Chr1    10
Chr1    30
Chr1    40
Chr1    60
Chr2    15
Chr2    20
It doesn't seem to work with df.merge, and I don't want to iterate over all rows, since the dataframes get pretty large.
Best,
David
Add an extra marker column to errors:
errors['temp'] = 1
Merge the two dataframes with an outer join:
merged_df = pandas.merge(df,errors,how='outer')
Now keep only the rows where 'temp' is NaN (the rows of df that had no match in errors), then drop the helper column:
merged_df = merged_df[ merged_df['temp'] != 1 ]
del merged_df['temp']
print(merged_df)
      A   B
 0  Chr1  10
 2  Chr1  30
 3  Chr1  40
 5  Chr1  60
 6  Chr2  15
 7  Chr2  20
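The same merge idea can also be written without a hand-made 'temp' column by using the indicator=True parameter of DataFrame.merge, which adds a '_merge' column recording whether each row matched. The sketch below assumes the join keys are exactly the columns 'A' and 'B' and uses a left join, since only the rows of df are of interest:

```python
import pandas as pd

df = pd.DataFrame({'A': ['Chr1', 'Chr1', 'Chr1', 'Chr1', 'Chr1', 'Chr1', 'Chr2', 'Chr2'],
                   'B': [10, 20, 30, 40, 50, 60, 15, 20]})
errors = pd.DataFrame({'A': ['Chr1', 'Chr1'], 'B': [20, 50]})

# Left-join df against errors on both key columns (assumed to be 'A' and 'B').
# indicator=True adds a '_merge' column: 'both' for rows found in errors,
# 'left_only' for rows that exist only in df.
merged = df.merge(errors, on=['A', 'B'], how='left', indicator=True)

# Keep the rows with no match in errors and drop the helper column.
result = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')
print(result)
```

This is an anti-join: only the rows of df whose (A, B) combination is absent from errors survive, and no row-by-row Python loop is involved.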
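For two columns you can also filter with isin, but note that this tests each column independently: it drops any row whose A value occurs somewhere in errors['A'] and whose B value occurs somewhere in errors['B'], even if that exact (A, B) pair is not a row of errors.
print(df[~df['A'].isin(errors['A']) | ~df['B'].isin(errors['B'])])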
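If the (A, B) values must match together as a pair, one option is to turn the key columns into a MultiIndex and call isin on that, so whole tuples are compared. A minimal sketch, where drop_matching_rows is a hypothetical helper name and the key columns are assumed to be 'A' and 'B':

```python
import pandas as pd

df = pd.DataFrame({'A': ['Chr1', 'Chr1', 'Chr1', 'Chr1', 'Chr1', 'Chr1', 'Chr2', 'Chr2'],
                   'B': [10, 20, 30, 40, 50, 60, 15, 20]})
errors = pd.DataFrame({'A': ['Chr1', 'Chr1'], 'B': [20, 50]})


def drop_matching_rows(left, right, keys):
    """Hypothetical helper: return the rows of `left` whose values in `keys`,
    taken together as one tuple, do not occur in any row of `right`."""
    left_pairs = left.set_index(keys).index    # MultiIndex of (A, B) tuples
    right_pairs = right.set_index(keys).index
    # isin on a MultiIndex compares whole tuples, so a row is dropped only
    # if its exact (A, B) combination appears in `right`.
    mask = left_pairs.isin(right_pairs)
    return left[~mask]


result = drop_matching_rows(df, errors, ['A', 'B'])
print(result)
```

With errors holding (Chr1, 50) and (Chr2, 20), for example, this keeps the row (Chr1, 20), whereas the column-wise isin filter would drop it because Chr1 and 20 each appear somewhere in errors.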