pandas get rows which are NOT in other dataframe

Tags: python, pandas, dataframe

I have two pandas DataFrames that have some rows in common.

Suppose dataframe2 is a subset of dataframe1.

How can I get the rows of dataframe1 which are not in dataframe2?

import pandas

df1 = pandas.DataFrame(data={'col1': [1, 2, 3, 4, 5], 'col2': [10, 11, 12, 13, 14]})
df2 = pandas.DataFrame(data={'col1': [1, 2, 3], 'col2': [10, 11, 12]})

df1

   col1  col2
0     1    10
1     2    11
2     3    12
3     4    13
4     5    14

df2

   col1  col2
0     1    10
1     2    11
2     3    12

Expected result:

   col1  col2
3     4    13
4     5    14

485

asked Mar 06 '15 15:03

think nice things

1 Answer

The currently selected solution produces incorrect results. To correctly solve this problem, we can perform a left-join from df1 to df2, making sure to first get just the unique rows for df2.

First, we modify the original df1 by adding a row with data [3, 10]. This row is not in df2, although each of its values occurs somewhere in df2.

import pandas as pd

df1 = pd.DataFrame(data={'col1': [1, 2, 3, 4, 5, 3],
                         'col2': [10, 11, 12, 13, 14, 10]})
df2 = pd.DataFrame(data={'col1': [1, 2, 3],
                         'col2': [10, 11, 12]})

df1

   col1  col2
0     1    10
1     2    11
2     3    12
3     4    13
4     5    14
5     3    10

df2

   col1  col2
0     1    10
1     2    11
2     3    12

Perform a left-join, eliminating duplicates in df2 so that each row of df1 joins with exactly 1 row of df2. Use the indicator parameter to add an extra _merge column showing which table each row came from.

df_all = df1.merge(df2.drop_duplicates(), on=['col1','col2'],
                   how='left', indicator=True)
df_all

   col1  col2     _merge
0     1    10       both
1     2    11       both
2     3    12       both
3     4    13  left_only
4     5    14  left_only
5     3    10  left_only
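To see why the drop_duplicates() call matters, consider a variant in which df2 contains a repeated row. The sketch below uses a hypothetical df2_dup (it is not part of the original question) and assumes the same df1 as above. Without deduplication, the matching row of df1 appears twice in the join result, so the result no longer lines up one-to-one with df1:

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 3],
                    'col2': [10, 11, 12, 13, 14, 10]})
# Hypothetical variant of df2 in which the row (1, 10) appears twice.
df2_dup = pd.DataFrame({'col1': [1, 1, 2, 3],
                        'col2': [10, 10, 11, 12]})

# Left-join without removing duplicates from the right-hand frame.
naive = df1.merge(df2_dup, on=['col1', 'col2'], how='left', indicator=True)
# Left-join after removing duplicates, as in the answer.
deduped = df1.merge(df2_dup.drop_duplicates(), on=['col1', 'col2'],
                    how='left', indicator=True)

print(len(df1), len(naive), len(deduped))  # prints: 6 7 6
```

The left_only rows are the same in both cases; only the matched rows are multiplied, which is enough to break any later step that assumes one output row per row of df1.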

Create a boolean condition:

df_all['_merge'] == 'left_only'

0    False
1    False
2    False
3     True
4     True
5     True
Name: _merge, dtype: bool
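The boolean condition can then be used to index the merged frame and recover the rows of df1 that have no match in df2. The following is a minimal sketch, assuming the same df1 and df2 as above; the helper name rows_not_in is hypothetical and not part of pandas:

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 3],
                    'col2': [10, 11, 12, 13, 14, 10]})
df2 = pd.DataFrame({'col1': [1, 2, 3],
                    'col2': [10, 11, 12]})


def rows_not_in(left, right):
    """Return rows of `left` that have no identical row in `right`.

    Hypothetical helper; assumes both frames share the same columns.
    """
    # With no `on` argument, merge joins on the columns common to both frames.
    merged = left.merge(right.drop_duplicates(), how='left', indicator=True)
    # Keep only unmatched rows, then drop the helper column added by indicator=True.
    return merged[merged['_merge'] == 'left_only'].drop(columns='_merge')


result = rows_not_in(df1, df2)
print(result)
```

Note that merge builds a fresh index for its result, so the index labels of the returned rows come from the merged frame rather than from df1. Here they happen to coincide because the left-join keeps the row order of df1 and adds no extra rows.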

Why other solutions are wrong

A few solutions make the same mistake: they only check that each value appears somewhere in its own column, not that the values appear together in the same row. Adding the last row, which is unique but whose individual values each occur in df2, exposes the mistake:

common = df1.merge(df2,on=['col1','col2'])
(~df1.col1.isin(common.col1))&(~df1.col2.isin(common.col2))

0    False
1    False
2    False
3     True
4     True
5    False
dtype: bool

This solution gets the same wrong result:

df1.isin(df2.to_dict('list')).all(axis=1)
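For comparison, a row-aware membership test has to treat each (col1, col2) pair as a single unit. One way to sketch this, as an alternative to the merge rather than the method used in this answer, is to compare the rows as MultiIndex entries. The variable names below are illustrative, and the sketch assumes both frames have the same columns in the same order:

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 3],
                    'col2': [10, 11, 12, 13, 14, 10]})
df2 = pd.DataFrame({'col1': [1, 2, 3],
                    'col2': [10, 11, 12]})

# Turn every row into one (col1, col2) tuple so the values are compared together.
rows1 = pd.MultiIndex.from_frame(df1)
rows2 = pd.MultiIndex.from_frame(df2)

# Boolean array with one entry per row of df1: True where the whole row is in df2.
in_df2 = rows1.isin(rows2)

# Select the rows of df1 that are absent from df2; df1's index is preserved.
not_in_df2 = df1[~in_df2]
print(not_in_df2)
```

Unlike the column-wise isin checks, this flags the row (3, 10) as missing from df2, because the pair as a whole never occurs there.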

176

answered Oct 20 '22 10:10

Ted Petrou


