I've two pandas data frames that have some rows in common.
Suppose dataframe2 is a subset of dataframe1.
How can I get the rows of dataframe1 which are not in dataframe2?
    df1 = pandas.DataFrame(data = {'col1' : [1, 2, 3, 4, 5], 'col2' : [10, 11, 12, 13, 14]})
    df2 = pandas.DataFrame(data = {'col1' : [1, 2, 3], 'col2' : [10, 11, 12]})

df1

       col1  col2
    0     1    10
    1     2    11
    2     3    12
    3     4    13
    4     5    14

df2

       col1  col2
    0     1    10
    1     2    11
    2     3    12

Expected result:

       col1  col2
    3     4    13
    4     5    14

To remove rows from a data frame that also exist in another data frame, we can subset the first data frame with single square brackets. This removal leaves only the rows of the first data frame that are unique with respect to the columns of the other data frame.
The currently selected solution produces incorrect results. To correctly solve this problem, we can perform a left-join from df1 to df2, making sure to first get just the unique rows for df2. 
First, we need to modify the original DataFrame to add the row with data [3, 10].
    import pandas as pd

    df1 = pd.DataFrame(data = {'col1' : [1, 2, 3, 4, 5, 3],
                               'col2' : [10, 11, 12, 13, 14, 10]})
    df2 = pd.DataFrame(data = {'col1' : [1, 2, 3],
                               'col2' : [10, 11, 12]})

    df1

       col1  col2
    0     1    10
    1     2    11
    2     3    12
    3     4    13
    4     5    14
    5     3    10

    df2

       col1  col2
    0     1    10
    1     2    11
    2     3    12

Perform a left-join, eliminating duplicates in df2 so that each row of df1 joins with exactly 1 row of df2. Use the parameter indicator to return an extra column indicating which table the row was from.
    df_all = df1.merge(df2.drop_duplicates(), on=['col1','col2'],
                       how='left', indicator=True)
    df_all

       col1  col2     _merge
    0     1    10       both
    1     2    11       both
    2     3    12       both
    3     4    13  left_only
    4     5    14  left_only
    5     3    10  left_only

Create a boolean condition:
    df_all['_merge'] == 'left_only'

    0    False
    1    False
    2    False
    3     True
    4     True
    5     True
    Name: _merge, dtype: bool

A few solutions make the same mistake - they only check that each value is independently in each column, not together in the same row. Adding the last row, which is unique but has the values from both columns from df2 exposes the mistake:
    common = df1.merge(df2,on=['col1','col2'])
    (~df1.col1.isin(common.col1))&(~df1.col2.isin(common.col2))

    0    False
    1    False
    2    False
    3     True
    4     True
    5    False
    dtype: bool

This solution gets the same wrong result:
    df1.isin(df2.to_dict('list')).all(axis=1)
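
The left-join approach above stops at the boolean condition, so here is a minimal sketch of how it might be carried through to the final filtering step. The helper name rows_not_in is hypothetical (it is not part of pandas), and the sketch assumes both frames share the same column names and that matching should be done on all columns of the left frame.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 3],
                    'col2': [10, 11, 12, 13, 14, 10]})
df2 = pd.DataFrame({'col1': [1, 2, 3],
                    'col2': [10, 11, 12]})


def rows_not_in(left, right):
    """Hypothetical helper: rows of `left` that have no identical row in `right`.

    Assumes `right` contains every column of `left`; rows are compared
    on all of `left`'s columns together, not column by column.
    """
    cols = list(left.columns)
    # De-duplicate `right` so each row of `left` joins with at most one row,
    # which keeps the merged frame the same length and order as `left`.
    merged = left.merge(right[cols].drop_duplicates(),
                        on=cols, how='left', indicator=True)
    # Convert to a plain array so the mask is applied by position
    # and `left` keeps its original index labels.
    mask = (merged['_merge'] == 'left_only').to_numpy()
    return left[mask]


result = rows_not_in(df1, df2)
print(result)
```

With the modified df1 from this answer, the result should contain the rows at index 3, 4 and 5, including the row [3, 10] that the column-by-column checks miss. Applying the mask by position rather than by label is a design choice that lets the sketch work even when df1 does not have a default integer index.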