Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

In Pandas, how to delete rows from a Data Frame based on another Data Frame?

I have 2 Data Frames, one named USERS and another named EXCLUDE. Both of them have a field named "email".

Basically, I want to remove every row in USERS that has an email contained in EXCLUDE.

How can I do it?

like image 653
Vini Avatar asked Oct 05 '16 17:10

Vini


People also ask

How do you delete a row from a DataFrame that exists in another DataFrame?

To remove rows from a data frame that exists in another data frame, we can use subsetting with single square brackets. This removal will help us to find the unique rows in the data frame based on the column of another data frame.

How do I delete rows in Pandas DataFrame based on condition?

Use pandas. DataFrame. drop() method to delete/remove rows with condition(s).

How do you subtract a DataFrame from a different DataFrame in Python?

subtract() function is used for finding the subtraction of dataframe and other, element-wise. This function is essentially same as doing dataframe – other but with a support to substitute for missing data in one of the inputs.


1 Answers

You can use boolean indexing and condition with isin, inverting boolean Series is by ~:

import pandas as pd  USERS = pd.DataFrame({'email':['[email protected]','[email protected]','[email protected]','[email protected]','[email protected]']}) print (USERS)      email 0  [email protected] 1  [email protected] 2  [email protected] 3  [email protected] 4  [email protected]  EXCLUDE = pd.DataFrame({'email':['[email protected]','[email protected]']}) print (EXCLUDE)      email 0  [email protected] 1  [email protected] 
print (USERS.email.isin(EXCLUDE.email)) 0     True 1    False 2    False 3    False 4     True Name: email, dtype: bool  print (~USERS.email.isin(EXCLUDE.email)) 0    False 1     True 2     True 3     True 4    False Name: email, dtype: bool  print (USERS[~USERS.email.isin(EXCLUDE.email)])      email 1  [email protected] 2  [email protected] 3  [email protected] 

Another solution with merge:

df = pd.merge(USERS, EXCLUDE, how='outer', indicator=True) print (df)      email     _merge 0  [email protected]       both 1  [email protected]  left_only 2  [email protected]  left_only 3  [email protected]  left_only 4  [email protected]       both  print (df.loc[df._merge == 'left_only', ['email']])      email 1  [email protected] 2  [email protected] 3  [email protected] 
like image 58
jezrael Avatar answered Oct 09 '22 02:10

jezrael