 

How to find dropped data after using Pandas merge in Python?

Tags:

python

pandas

My DataFrames look like the following. I am using the Pandas merge function to merge two DataFrames, and I am trying to find the rows that were dropped. Is there a way in Pandas or Python to track this?

import pandas as pd

df1 = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Age': [34, 23, 90]})
df2 = pd.DataFrame({'Name': ['A', 'B', 'D'], 'Add': ['rt', 'ct', 'pt']})
print(pd.merge(df1, df2, on='Name'))
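Rows disappear because pd.merge defaults to how='inner', which keeps only keys present in both frames. A minimal sketch that lists the dropped keys with a plain set difference (the names merged, dropped_from_df1 and dropped_from_df2 are illustrative, not part of any API):

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Age': [34, 23, 90]})
df2 = pd.DataFrame({'Name': ['A', 'B', 'D'], 'Add': ['rt', 'ct', 'pt']})

# merge defaults to how='inner': only keys found in both frames survive
merged = pd.merge(df1, df2, on='Name')
print(merged)

# Keys the inner join silently dropped from each side
dropped_from_df1 = set(df1['Name']) - set(merged['Name'])
dropped_from_df2 = set(df2['Name']) - set(merged['Name'])
print(dropped_from_df1, dropped_from_df2)  # {'C'} {'D'}
```

This only reports the dropped key values; the answer below shows how to get the full rows.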
asked Mar 05 '23 15:03 by Data_is_Power


1 Answer

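By default merge performs an inner join, so rows whose key exists in only one DataFrame are dropped. To keep them, use merge with an outer join (how='outer') and the parameter indicator=True, which adds a _merge column recording where each row came from: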

df = pd.merge(df1,df2,on='Name', indicator=True, how='outer')
print (df)
  Name   Age  Add      _merge
0    A  34.0   rt        both
1    B  23.0   ct        both
2    C  90.0  NaN   left_only
3    D   NaN   pt  right_only
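For a quick summary of how many rows came from each side, a small sketch using value_counts on the _merge column (the variable name counts is illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Age': [34, 23, 90]})
df2 = pd.DataFrame({'Name': ['A', 'B', 'D'], 'Add': ['rt', 'ct', 'pt']})
df = pd.merge(df1, df2, on='Name', indicator=True, how='outer')

# _merge is categorical with the categories 'left_only', 'right_only', 'both';
# value_counts tallies how many rows fall into each one
counts = df['_merge'].value_counts()
print(counts)
```

Any non-zero count for left_only or right_only means an inner merge would have dropped those rows.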

Finally, keep only the rows not marked both, using boolean indexing:

print (df[df['_merge'] != 'both'])
  Name   Age  Add      _merge
2    C  90.0  NaN   left_only
3    D   NaN   pt  right_only
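If you want the dropped rows of each original DataFrame separately, with only that frame's columns, one possible sketch (left_dropped and right_dropped are illustrative names) selects by the _merge value and then restores the original dtypes, since the outer join's NaN padding turned Age into float:

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Age': [34, 23, 90]})
df2 = pd.DataFrame({'Name': ['A', 'B', 'D'], 'Add': ['rt', 'ct', 'pt']})
df = pd.merge(df1, df2, on='Name', indicator=True, how='outer')

# Rows that existed only in df1, restricted to df1's own columns
left_dropped = df.loc[df['_merge'] == 'left_only', df1.columns]
# Rows that existed only in df2, restricted to df2's own columns
right_dropped = df.loc[df['_merge'] == 'right_only', df2.columns]

# Cast back to the original dtypes; this works here because these rows
# have no missing values in their own frame's columns
left_dropped = left_dropped.astype(df1.dtypes.to_dict())
right_dropped = right_dropped.astype(df2.dtypes.to_dict())

print(left_dropped)
print(right_dropped)
```

If the original data itself contains NaN in an integer-like column, the astype call would fail, so skip that step in that case.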

Another solution is to filter each DataFrame with isin and invert the mask with ~:

print (df1[~df1['Name'].isin(df2['Name'])])
  Name  Age
2    C   90

print (df2[~df2['Name'].isin(df1['Name'])])
  Name Add
2    D  pt
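The isin approach compares a single column. If the merge uses several key columns (on=[...]), checking each column separately can miss mismatched combinations. A sketch with hypothetical frames left and right and a hypothetical City column (neither is from the question) that compares the composite keys through a MultiIndex:

```python
import pandas as pd

# Hypothetical frames that must be matched on two key columns
left = pd.DataFrame({'Name': ['A', 'B', 'C'], 'City': ['X', 'Y', 'Z'], 'Age': [34, 23, 90]})
right = pd.DataFrame({'Name': ['A', 'B', 'C'], 'City': ['X', 'Y', 'W'], 'Add': ['rt', 'ct', 'pt']})

keys = ['Name', 'City']
# Build a MultiIndex of (Name, City) tuples for each frame
left_keys = left.set_index(keys).index
right_keys = right.set_index(keys).index

# MultiIndex.isin returns a boolean array aligned with each frame's rows
only_left = left[~left_keys.isin(right_keys)]
only_right = right[~right_keys.isin(left_keys)]

print(only_left)
print(only_right)
```

Checking Name alone would report C as matched, while the composite key correctly flags ('C', 'Z') and ('C', 'W') as unmatched.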
answered Mar 15 '23 04:03 by jezrael