
Find extra values after comparing two columns of dataframe python

Tags:

python

pandas

I have two dataframes, each consisting of a single column.

df has column: id1

id1 
 1  
 2  
 3
 4
 5 
 6

df2 has column: id2

id2
 2 
 1
 5
 4

As you can see, df['id1'] contains values that are not in df2['id2']: 3 and 6.

Is there a way to find them by taking the difference of the two dataframe columns, or by some other approach?

I tried

df2.isin(df)

but that only returns boolean values, and I want the actual rows.

Shubham R asked Dec 07 '22 19:12


2 Answers

There are a number of ways to solve this. One is the difference method of pandas Index objects, which returns the values in the calling index that are missing from the index passed as the argument.

import pandas as pd

idx1 = pd.Index(df.id1)
idx2 = pd.Index(df2.id2)

# values present in idx1 but absent from idx2
idx1.difference(idx2).values

array([3, 6])

With isin you can find the same values, and get them back as rows of df, by negating the boolean mask:

df[~df.id1.isin(df2.id2)]
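
To make the two approaches above easy to try, here is a minimal self-contained sketch. The frames are rebuilt from the data in the question; the names mask, missing_rows and missing_values are only illustrative and do not come from the original answer.

```python
import pandas as pd

# Sample frames rebuilt from the question's data.
df = pd.DataFrame({"id1": [1, 2, 3, 4, 5, 6]})
df2 = pd.DataFrame({"id2": [2, 1, 5, 4]})

# Boolean mask: True where an id1 value also appears in df2.id2.
mask = df["id1"].isin(df2["id2"])

# Negating the mask keeps only the rows whose id1 is missing from df2.
missing_rows = df[~mask]
print(missing_rows)

# Index.difference gives the same values, without the surrounding rows.
missing_values = pd.Index(df["id1"]).difference(pd.Index(df2["id2"]))
print(missing_values.tolist())
```

The isin version keeps the original row labels and any other columns df might have, which is usually what "the actual rows" means; the Index version returns only the values.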
Ted Petrou answered May 16 '23 04:05


You could also use set operations:

list(set(df.id1) - set(df2.id2))

[3, 6]
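
One caveat with the set approach is that Python sets are unordered and drop duplicates, so the order of the result is not guaranteed in general. The sketch below is one possible variant, not part of the original answer: it sorts the result and then feeds it back into isin to recover the rows. The names extra and extra_rows are illustrative.

```python
import pandas as pd

# Sample frames rebuilt from the question's data.
df = pd.DataFrame({"id1": [1, 2, 3, 4, 5, 6]})
df2 = pd.DataFrame({"id2": [2, 1, 5, 4]})

# tolist() yields plain Python ints; sorted() makes the order deterministic.
extra = sorted(set(df["id1"].tolist()) - set(df2["id2"].tolist()))
print(extra)

# The set difference yields values only; isin turns them back into rows.
extra_rows = df[df["id1"].isin(extra)]
print(extra_rows)
```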
piRSquared answered May 16 '23 04:05