
python pandas select rows where two columns are (not) equal

Tags: python, pandas

hsp.loc[hsp['Len_old'] == hsp['Len_new']]

I tried this code and it works.

But I also tried these three:

hsp.loc[hsp['Type_old'] == hsp['Type_new']] 
hsp.loc[hsp['Type_old'] != hsp['Type_new']] 
hsp.loc[hsp['Len_old'] != hsp['Len_new']] 

None of them work.

My data table hsp looks like this:

id  Type_old  Type_new  Len_old  Len_new
1    Num       Num       15       15
2    Num       Char      12       12
3    Char      Num       10       8
4    Num       Num       4        5
5    Char      Char      9        10

Is there a better approach to select rows where two columns are not equal?

kkjoe asked Jul 11 '17 16:07



3 Answers

Use the complement operator ~

hsp.loc[~(hsp['Type_old'] == hsp['Type_new'])]

which gives:

   id Type_old Type_new  Len_old  Len_new
1   2      Num     Char       12       12
2   3     Char      Num       10        8

When working with Boolean masks, the complement operator ~ is a handy way to swap True and False.
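A minimal sketch of the idea, rebuilding the question's table by hand. The values are copied from the post; the names equal_mask, inverted and direct are only illustrative. With clean string columns, inverting the equality mask and using != should select the same rows:

```python
import pandas as pd

# Rebuild the table from the question (values copied from the post).
hsp = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "Type_old": ["Num", "Num", "Char", "Num", "Char"],
    "Type_new": ["Num", "Char", "Num", "Num", "Char"],
    "Len_old": [15, 12, 10, 4, 9],
    "Len_new": [15, 12, 8, 5, 10],
})

# Boolean mask: True where the two columns hold the same value.
equal_mask = hsp["Type_old"] == hsp["Type_new"]

# Invert the mask with ~ to keep only the rows that differ.
inverted = hsp.loc[~equal_mask]

# The direct != comparison selects the same rows here.
direct = hsp.loc[hsp["Type_old"] != hsp["Type_new"]]

print(inverted)
```

If both forms return unexpected rows on real data, the columns' dtypes are worth inspecting before anything else.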

VinceP answered Sep 30 '22 12:09


Ways to be confused by == versus != when comparing pd.Series

As expected

df[['Len_old', 'Len_new']].assign(NE=df.Len_old != df.Len_new)

   Len_old  Len_new     NE
0       15       15  False
1       12       12  False
2       10        8   True
3        4        5   True
4        9       10   True

But if one of the columns holds strings, every row compares as not equal:

df[['Len_old', 'Len_new']].assign(NE=df.Len_old.astype(str) != df.Len_new)

   Len_old  Len_new    NE
0       15       15  True
1       12       12  True
2       10        8  True
3        4        5  True
4        9       10  True

Make sure both columns have the same dtype.
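One way to check this, sketched on a hypothetical three-row frame in which Len_new was stored as text (the frame and the names mismatch_raw, len_new_num and mismatch_fixed are assumptions for illustration, not the asker's data). It prints the dtypes, then converts the text column with pd.to_numeric before comparing:

```python
import pandas as pd

# Hypothetical frame: Len_new holds digits stored as Python strings.
df = pd.DataFrame({
    "Len_old": [15, 12, 10],
    "Len_new": pd.Series(["15", "12", "8"], dtype=object),
})

# Inspect the dtypes first; a numeric column next to an object column is a warning sign.
print(df.dtypes)

# Comparing int with str element by element flags every row as different.
mismatch_raw = df["Len_old"] != df["Len_new"]

# Convert the text column to numbers, then compare again.
len_new_num = pd.to_numeric(df["Len_new"])
mismatch_fixed = df["Len_old"] != len_new_num

print(df.loc[mismatch_fixed])
```

pd.to_numeric raises by default if a value cannot be parsed, which can help locate the entries that made the column non-numeric in the first place.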

piRSquared answered Sep 30 '22 12:09


Your code, as piRSquared said, had an issue with types.

Besides that, you could use the comparison methods, in this case pd.Series.ne.

Using your data:

hsp.loc[hsp['Type_old'].ne(hsp['Type_new'])]

But again, as piRSquared mentioned, it didn't work because of the dtypes. You also have to take care of NaN/None values in your data, for example:

hsp.loc[(hsp['Type_old'].ne(hsp['Type_new'])) & (hsp['Type_old'].notna())]

Note that .ne also accepts another argument, fill_value, which substitutes a value for missing data before the comparison.
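A small sketch of how missing values affect .ne, using two made-up numeric Series (s_old, s_new and the result names are assumptions for illustration). NaN never compares equal to anything, so a plain .ne flags every row that has a missing value; the masks are combined with &, since Python has no && operator:

```python
import numpy as np
import pandas as pd

# Hypothetical columns with missing values on either side.
s_old = pd.Series([15.0, np.nan, 10.0, 4.0])
s_new = pd.Series([15.0, 0.0, np.nan, 5.0])

# Plain comparison: any row containing NaN counts as "not equal".
plain = s_old.ne(s_new)

# Keep only differences where both values are actually present.
both_present = plain & s_old.notna() & s_new.notna()

# fill_value=0 treats a value missing on one side as 0 before comparing.
filled = s_old.ne(s_new, fill_value=0)

print(pd.DataFrame({"plain": plain, "both_present": both_present, "filled": filled}))
```

Which of the three is appropriate depends on whether a missing value should count as a difference in your data.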


In addition, you could use the compare method to show the differences between two Series (or DataFrames):

hsp.Len_old.compare(hsp.Len_new)

If the columns have the same dtype, it returns something like:

   self  other
2  10.0    8.0
3   4.0    5.0
4   9.0   10.0

But if you force one column to a different dtype:

hsp.Len_old.compare(hsp.Len_new.astype('str')) # string type new column

it returns all rows:

   self other
0    15    15
1    12    12
2    10     8
3     4     5
4     9    10
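As a rough sketch of the fix, the text column can be converted back to numbers before calling compare. The Series below reuse the Len values from the question, with the new lengths deliberately stored as strings; the names len_old, len_new_text and diff are illustrative only:

```python
import pandas as pd

# Len values from the question; the new lengths are stored as text on purpose.
len_old = pd.Series([15, 12, 10, 4, 9])
len_new_text = pd.Series(["15", "12", "8", "5", "10"], dtype=object)

# Convert the text back to numbers so both sides share a numeric dtype.
len_new = pd.to_numeric(len_new_text)

# compare keeps only the positions where the two Series differ.
diff = len_old.compare(len_new)
print(diff)
```

Series.compare expects both Series to share the same index, which holds here because both use the default integer index.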
JonathanLoscalzo answered Sep 30 '22 13:09