I am using the pandas shift method to check whether each value in a Series is equal to the previous value. Basically:
import pandas as pd
a = pd.Series([2, 2, 4, 5])
a == a.shift()
Out[1]:
0 False
1 True
2 False
3 False
dtype: bool
This is as expected. (The first comparison is False because we are comparing with the NA of the shifted series.) Now, I have a Series with missing values, i.e. None, like this:
b = pd.Series([None, None, 4, 5])
Here the comparison of the two Nones gives False:
b == b.shift()
Out[3]:
0 False
1 False
2 False
3 False
dtype: bool
I'd be willing to accept some sort of philosophical reasoning arguing that comparing None is meaningless etc., however:
c = None
d = None
c == d
Out[4]: True
What is going on here?!
And what I really want to know is: how can I perform the comparison on my b Series so that it treats Nones as equal? That is, I want b == b.shift() to give the same result as a == a.shift() gave.
The None values get cast to NaN, and NaN has the property that it is not equal to itself:
In[54]:
b = pd.Series([None, None, 4, 5])
b
Out[54]:
0 NaN
1 NaN
2 4.0
3 5.0
dtype: float64
As you can see here:
In[55]:
b==b
Out[55]:
0 False
1 False
2 True
3 True
dtype: bool
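The same behaviour can be reproduced without pandas. As a minimal sketch (the variable names here are illustrative only): a floating-point NaN never compares equal to anything, itself included, so missing values are normally detected with isna() rather than with ==.

```python
import math
import pandas as pd

nan = float("nan")

# NaN is not equal to itself, so == cannot be used to find it.
nan_equals_itself = (nan == nan)
# math.isnan (or Series.isna) is the reliable way to detect it.
nan_detected = math.isnan(nan)

b = pd.Series([None, None, 4, 5])
missing_mask = b.isna()  # True where the value is missing

print(nan_equals_itself, nan_detected)
print(missing_mask.tolist())
```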
I'm not sure of a cleaner way to make this work, but the following does:
In[68]:
( (b == b.shift()) | ( (b != b.shift()) & (b != b) ) )
Out[68]:
0 True
1 True
2 False
3 False
dtype: bool
You'll get a spurious True for the first row, because shifting down means you're comparing against a non-existent row:
In[69]:
b.shift()
Out[69]:
0 NaN
1 NaN
2 NaN
3 4.0
dtype: float64
So the first row evaluates to True from the boolean logic, since the first row of b is NaN and so is the first row of the shifted series.
To work around the false positive in the first row, you could slice the result to ignore the first row:
In[70]:
( (b == b.shift()) | ( (b != b.shift()) & (b != b) ) )[1:]
Out[70]:
1 True
2 False
3 False
dtype: bool
As to why it gets cast: pandas tries to coerce the data to a compatible NumPy dtype. Here float is selected because the data mixes ints and None values, and neither None nor NaN can be represented by an int dtype.
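The coercion can be checked directly by inspecting the dtype that pandas infers. This is a small sketch under the same setup as the question; the names ints_only and with_none are illustrative only.

```python
import pandas as pd

ints_only = pd.Series([2, 2, 4, 5])
with_none = pd.Series([None, None, 4, 5])

# With only integers, pandas keeps an integer dtype.
print(ints_only.dtype)

# With None mixed in, the values are stored as floats and None becomes NaN.
print(with_none.dtype)
print(with_none.tolist())
```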
To get the same result as for a in your example, you should overwrite the first row with False, as that comparison should always fail:
In[78]:
result = pd.Series( ( (b == b.shift()) | ( (b != b.shift()) & (b != b) ) ) )
result.iloc[0] = False
result
Out[78]:
0 False
1 True
2 False
3 False
dtype: bool
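One caveat: the boolean expression above returns True for every NaN row, even when the previous value is not missing (for example, the second row of [1, None]). A possibly more robust sketch tests explicitly that both the current and the previous value are missing, using isna(). The helper name equals_previous is hypothetical, not part of pandas.

```python
import pandas as pd


def equals_previous(s: pd.Series) -> pd.Series:
    """Hypothetical helper: True where a value equals the previous one,
    treating two consecutive missing values as equal."""
    prev = s.shift()
    # True only where both the current and the previous value are missing.
    both_missing = s.isna() & prev.isna()
    result = (s == prev) | both_missing
    if len(result) > 0:
        # The first row has no predecessor, so it can never match.
        result.iloc[0] = False
    return result


b = pd.Series([None, None, 4, 5])
print(equals_previous(b).tolist())  # [False, True, False, False]

# A NaN that follows a real value is not reported as equal.
e = pd.Series([1, None, None, 1])
print(equals_previous(e).tolist())  # [False, False, True, False]
```

For the a Series from the question this gives the same result as a == a.shift().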