I (think I) know how to check if a value is contained in the index of a pandas Series, but I can't get it to work in the example below. Is it a bug perhaps?
First, I generate some random numbers:
import numpy as np
import pandas as pd
some_numbers = np.random.randint(0,4,size=10)
print(some_numbers)
Output:
[0 2 2 3 1 1 2 2 3 2]
Then, I create a Series with those numbers and compute their relative frequencies:
s = pd.Series(some_numbers)
gb = s.groupby(s).size() / len(s)
print(gb)
Output:
0 0.1
1 0.2
2 0.5
3 0.2
dtype: float64
So far, so good. But I do not understand the output of the next line of code:
1.3 in gb
Output:
True
Shouldn't the output be False? (I have pandas 0.20.3 on Python 3.6.2)
I know that I could use
1.3 in list(gb.index)
but this is not very efficient if the Series is large.
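One possible way to avoid building a Python list is an elementwise comparison against the index. The snippet below is only a sketch: it uses fixed data instead of random numbers so that it is reproducible, and the variable names has_1_3 and has_1 are made up for illustration. It compares every label to the value directly, so it does not rely on how the in operator treats floats on an integer index.

```python
import pandas as pd

# Fixed data (taken from the printed output above) so the result is reproducible.
s = pd.Series([0, 2, 2, 3, 1, 1, 2, 2, 3, 2])
gb = s.groupby(s).size() / len(s)

# Compare each index label to the value; this yields a boolean array,
# and .any() reports whether at least one label matched.
has_1_3 = bool((gb.index == 1.3).any())
has_1 = bool((gb.index == 1).any())

print(has_1_3, has_1)
```

This still scans the whole index once per lookup, so it is a workaround for the surprising result rather than a fast hashed lookup.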
import pandas as pd
s = pd.Series([.1,.2,.3])
print(s)
0 0.1
1 0.2
2 0.3
dtype: float64
3.4 in s
False
but, wait for it...
s = pd.Series([.1,.2,.3,.4])
print(s)
0 0.1
1 0.2
2 0.3
3 0.4
dtype: float64
3.4 in s
True
I believe that the issue is that gb.index is an int64 index:
>>> gb.index
Int64Index([0, 1, 2, 3], dtype='int64')
>>> type(gb.index)
<class 'pandas.core.indexes.numeric.Int64Index'>
So when you compare against 1.3, that value is being converted to an int. Some evidence for this is that values up to 3.99999 return True, because converting them to int gives 3. However, 4.000001 in gb.index returns False, because converting 4.000001 to int gives 4 (which is not in gb.index).
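The truncation argument can be checked in plain Python. This is only a model of the behavior described above, not the actual pandas lookup code; truncating_contains is a hypothetical helper introduced for illustration.

```python
def truncating_contains(labels, value):
    """Hypothetical model: cast the value to int first, then look it up."""
    return int(value) in labels


labels = [0, 1, 2, 3]  # same labels as gb.index

# int() truncates toward zero, so 1.3 -> 1, 3.99999 -> 3, 4.000001 -> 4
results = {v: truncating_contains(labels, v) for v in (1.3, 3.99999, 4.000001)}
print(results)  # {1.3: True, 3.99999: True, 4.000001: False}
```

The model reproduces the pattern seen with gb.index: everything below 4 lands on an existing label after truncation, while 4.000001 does not.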
If you force it to a float index, you get False, because 1.3 is not in Float64Index([0.0, 1.0, 2.0, 3.0], dtype='float64'):
>>> 1.3 in gb.index.astype('float')
False
Tested in pandas 0.21.1, Python 3.6.3.
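For convenience, the float-index workaround could be wrapped in a small function. This is a sketch with fixed data; index_contains_exact is a hypothetical name, not a pandas function.

```python
import pandas as pd

# Fixed data so the frequencies match the question's example.
s = pd.Series([0, 2, 2, 3, 1, 1, 2, 2, 3, 2])
gb = s.groupby(s).size() / len(s)


def index_contains_exact(series, value):
    """Hypothetical helper: test membership against a float copy of the index."""
    return value in series.index.astype("float")


checks = [index_contains_exact(gb, v) for v in (1.3, 1.0, 3.99999)]
print(checks)
```

Note that astype builds a new index on every call, so if you need many lookups it is probably better to convert the index once and reuse it.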