I (think I) know how to check whether a value is contained in the index of a pandas Series, but I can't get it to work in the example below. Is this perhaps a bug?
First, I generate some random numbers:
import numpy as np
import pandas as pd
some_numbers = np.random.randint(0,4,size=10)
print(some_numbers)
Output:
[0 2 2 3 1 1 2 2 3 2]
Then I create a Series from those numbers and compute their frequencies:
s = pd.Series(some_numbers)
gb = s.groupby(s).size() / len(s)
print(gb)
Output:
0 0.1
1 0.2
2 0.5
3 0.2
dtype: float64
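For reference, the same relative frequencies can also be obtained with `Series.value_counts(normalize=True)`. The sketch below hard-codes the numbers printed above instead of drawing them at random, so the result is reproducible. Recent pandas versions name the `value_counts` result `proportion`, so names are deliberately ignored in the comparison.

```python
import pandas as pd

# Same numbers as the random draw printed above, fixed for reproducibility
s = pd.Series([0, 2, 2, 3, 1, 1, 2, 2, 3, 2])

# Frequencies via groupby, as in the question
gb = s.groupby(s).size() / len(s)

# Same frequencies via value_counts; sort_index() restores the 0..3 order
freq = s.value_counts(normalize=True).sort_index()

# Series/index names can differ between pandas versions, so they are not compared
pd.testing.assert_series_equal(gb, freq, check_names=False)
print(freq)
```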
So far, so good. But I do not understand the output of the next line of code:
1.3 in gb
Output:
True
Shouldn't the output be False? (I have pandas 0.20.3 on Python 3.6.2)
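The question relies on `in` testing the index labels of a Series rather than its values, the same way `in` tests the keys of a dict. A small sketch of that distinction; it deliberately uses integer keys so the result does not depend on the float-to-int behaviour discussed in this thread.

```python
import pandas as pd

s = pd.Series([0, 2, 2, 3, 1, 1, 2, 2, 3, 2])
gb = s.groupby(s).size() / len(s)

# `in` on a Series checks the index labels (0, 1, 2, 3), like dict keys
print(2 in gb)           # True: 2 is an index label
print(5 in gb)           # False: 5 is not an index label

# To check the values instead, go through .values (a NumPy array)
print(0.5 in gb.values)  # True: 0.5 is the frequency of the label 2
```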
I know that I could use
1.3 in list(gb.index)
but this is not very efficient if the Series is large.
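One possible alternative to building a Python list, sketched under the assumption that a linear scan is acceptable: an element-wise comparison on the index runs in NumPy and compares the key as a float, so 1.3 is never truncated to 1. It is still O(n) per lookup, but it avoids creating a list of Python objects.

```python
import pandas as pd

s = pd.Series([0, 2, 2, 3, 1, 1, 2, 2, 3, 2])
gb = s.groupby(s).size() / len(s)

# Vectorised comparison against every label; 1.3 stays a float
print((gb.index == 1.3).any())  # False
print((gb.index == 2).any())    # True
```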
A minimal reproduction shows that the result depends on which integer labels the index contains:
import pandas as pd
s = pd.Series([.1,.2,.3])
print(s)
0 0.1
1 0.2
2 0.3
dtype: float64
3.4 in s
False
But wait for it...
s = pd.Series([.1,.2,.3,.4])
print(s)
0 0.1
1 0.2
2 0.3
3 0.4
dtype: float64
3.4 in s
True
I believe that the issue is that gb.index is an int64 index:
>>> gb.index
Int64Index([0, 1, 2, 3], dtype='int64')
>>> type(gb.index)
<class 'pandas.core.indexes.numeric.Int64Index'>
and so when you compare against 1.3, that value is converted to an int. Some evidence for this: any value up to 3.99999 returns True, because converting it to an int gives 3, whereas 4.000001 in gb.index returns False, because converting 4.000001 to an int gives 4, which is not in gb.index.
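The hypothesis above can be written down as a small model and checked against the outputs reported in this thread. The function `truncating_contains` is hypothetical: it emulates the described truncation with `int()` and does not call pandas' own membership test, since newer pandas releases may behave differently from 0.20/0.21.

```python
import pandas as pd

def truncating_contains(index, key):
    """Hypothetical model of the behaviour described above (not pandas' real code):
    a float key is truncated with int() before being looked up in an integer index."""
    return int(key) in set(index.tolist())

gb_index = pd.Index([0, 1, 2, 3])
print(truncating_contains(gb_index, 1.3))       # True  (int(1.3) == 1)
print(truncating_contains(gb_index, 3.99999))   # True  (int(3.99999) == 3)
print(truncating_contains(gb_index, 4.000001))  # False (int(4.000001) == 4)

# The two Series in the reproduction have default indexes 0..2 and 0..3
print(truncating_contains(pd.RangeIndex(3), 3.4))  # False (3 is not in 0..2)
print(truncating_contains(pd.RangeIndex(4), 3.4))  # True  (3 is in 0..3)
```

The model matches every output reported above, which supports the int-conversion explanation.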
If you cast the index to float, you get False, because 1.3 is not in Float64Index([0.0, 1.0, 2.0, 3.0], dtype='float64'):
>>> 1.3 in gb.index.astype('float')
False
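Casting to float builds a new index on every call. A hedged alternative sketch: the hypothetical helper `strict_contains` below rejects float keys with a fractional part before they reach an integer index, and otherwise falls back to the normal `key in index` lookup. It uses `pandas.api.types.is_float` and `is_integer_dtype`; the helper itself is an illustration, not a pandas API.

```python
import pandas as pd
from pandas.api.types import is_float, is_integer_dtype

def strict_contains(index, key):
    """Hypothetical helper: label membership that never truncates a float key.

    For an integer index, a float key can only match if it has no fractional
    part; in that case it is converted to int before the lookup.
    """
    if is_integer_dtype(index.dtype) and is_float(key):
        if not float(key).is_integer():
            return False  # e.g. 1.3 can never equal an integer label
        key = int(key)
    return key in index

idx = pd.Index([0, 1, 2, 3])
print(strict_contains(idx, 1.3))   # False
print(strict_contains(idx, 2.0))   # True
print(strict_contains(idx, 3))     # True

# The float-index approach from the answer also gives False for 1.3
print(1.3 in idx.astype("float"))  # False
```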
Tested with pandas 0.21.1 and Python 3.6.3.