I have a weird data set:
   year   firms  age  survival
0  1977  564918    0       NaN
2  1978  503991    0       NaN
3  1978  413130    1  0.731310
5  1979  497805    0       NaN
6  1979  390352    1  0.774522
where I have cast the dtype of the first three columns to be integer:
>>> df.dtypes
year          int64
firms         int64
age           int64
survival    float64
But now I want to search in another table based on an index here:
>>> idx = 331
>>> otherDf.loc[df.loc[idx, 'age']]
Traceback (most recent call last):
(...)
KeyError: 8.0
This comes from
>>> df.loc[idx, 'age']
8.0
Why does this keep returning a float value? And how can I perform the lookup in otherDf? I'm in pandas version 0.15.
You get back a float because each row contains a mix of float and int types. Selecting a row with loc returns a Series, which has a single dtype, so the integers are upcast to float64:
>>> df.loc[4]
year          1979.000000
firms       390352.000000
age              1.000000
survival         0.774522
Name: 4, dtype: float64
So choosing the age entry here with df.loc[4, 'age'] would yield 1.0.
To get around this and return an integer, you could use loc to select from just the age column and not the whole DataFrame:
>>> df['age'].loc[4]
1
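For completeness, here is a minimal end-to-end sketch of the lookup. The data is a small stand-in for the frame in the question, and `other_df` with its `label` column is a hypothetical replacement for otherDf, since its real contents were not shown. The explicit int() cast is an extra safeguard I've added, not something the column selection strictly needs.

```python
import numpy as np
import pandas as pd

# Small frame mirroring the question: three integer columns, one float column.
df = pd.DataFrame(
    {
        "year": [1977, 1978, 1978, 1979, 1979],
        "firms": [564918, 503991, 413130, 497805, 390352],
        "age": [0, 0, 1, 0, 1],
        "survival": [np.nan, np.nan, 0.731310, np.nan, 0.774522],
    }
)

# Hypothetical lookup table indexed by integer age (stand-in for otherDf).
other_df = pd.DataFrame({"label": ["entrant", "one year old"]}, index=[0, 1])

idx = 4

# Selecting a whole row yields one Series with a single dtype,
# so the integer columns are upcast alongside the float column.
row = df.loc[idx]
print(row.dtype)  # float64

# Selecting from the column first keeps the column's integer dtype.
age = df["age"].loc[idx]

# Casting to a plain Python int makes the key safe regardless of pandas version.
age = int(age)
print(other_df.loc[age, "label"])
```

The cast only makes sense when the column has no missing values; if age could contain NaN, you would need to handle that before converting.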
This was a bug in pandas up through version 0.19 and seems to have been fixed in version 0.20; see https://github.com/pandas-dev/pandas/issues/11617