dtype: integer, but loc returns float

I have a weird data set:

   year   firms  age  survival
0  1977  564918    0       NaN
2  1978  503991    0       NaN
3  1978  413130    1  0.731310
5  1979  497805    0       NaN
6  1979  390352    1  0.774522

where I have cast the dtype of the first three columns to be integer:

>>> df.dtypes
year          int64
firms         int64
age           int64
survival    float64

But now I want to search in another table based on an index here:

>>> idx = 331
>>> otherDf.loc[df.loc[idx, 'age']]
Traceback (most recent call last):
(...)
KeyError: 8.0

This comes from

>>> df.loc[idx, 'age']
8.0

Why does this return a float even though the age column has dtype int64? And how can I perform the lookup in otherDf? I'm using pandas version 0.15.

FooBar asked Feb 11 '15 17:02


2 Answers

You get back a float because each row contains a mix of float and int types. When loc selects a whole row, the row comes back as a Series, which has a single dtype, so the integers are upcast to float64:

>>> df.loc[4]
year          1979.000000
firms       390352.000000
age              1.000000
survival         0.774522
Name: 4, dtype: float64

So choosing the age entry here with df.loc[4, 'age'] would yield 1.0.

To get around this and return an integer, you could use loc to select from just the age column and not the whole DataFrame:

>>> df['age'].loc[4]
1
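
As a rough illustration of the difference, here is a minimal sketch that rebuilds the sample data from the question. The DataFrame construction is an assumption (the question only shows the printed table), and the index label 6 is taken from that printed table rather than from the output above. It shows that a whole row is a float64 Series, while selecting the column first keeps the integer dtype:

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's table; only the printed
# values are known, not how the original df was built.
df = pd.DataFrame(
    {
        "year": [1977, 1978, 1978, 1979, 1979],
        "firms": [564918, 503991, 413130, 497805, 390352],
        "age": [0, 0, 1, 0, 1],
        "survival": [np.nan, np.nan, 0.731310, np.nan, 0.774522],
    },
    index=[0, 2, 3, 5, 6],
)

# Selecting a whole row yields one Series with one dtype,
# so the int64 columns are upcast to float64.
row = df.loc[6]
print(row.dtype)

# Selecting the column first stays within an int64 Series,
# so the scalar that comes back is an integer.
age = df["age"].loc[6]
print(type(age), age)
```

Because the column is picked before the row, no mixed-dtype row Series is ever built, which is why this form is safe regardless of how the two-argument loc behaves in a given pandas version.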
Alex Riley answered Nov 03 '22 17:11


This was a bug in pandas up through version 0.19, and it seems to have been fixed in version 0.20. See https://github.com/pandas-dev/pandas/issues/11617
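
If the code has to run on both older and newer pandas versions, one possible workaround (a sketch, not something from the linked issue) is to cast the looked-up value to a plain int before using it as a key. The frames below are made-up stand-ins for df and otherDf, and the "label" column is hypothetical:

```python
import pandas as pd

# Hypothetical stand-ins for the question's df and otherDf.
df = pd.DataFrame(
    {"age": [0, 8], "survival": [float("nan"), 0.5]},
    index=[330, 331],
)
other_df = pd.DataFrame({"label": ["new", "old"]}, index=[0, 8])

idx = 331

# int() gives the same key whether loc returned 8 or 8.0,
# so the lookup does not depend on the pandas version.
key = int(df.loc[idx, "age"])
match = other_df.loc[key]
print(key, match["label"])
```

The cast is only safe when the column really holds whole numbers; it would silently truncate genuine fractional values.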

Mike Jarvis answered Nov 03 '22 17:11