I am finding a very strange (IMHO) behaviour with some data loaded into pandas from a CSV file. To protect the innocent, let's state that the DataFrame is in the variable homes and, among others, has the columns below:
In [143]: homes[['zipcode', 'sqft', 'price']].dtypes
Out[143]:
zipcode int64
sqft int64
price int64
dtype: object
To get the average price in each zipcode, I tried:
In [146]: homes.groupby('zipcode')[['price']].mean().head(n=5)
Out[146]:
price
zipcode
28001 280804
28002 234284
28003 294111
28004 1355927
28005 810164
Strangely enough, the price mean is an int64 as shown by:
In [147]: homes.groupby('zipcode')[['price']].mean().dtypes
Out[147]:
price int64
dtype: object
I cannot think of any technical reason why the mean of some ints would not be promoted to float. What is more, just adding another column to the selection makes price become float64, as I expected it to be all along:
In [148]: homes.groupby('zipcode')[['price', 'sqft']].mean().dtypes
Out[148]:
price float64
sqft float64
dtype: object
The first rows of that two-column result keep the fractional part:
price sqft
zipcode
28001 280804.690608 14937.450276
28002 234284.035176 7517.633166
28003 294111.278571 10603.096429
28004 1355927.097792 13104.220820
28005 810164.880952 19928.785714
To make sure I was not missing something very obvious, I created another very simple DataFrame (df) but, with this one, the behaviour does not appear:
In [161]: df[['J','K']].dtypes
Out[161]:
J int64
K int64
dtype: object
In [164]: df[['J','K']].head(n=10)
Out[164]:
J K
0 0 -9
1 0 -14
2 0 8
3 0 -11
4 0 -7
5 -1 7
6 0 2
7 0 0
8 0 5
9 0 3
In [165]: df.groupby('J')[['K']].mean()
Out[165]:
K
J
-2 -2.333333
-1 0.466667
0 -1.030303
1 -1.750000
2 -3.000000
Please note that with a single column, K (int64), grouped by J, another int64, the mean is directly a float. The homes DataFrame was read from a supplied CSV file; the df one was created in pandas, written to a CSV and then read back.
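For anyone who wants to try the same round trip, here is a minimal sketch of how a frame like df could be built, written to CSV and read back. The random values, the seed and the in-memory buffer (instead of a file on disk) are my assumptions for illustration; they are not the exact data shown above.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for df: small random integers, explicitly int64.
rng = np.random.RandomState(0)
df = pd.DataFrame({
    "J": rng.randint(-2, 3, size=100).astype("int64"),
    "K": rng.randint(-15, 16, size=100).astype("int64"),
})

# Round trip through CSV text held in memory rather than a file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
df_back = pd.read_csv(buffer)

# Both columns should come back as int64.
print(df_back.dtypes)

# Same single-column groupby as in the question; inspect the result dtype.
means = df_back.groupby("J")[["K"]].mean()
print(means.dtypes)
```

Comparing the dtypes printed here with those from the homes frame is a quick way to check whether a given pandas version shows the behaviour.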
Last but not least, I am using pandas 0.16.2.
As suggested by some of you in the comments, this is a bug in pandas. I have just reported it here.
As of now, it has been accepted by the pandas team.
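Until a fix is available, one workaround that should sidestep the problem is to cast the column to float before taking the mean, so there is no int64 dtype for the result to fall back to. This is only a sketch: the rows below are made up to stand in for homes, and I have not checked it against every pandas version.

```python
import pandas as pd

# Made-up rows standing in for homes; the real data is not shown here.
homes = pd.DataFrame({
    "zipcode": [28001, 28001, 28002, 28002, 28002],
    "price": [280000, 281610, 234000, 234500, 234353],
})

# Cast price to float64 first, then group by the zipcode column.
mean_price = homes["price"].astype("float64").groupby(homes["zipcode"]).mean()

# The result is a Series indexed by zipcode.
print(mean_price.dtype)
print(mean_price)
```

The cast costs an extra copy of the column, which seems acceptable for a single aggregation.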
Thanks