
pandas groupby dropping columns

I'm doing a simple groupby operation to compare group means. As you can see below, I have selected specific columns from a larger DataFrame and removed all missing values.

[image: selected columns and df head]

But when I group by, I am losing a couple of columns:

[image: group-by logic and resulting df]

I have never encountered this with pandas, and I can't find anything similar on Stack Overflow. Does anybody have any insight?

user3334415 asked Jun 01 '16 18:06


1 Answer

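I think this is the automatic exclusion of 'nuisance' columns, as described in the pandas groupby documentation: columns that the aggregation function cannot be applied to (for example, string columns when taking the mean) are left out of the result.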

Sample:

    import pandas as pd

    # A and B hold strings (object dtype); C and D are float columns.
    df = pd.DataFrame({
        'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
        'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
        'C': [-0.919854, -0.042379, 1.247642, -0.00992, 0.290213, 0.495767, 0.362949, 1.548106],
        'D': [-1.131345, -0.089329, 0.337863, -0.945867, -0.932132, 1.95603, 0.017587, -0.016692],
    })

    print(df)
         A      B         C         D
    0  foo    one -0.919854 -1.131345
    1  bar    one -0.042379 -0.089329
    2  foo    two  1.247642  0.337863
    3  bar  three -0.009920 -0.945867
    4  foo    two  0.290213 -0.932132
    5  bar    two  0.495767  1.956030
    6  foo    one  0.362949  0.017587
    7  foo  three  1.548106 -0.016692

    # The string column B does not appear in the result.
    print(df.groupby('A').mean())
                C         D
    A
    bar  0.147823  0.306945
    foo  0.505811 -0.344944

I think you can check DataFrame.dtypes to see whether the missing columns have a non-numeric dtype such as object.
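As a rough sketch of that check, the snippet below lists the non-numeric columns and converts one of them before grouping. The data and the column names (group, height, score) are made up for illustration and are not from the question. It assumes the dropped column holds numbers stored as strings, which may not be the case in your data. Passing numeric_only=True makes the exclusion explicit; as far as I know, newer pandas versions (2.0 and later) raise a TypeError instead of silently dropping such columns.

```python
import pandas as pd

# Hypothetical data: 'score' holds numbers stored as strings, so its dtype is object.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "height": [1.0, 2.0, 3.0, 4.0],
    "score": ["10", "20", "30", "50"],
})

# Columns with a non-numeric dtype are the ones a mean cannot use.
print(df.dtypes)
non_numeric = df.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)

# Convert the column that should be numeric; errors="coerce" turns
# values that cannot be parsed into NaN instead of raising.
df["score"] = pd.to_numeric(df["score"], errors="coerce")

# numeric_only=True states explicitly that non-numeric columns are skipped.
means = df.groupby("group").mean(numeric_only=True)
print(means)
```

After the conversion, score is a numeric column and shows up in the grouped means alongside height.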

jezrael answered Sep 20 '22 23:09