
In Pandas, why does the groupby 'key' column disappear in this scenario

Tags: python, pandas

I have the following code, which for some reason results in the 'key' column disappearing. I have also noticed other times when the key column seems to 'randomly' disappear. I am trying to isolate the cases; this is one.

I am using pandas version 0.20.1.

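import pandas as pd

DF = pd.DataFrame([['a', 1], ['b', 2], ['b', 3]], columns=['G', 'N'])
groupByObj = DF.groupby('G')
print(groupByObj.get_group('b'))  # first call: 'G' column is present
groupByObj.sum()                  # aggregate; the result is discarded
print(groupByObj.get_group('b'))  # second call: 'G' column is missing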

The first groupByObj.get_group('b') call prints:

   G  N
1  b  2
2  b  3

The second groupByObj.get_group('b') call prints:

   N
1  2
2  3

Why does the 'key' column ('G') disappear after running groupByObj.sum()?

asked Aug 08 '17 19:08 by BrainPermafrost



1 Answer

This is a bug in Pandas, discussed in:

  • https://github.com/pandas-dev/pandas/issues/12839
  • https://github.com/pandas-dev/pandas/issues/14013

The latter is still open.

From reading a bit on GitHub, and as mentioned in the comments, it seems that the second output is the intended behavior. In the sum case it was obtained by adding the following line to pandas.core.groupby._GroupBy#_set_group_selection:

self._reset_cache('_selected_obj')

Because this reset happens only when sum (or one of a few other functions) is called, the G column is still visible on the first get_group call, before any aggregation has run. Incidentally, the reset is not performed when calling mean and a few other functions. It seems that this bug is broader than originally thought and was not fully solved by the simple cache reset.
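If you just need the rows of one group with the key column intact, one way to sidestep the cached groupby state entirely is to filter the original frame with a boolean mask. The sketch below is my own suggestion rather than anything from the linked issues: it first prints what get_group returns before and after sum() on whatever pandas version is installed (the result may differ from the 0.20.1 output in the question), then applies the mask. The names df, grouped, cols_before, cols_after and rows_b are just illustrative.

```python
import pandas as pd

df = pd.DataFrame([['a', 1], ['b', 2], ['b', 3]], columns=['G', 'N'])

# Diagnostic: record which columns get_group returns before and after
# an aggregation on the same groupby object. The outcome depends on
# the installed pandas version, so it is printed rather than assumed.
grouped = df.groupby('G')
cols_before = list(grouped.get_group('b').columns)
grouped.sum()
cols_after = list(grouped.get_group('b').columns)
print(pd.__version__, cols_before, cols_after)

# Workaround (an assumption on my part, not an official fix):
# select the group's rows from the original DataFrame. This never
# touches the groupby object, so its internal cache is irrelevant.
rows_b = df[df['G'] == 'b']
print(rows_b)
```

The mask only covers the "give me the rows for one key" use case; it does not change how the groupby object itself behaves after an aggregation.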

answered Oct 27 '22 17:10 by Shovalt
