
How to get the column names of a DataFrame GroupBy object?

Tags:

python

pandas

How can I get the column names of a GroupBy object? The object does not provide a columns property. I can aggregate the object first or extract a DataFrame with the get_group() method, but this is either a computationally expensive hack or error-prone if some columns are dropped (string columns, for example).

Fookatchu asked Sep 02 '16 09:09




2 Answers

Looking at the source code of __getitem__, it seems that you can get the column names with

g.obj.columns

where g is the groupby object. Apparently g.obj links to the DataFrame.
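
A minimal sketch of this approach, using a small made-up DataFrame (the column names and values are assumptions for illustration only). It reads the column names through g.obj without aggregating anything:

```python
import pandas as pd

# Hypothetical example data; not taken from the question.
df = pd.DataFrame({
    "file": ["a", "a", "b"],
    "size": [1, 2, 3],
    "path": ["/x", "/y", "/z"],
})

g = df.groupby("file")

# g.obj is the DataFrame the GroupBy object was created from,
# so its columns include the grouping column as well.
columns = list(g.obj.columns)
print(columns)
```

Since g.obj is the DataFrame itself, this should involve no per-group computation.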

ayhan answered Oct 19 '22 03:10


As ayhan said, g.obj.columns does return columns, but those of the original DataFrame. The columns of an aggregated result, such as g.any().columns, are not the same.

Specifically, g.any().columns does NOT include the columns used to create the groupby whereas g.obj.columns does.

So whether this difference matters depends on your use case for the result. In my case, I can be a bit less pedantic, but for a distributable piece of code, you may want to be precise.

In [109]: ww.grp.any().columns
Out[109]: 
Index(['inode', 'size', 'drvid', 'path', 'hash', 'ftype', 'id', 'md5',
       'parent', 'top'],
      dtype='object')

In [110]: ww.grp.any().index.name
Out[110]: 'file'

In [111]: ww.grp.obj.columns
Out[111]: 
Index(['inode', 'size', 'drvid', 'path', 'hash', 'ftype', 'file', 'id', 'md5',
       'parent', 'top'],
      dtype='object')
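
The same difference can be reproduced on a small made-up DataFrame. This is only a sketch: the data is hypothetical, and it assumes the grouping is done by a single column name. It also shows one possible way to get the non-key column names without aggregating, by dropping the key from g.obj.columns:

```python
import pandas as pd

# Hypothetical data, loosely modelled on the column names in the session above.
df = pd.DataFrame({
    "inode": [10, 11, 12],
    "file": ["a", "a", "b"],
    "size": [0, 2, 3],
})

g = df.groupby("file")

# Aggregated result: the grouping column moves into the index.
agg_cols = list(g.any().columns)
index_name = g.any().index.name

# Original DataFrame: the grouping column is still a regular column.
obj_cols = list(g.obj.columns)

# Non-key columns without aggregating, assuming the key is the column "file".
non_key_cols = list(g.obj.columns.drop("file"))

print(agg_cols, index_name, obj_cols, non_key_cols)
```

Note that the last line only matches the aggregated columns when the aggregation keeps every non-key column. An aggregation that drops some columns (for example, numeric-only ones applied to string columns) can return fewer.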
JohnT answered Oct 19 '22 01:10