
Accessing hierarchical columns in pandas after groupby

I use pandas for grouping a dataset. When I aggregate different columns with different functions I'm getting a hierarchical column-structure.

G1 = df.groupby('date').agg({'col1': [sum, np.mean], 'col2': 'sum', 'col3': np.mean})

results in:

            col1               col2       col3
               sum      mean      sum       mean
date
2000-11-01    1701  1.384052    82336  54.222945
2000-11-02   11101  1.447894   761963  70.027260
2000-11-03   11285  1.479418   823355  77.984268

I couldn't find too much about this resulting structure in the docs unfortunately. The only thing I found in pandas docs was the hierarchical multi-index.

How can I access the values? Currently I use G1['col1']['mean'] to get the whole Series:

2000-11-01   1.384052
2000-11-02   1.447894
2000-11-03   1.479418

and thus G1['col1']['mean'][1] to get the value 1.447894. I wonder about the performance, though: this code first slices col1 (G1['col1']), which produces a view or a copy (I don't know which in this case) that actually contains two columns, and then slices again to get the mean column.

Any tips? And where can I find more about the creation of the hierarchical columns in the docs?

tim, asked Jun 12 '14 08:06


1 Answer

The advice is to do this in one pass, without chaining. In particular, that lets you do assignment: with chained indexing you can end up assigning to a temporary object, and the modification is lost when that object is garbage collected.

Access a MultiIndex* column as a tuple (df here is the aggregated frame, G1 in the question):

In [11]: df[('col1', 'mean')]
Out[11]:
date
2000-11-01    1.384052
2000-11-02    1.447894
2000-11-03    1.479418
Name: (col1, mean), dtype: float64
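
To make the tuple lookup reproducible, here is a minimal sketch. The toy data is hypothetical (only the column names come from the question), and it assumes string aggregator names ('sum', 'mean') in place of the function objects used in the question, since newer pandas versions warn about those.

```python
import pandas as pd

# Hypothetical toy data: column names mirror the question, values do not.
raw = pd.DataFrame({
    "date": ["2000-11-01", "2000-11-01", "2000-11-02", "2000-11-02"],
    "col1": [1.0, 3.0, 2.0, 6.0],
    "col2": [10, 20, 30, 40],
    "col3": [5.0, 7.0, 1.0, 3.0],
})

# Mixing a list of aggregators with single aggregators yields
# two-level (MultiIndex) columns, as in the question.
df = raw.groupby("date").agg(
    {"col1": ["sum", "mean"], "col2": "sum", "col3": "mean"}
)

# Each column label is a tuple of (original column, aggregator).
column_labels = list(df.columns)

# A single lookup with the full tuple returns the Series directly,
# without first materialising the intermediate df["col1"] frame.
col1_mean = df[("col1", "mean")]
```

Printing column_labels is a quick way to see which tuples are valid keys for a given aggregation.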

and a specific value using loc:

In [12]: df.loc['2000-11-01', ('col1', 'mean')]
Out[12]: 1.3840520000000001
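
The same single loc call also works on the left-hand side of an assignment, which is the case where chaining tends to cause trouble. A small sketch, assuming a hypothetical frame built directly with MultiIndex columns instead of the question's data:

```python
import pandas as pd

# Hypothetical stand-in for the aggregated frame.
columns = pd.MultiIndex.from_tuples(
    [("col1", "sum"), ("col1", "mean"), ("col2", "sum")]
)
df = pd.DataFrame(
    [[4.0, 2.0, 30.0], [8.0, 4.0, 70.0]],
    index=pd.Index(["2000-11-01", "2000-11-02"], name="date"),
    columns=columns,
)

# Read: row label and full column tuple in one indexing operation.
value = df.loc["2000-11-01", ("col1", "mean")]

# Write: the same form on the left-hand side modifies df itself.
df.loc["2000-11-02", ("col1", "mean")] = 99.0
updated = df.loc["2000-11-02", ("col1", "mean")]
```

By contrast, a chained write such as df['col1']['mean'][...] = ... may land on an intermediate object, depending on the pandas version and settings, so the single-call form is the safer habit.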

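(To mix a row position with a column label, this answer originally used ix. That indexer was deprecated in pandas 0.20 and removed in 1.0; on current versions, convert the label to a position with get_loc and use iloc)

In [13]: df.iloc[0, df.columns.get_loc(('col1', 'mean'))]
Out[13]: 1.3840520000000001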
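
On pandas versions without ix, a position and a label can be combined by translating one into the other first. The sketch below shows both directions; the frame and its values are hypothetical, and get_loc is assumed to be called with the full column tuple on unique columns, in which case it returns a single integer.

```python
import pandas as pd

# Hypothetical frame with two-level columns.
columns = pd.MultiIndex.from_tuples([("col1", "sum"), ("col1", "mean")])
df = pd.DataFrame(
    [[4.0, 2.0], [8.0, 4.0]],
    index=pd.Index(["2000-11-01", "2000-11-02"], name="date"),
    columns=columns,
)

# Direction 1: turn the column label into a position, then use iloc.
col_pos = df.columns.get_loc(("col1", "mean"))
by_position = df.iloc[0, col_pos]

# Direction 2: turn the row position into a label, then use loc.
row_label = df.index[0]
by_label = df.loc[row_label, ("col1", "mean")]
```

Both forms keep the lookup to a single indexing call on df, in line with the advice against chaining.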

*This is a MultiIndex.

Andy Hayden, answered Sep 24 '22 00:09