Accessing hierarchical columns in pandas after groupby

I use pandas for grouping a dataset. When I aggregate different columns with different functions I'm getting a hierarchical column-structure.

import numpy as np

G1 = df.groupby('date').agg({'col1': [sum, np.mean], 'col2': 'sum', 'col3': np.mean})

results in:

            col1               col2       col3
               sum      mean      sum       mean
date
2000-11-01    1701  1.384052    82336  54.222945
2000-11-02   11101  1.447894   761963  70.027260
2000-11-03   11285  1.479418   823355  77.984268
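A minimal reproducible sketch of the aggregation above. The raw data is hypothetical, so the numbers differ from the output shown. It uses the strings 'sum' and 'mean' instead of sum and np.mean. That substitution is an assumption: it produces the same column labels and avoids the FutureWarning that newer pandas versions emit for those callables. The point is that G1.columns is a two-level MultiIndex of (column, function) tuples.

```python
import pandas as pd

# Hypothetical raw data: several rows per date (values are made up).
df = pd.DataFrame({
    "date": ["2000-11-01", "2000-11-01", "2000-11-02", "2000-11-02"],
    "col1": [1, 2, 3, 4],
    "col2": [10, 20, 30, 40],
    "col3": [1.0, 3.0, 5.0, 7.0],
})

# Because one dict value is a list, pandas gives every column a second
# level holding the function name.
G1 = df.groupby("date").agg({"col1": ["sum", "mean"], "col2": "sum", "col3": "mean"})

print(type(G1.columns).__name__)  # MultiIndex
print(G1.columns.tolist())        # list of (column, function) tuples
```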

Unfortunately, I couldn't find much about this resulting structure in the docs. The only related thing I found was the hierarchical MultiIndex.

How can I access the values? Currently I use G1['col1']['mean'] to get the whole Series:

2000-11-01   1.384052
2000-11-02   1.447894
2000-11-03   1.479418

and then G1['col1']['mean'][1] to get the value 1.447894. But I wonder about the performance: this code first slices col1 (G1['col1']), which returns a view or a copy (I'm not sure which in this case) that actually contains two columns, and then takes yet another slice for the mean column.

Any tips? And where can I find more about the creation of the hierarchical columns in the docs?

How do you sort values after Groupby?

You can sort the aggregated values in descending order by passing ascending=False to sort_values(). The head() method then returns the first n rows, which is useful for picking the top groups or for quickly checking that the result contains the right kind of data.
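A short sketch of that pattern, using a hypothetical store/sales table: aggregate per group, sort the result descending, then keep the top rows with head().

```python
import pandas as pd

# Hypothetical data: several sales rows per store.
df = pd.DataFrame({
    "store": ["a", "a", "b", "b", "c"],
    "sales": [5, 3, 10, 1, 7],
})

# Sum per store, then sort the aggregated Series from largest to smallest.
totals = df.groupby("store")["sales"].sum().sort_values(ascending=False)

# head(2) keeps the two stores with the largest totals.
top2 = totals.head(2)
print(top2)
```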

How do I flatten a DataFrame after Groupby?

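Use reset_index(). After a groupby aggregation, the group keys form the (possibly hierarchical) row index, and reset_index() moves them back into ordinary columns. It does not flatten MultiIndex columns such as those produced by agg with a list of functions; for those, join the column levels into single names.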
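A minimal sketch of both steps on hypothetical data. Joining the levels with "_" (giving names like col1_sum) is just one naming choice, not something pandas prescribes.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2000-11-01", "2000-11-01", "2000-11-02"],
    "col1": [1, 2, 3],
    "col2": [10, 20, 30],
})

G = df.groupby("date").agg({"col1": ["sum", "mean"], "col2": "sum"})

# reset_index() turns the 'date' index into a column labelled ('date', '').
flat = G.reset_index()

# The columns are still a MultiIndex; join the non-empty levels of each tuple.
flat.columns = ["_".join(filter(None, col)) for col in flat.columns]
print(flat.columns.tolist())
```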

How do I index a Groupby DataFrame?

Pass the index name of the DataFrame to groupby() to group rows by that index: DataFrame.groupby() accepts a string or a list of strings naming columns or index levels. Alternatively, use the level parameter, e.g. groupby(level=0).
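A small sketch, assuming a hypothetical frame whose index is named date, showing that grouping by the index name and by level=0 give the same result.

```python
import pandas as pd

# Hypothetical frame indexed by a named 'date' index.
df = pd.DataFrame(
    {"value": [1, 2, 3, 4]},
    index=pd.Index(["2000-11-01", "2000-11-01", "2000-11-02", "2000-11-02"], name="date"),
)

# Group by the index via its name...
by_name = df.groupby("date")["value"].sum()

# ...or via its level number, which also works when the index is unnamed.
by_level = df.groupby(level=0)["value"].sum()

print(by_name)
```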

Does Groupby preserve order?

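Yes, groupby preserves the order of rows within each group. The groups themselves, however, are sorted by key by default (sort=True); pass sort=False to keep them in the order in which they first appear.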
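A quick sketch with hypothetical data illustrating both points: row order inside each group follows the original frame, while group order depends on sort.

```python
import pandas as pd

df = pd.DataFrame({
    "key": ["b", "a", "b", "a"],
    "val": [1, 2, 3, 4],
})

# Rows inside each group keep their original relative order.
within = {key: group["val"].tolist() for key, group in df.groupby("key")}

# Default sort=True: groups are iterated in sorted key order.
sorted_keys = [key for key, _ in df.groupby("key")]

# sort=False: groups are iterated in order of first appearance.
appearance_keys = [key for key, _ in df.groupby("key", sort=False)]

print(within, sorted_keys, appearance_keys)
```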

The advice is to do this in one pass, without chaining. In particular, this lets you assign values, rather than assigning to an intermediate object that may be a copy, so that the modification is garbage collected along with it.

Access a MultiIndex* column as a tuple:

In [11]: df[('col1', 'mean')]
Out[11]:
date
2000-11-01    1.384052
2000-11-02    1.447894
2000-11-03    1.479418
Name: (col1, mean), dtype: float64

and a specific value using loc:

In [12]: df.loc['2000-11-01', ('col1', 'mean')]
Out[12]: 1.3840520000000001
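The same single .loc call, with the row label and the full column tuple, also works on the left-hand side of an assignment, which is the main reason to avoid chaining. A minimal sketch, assuming a frame rebuilt by hand from two rows of the question's output (in practice it would come from the groupby):

```python
import pandas as pd

# Hypothetical stand-in for the groupby result, same two-level columns.
columns = pd.MultiIndex.from_tuples(
    [("col1", "sum"), ("col1", "mean"), ("col2", "sum"), ("col3", "mean")]
)
df = pd.DataFrame(
    [[1701, 1.384052, 82336, 54.222945],
     [11101, 1.447894, 761963, 70.027260]],
    index=pd.Index(["2000-11-01", "2000-11-02"], name="date"),
    columns=columns,
)

# One indexing call: the write goes directly into df, not into an
# intermediate sub-frame that might be a copy.
df.loc["2000-11-02", ("col1", "mean")] = 1.5
print(df.loc["2000-11-02", ("col1", "mean")])
```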

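(Older pandas versions used ix to mix labels, as loc does, and positions, as iloc does, in one call. ix was deprecated in pandas 0.20 and removed in 1.0; in current pandas, convert the column label to a position with columns.get_loc and use iloc instead.)

In [13]: df.iloc[0, df.columns.get_loc(('col1', 'mean'))]
Out[13]: 1.3840520000000001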

*This is a MultiIndex.
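A self-contained sketch of position-based lookups that work without ix. It assumes a frame rebuilt by hand from the values in the question's output; in practice the frame would come from the groupby. It shows two options: translating the column tuple to a position with columns.get_loc for iloc, or selecting the column by label first and then the row with Series.iloc.

```python
import pandas as pd

# Hypothetical stand-in for the groupby result shown in the question.
columns = pd.MultiIndex.from_tuples(
    [("col1", "sum"), ("col1", "mean"), ("col2", "sum"), ("col3", "mean")]
)
df = pd.DataFrame(
    [[1701, 1.384052, 82336, 54.222945],
     [11101, 1.447894, 761963, 70.027260],
     [11285, 1.479418, 823355, 77.984268]],
    index=pd.Index(["2000-11-01", "2000-11-02", "2000-11-03"], name="date"),
    columns=columns,
)

# Option 1: turn the full column tuple into an integer position, then use iloc.
col_pos = df.columns.get_loc(("col1", "mean"))
first = df.iloc[0, col_pos]

# Option 2: select the column by label, then the row by position.
second = df[("col1", "mean")].iloc[1]

print(col_pos, first, second)
```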

Tags: python, indexing, pandas, group-by, hierarchical-data

tim

Andy Hayden