Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

After groupby, how to flatten column headers?

I'm trying to left join multiple pandas dataframes on a single Id column, but when I attempt the merge I get warning:

KeyError: 'Id'.

I think it might be because my dataframes have offset columns resulting from a groupby statement, but I could very well be wrong. Either way I can't figure out how to "unstack" my dataframe column headers. None of the answers at this question seem to work.

My groupby code:

step1 = pd.DataFrame(step3.groupby(['Id', 'interestingtabsplittest2__grp'])['applications'].sum()) step1.sort('applications', ascending=False).head(3) 

Returns:

offset headers

How to get those offset headers into the top level?

like image 840
samthebrand Avatar asked Oct 08 '15 00:10

samthebrand


People also ask

How do I flatten a Dataframe after GroupBy?

Using reset_index() function Pandas provide a function called reset_index() to flatten the hierarchical index created due to the groupby aggregation function in Python.

How do I unstack GroupBy?

The first index will have the column name and the second index will have the name of the aggregated function. Now, use stack() at level 0 of the grouped dataframe and unstack() the grouped dataframe. Then, use stack() at level 1 of the grouped dataframe and unstack() the grouped dataframe.

How do I flatten a column in pandas?

Flatten columns: use get_level_values() Flatten columns: use to_flat_index() Flatten columns: join column labels. Flatten rows: flatten all levels.

How do you flatten a data frame?

The first method to flatten the pandas dataframe is through NumPy python package. There is a function in NumPy that is numpy. flatten() that perform this task. First, you have to convert the dataframe to numpy using the to_numpy() method and then apply the flatten() method.


1 Answers

You're looking for .reset_index().

In [11]: df = pd.DataFrame([[2, 3], [5, 6]], pd.Index([1, 4], name="A"), columns=["B", "C"])  In [12]: df Out[12]:    B  C A 1  2  3 4  5  6  In [13]: df.reset_index() Out[13]:    A  B  C 0  1  2  3 1  4  5  6 

Note: That you can avoid this step by using as_index=False when doing the groupby.

step1 = step3.groupby(['Id', 'interestingtabsplittest2__grp'], as_index=False)['applications'].sum() 
like image 152
Andy Hayden Avatar answered Oct 01 '22 15:10

Andy Hayden