Pandas Multiindex Groupby on Columns

Is there any way to use groupby on the columns of a MultiIndex? I know you can group on the rows, and that case is well documented. However, I cannot get groupby to work on the columns. The only solution I have found is transposing the DataFrame.

import numpy as np
import pandas as pd

# Generate data (copied from the pandas example)
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(3, 8), index=['A', 'B', 'C'], columns=index)

Now I try to group by the column levels, which fails:

df.groupby(level=1)
df.groupby(level='first')

However, transposing first, so that the column levels become row levels, works:

df.T.groupby(level=1)
df.T.groupby(level='first')

So is there a way to do this without transposing?

Asked Nov 22 '16 15:11 by Bobe Kryant



1 Answer

You need to specify the axis in the groupby method:

df.groupby(level=1, axis=1).sum()


Or, if you want to group by level 0:

df.groupby(level=0, axis=1).sum()
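For comparison, here is a minimal sketch of the transpose route from the question, written so the result comes back in the original orientation. As far as I know, recent pandas releases (2.1 and later) emit a deprecation warning for `axis=1` in `groupby` and point to the transpose form instead, so treat the version detail as an assumption and check it against your installed pandas. The names `by_second` and `by_first` and the seeded generator are only illustrative; they are not part of the original post.

```python
import numpy as np
import pandas as pd

# Same column layout as in the question; a seeded generator keeps runs repeatable.
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
columns = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((3, 8)),
                  index=['A', 'B', 'C'], columns=columns)

# Transpose so the column levels become row levels, group on a level,
# aggregate, then transpose back to restore the A/B/C rows.
by_second = df.T.groupby(level='second').sum().T  # columns: one, two
by_first = df.T.groupby(level='first').sum().T    # columns: bar, baz, foo, qux

print(by_second)
print(by_first)
```

Each result keeps the rows A, B, C and has one column per group, which is the same shape the `axis=1` calls above produce.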


Answered Oct 21 '22 20:10 by Psidom