Using the Pandas package in python, I would like to sum (marginalize) over one level in a series with a 3-level multiindex to produce a series with a 2 level multiindex. For example, if I have the following:
ind = [tuple(x) for x in ['ABC', 'ABc', 'AbC', 'Abc', 'aBC', 'aBc', 'abC', 'abc']] mi = pd.MultiIndex.from_tuples(ind) data = pd.Series([264, 13, 29, 8, 152, 7, 15, 1], index=mi) A B C 264 c 13 b C 29 c 8 a B C 152 c 7 b C 15 c 1
I would like to sum over the variable C to produce the following output:
A B 277 b 37 a B 159 b 16
What is the best way in Pandas to do this?
To sum all the rows of a DataFrame, use the sum() function and set the axis value as 1. The value axis 1 will add the row values.
sum() method is used to get the sum of the values for the requested axis. level[int or level name, default None] : If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a scalar.
sum() to Sum All Columns. Use DataFrame. sum() to get sum/total of a DataFrame for both rows and columns, to get the total sum of columns use axis=1 param. By default, this method takes axis=0 which means summing of rows.
If you know you always want to aggregate over the first two levels, then this is pretty easy:
In [27]: data.groupby(level=[0, 1]).sum() Out[27]: A B 277 b 37 a B 159 b 16 dtype: int64
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With