Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas, Computing total sum on each MultiIndex sublevel

Tags:

python

pandas

I would like to compute the total sum on each multi-index sublevel. And then, save it in the dataframe.

My current dataframe looks like:

                    values
    first second
    bar   one     0.106521
          two     1.964873
    baz   one     1.289683
          two    -0.696361
    foo   one    -0.309505
          two     2.890406
    qux   one    -0.758369
          two     1.302628

And the needed result is:

                    values
    first second
    bar   one     0.106521
          two     1.964873
          total   2.071394
    baz   one     1.289683
          two    -0.696361
          total   0.593322
    foo   one    -0.309505
          two     2.890406
          total   2.580901
    qux   one    -0.758369
          two     1.302628
          total   0.544259
    total one     0.328331
          two     5.461546
          total   5.789877

Currently I found the folowing implementation that works. But I would like to know if there are better options. I need the fastest solution possible, because in some cases when my dataframes become huge, the computation time seems to take ages.

In [1]: arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
   ...:           ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
   ...: 

In [2]: tuples = list(zip(*arrays))

In [3]: index = MultiIndex.from_tuples(tuples, names=['first', 'second'])

In [4]: s = Series(randn(8), index=index)

In [5]: d = {'values': s}

In [6]: df = DataFrame(d)

In [7]: for col in df.index.names:
   .....:     df = df.unstack(col)
   .....:     df[('values', 'total')] = df.sum(axis=1)
   .....:     df = df.stack()
   .....:
like image 302
Iulian Stana Avatar asked Apr 02 '15 12:04

Iulian Stana


1 Answers

Not sure if you are still looking for an answer to this - you could try something like this, assuming your current dataframe is assigned to df :

temp = df.pivot(index='first', columns='second', values='values')
temp['total'] = temp['one'] + temp['two']
temp.stack()
like image 159
Sajan Avatar answered Oct 08 '22 09:10

Sajan