Partition pandas .diff() in multi-index level

My question is about calling .diff() within each partition of a MultiIndex level.

In the following sample, the output of the first df.diff() call is:

               values
Greek English        
alpha a           NaN
      b             2
      c             2
      d             2
beta  e            11
      f             1
      g             1
      h             1

But I want it to be:

               values
Greek English        
alpha a           NaN
      b             2
      c             2
      d             2
beta  e            NaN
      f             1
      g             1
      h             1

Here is a solution that uses a loop, but I suspect the loop can be avoided:

import pandas as pd

df = pd.DataFrame({'values': [1., 3., 5., 7., 18., 19., 20., 21.],
                   'Greek': ['alpha', 'alpha', 'alpha', 'alpha', 'beta', 'beta', 'beta', 'beta'],
                   'English': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']})

df.set_index(['Greek', 'English'], inplace=True)
print(df)

# (1.) This is not the type of .diff() I want.
# I need it to respect level='Greek' and restart at each new Greek letter.
print(df.diff())


# This is one way to achieve my desired result, but I suspect
# there is a way that does not involve a loop.
idx = pd.IndexSlice
for greek_letter in df.index.get_level_values('Greek').unique():
    rows = idx[greek_letter, :]
    # Assign through a single .loc call: the chained form
    # df.loc[...]['values'] = ... can write to a temporary copy instead of df.
    df.loc[rows, 'values'] = df.loc[rows, 'values'].diff()

print(df)
asked Apr 29 '15 by Dickster

1 Answer

Just group by level=0 (or level='Greek', if you prefer), then call diff on the values column:

In [179]:

df.groupby(level=0)['values'].diff()
Out[179]:
Greek  English
alpha  a         NaN
       b           2
       c           2
       d           2
beta   e         NaN
       f           1
       g           1
       h           1
dtype: float64
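The expression above returns a new Series. A common next step is to store it next to the original column. The self-contained sketch below (an editorial addition, not part of the original answer) does that using the level name 'Greek' instead of level=0, and checks that a grouped diff matches subtracting a grouped shift(1), which is why the first row of each group comes out as NaN. The column name 'diff' is an arbitrary choice.

```python
import pandas as pd

df = pd.DataFrame({'values': [1., 3., 5., 7., 18., 19., 20., 21.],
                   'Greek': ['alpha'] * 4 + ['beta'] * 4,
                   'English': list('abcdefgh')}).set_index(['Greek', 'English'])

# Group by the index level by name (equivalent to level=0 here); diff then
# compares each row only with the previous row of the same group.
grouped = df.groupby(level='Greek')['values']
df['diff'] = grouped.diff()

# The same result spelled out: diff(periods=1) is x - x.shift(1), and a
# grouped shift never pulls a value across a group boundary, so the first
# row of each group has nothing to subtract and becomes NaN.
manual = df['values'] - grouped.shift(1)
assert df['diff'].equals(manual)

print(df)
```

Because the grouped result keeps the original MultiIndex, it aligns row for row when assigned back to df.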
answered Sep 21 '22 by EdChum
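For comparison, here is a loop-free alternative that does not use groupby: take the ordinary diff() and blank out every row where the Greek label changes. This is a sketch that assumes each Greek letter occupies one contiguous run of rows (true for the sample data, not guaranteed in general). The groupby answer above does not need that assumption, so it is usually the safer choice.

```python
import pandas as pd

df = pd.DataFrame({'values': [1., 3., 5., 7., 18., 19., 20., 21.],
                   'Greek': ['alpha'] * 4 + ['beta'] * 4,
                   'English': list('abcdefgh')}).set_index(['Greek', 'English'])

# Ungrouped difference: it crosses the alpha/beta boundary at ('beta', 'e').
plain = df['values'].diff()

# True wherever the Greek label differs from the previous row's label,
# i.e. at the first row of each contiguous Greek block.
greek = pd.Series(df.index.get_level_values('Greek'), index=df.index)
starts_block = greek.ne(greek.shift())

# Set those boundary rows to NaN. Assumes each Greek label forms ONE
# contiguous run; interleaved labels would give wrong results here.
result = plain.mask(starts_block)
print(result)
```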