 

Renaming index values in multiindex dataframe

Creating my dataframe:

import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]

tuples = list(zip(*arrays))

index = pd.MultiIndex.from_tuples(tuples, names=['first','second'])
data = pd.DataFrame(np.random.randn(8,2),index=index,columns=['c1','c2'])

data
Out[68]: 
                    c1        c2
first second                    
bar   one     0.833816 -1.529639
      two     0.340150 -1.818052
baz   one    -1.605051 -0.917619
      two    -0.021386 -0.222951
foo   one     0.143949 -0.406376
      two     1.208358 -2.469746
qux   one    -0.345265 -0.505282
      two     0.158928  1.088826

I would like to rename the "first" index values, such as "bar"->"cat", "baz"->"dog", etc. However, every example I have read either operates on a single-level index and/or loops through the entire index to effectively re-create it from scratch. I was thinking something like:

data = data.reindex(index={'bar':'cat','baz':'dog'})

but this does not work, and I don't really expect it to work on a multi-level index anyway. Can I do such a replacement without looping through the entire DataFrame index?

Begin edit

I am hesitant to update to 0.13 until release, so I used the following workaround:

# `codes` is a previously defined dict mapping old "first"-level values to new strings
index = data.index.tolist()
for r in range(len(index)):
    index[r] = (codes[index[r][0]], index[r][1])

index = pd.MultiIndex.from_tuples(index, names=data.index.names)
data.index = index

Here `codes` is a previously defined dictionary of code:string pairs. This actually isn't as big a performance hit as I was expecting (it takes a couple of seconds for ~1.1 million rows). It is not as pretty as a one-liner, but it does work.
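The loop above can also be written as a single list comprehension over the index tuples, since iterating over a MultiIndex yields `(first, second)` tuples. This is a minimal sketch: the `codes` dict below is a hypothetical stand-in for the asker's own dictionary, and it assumes every "first"-level value has an entry in it.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's `codes` dictionary (old value -> new string).
codes = {'bar': 'cat', 'baz': 'dog', 'foo': 'foo', 'qux': 'qux'}

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=['first', 'second'])
data = pd.DataFrame(np.random.randn(8, 2), index=index, columns=['c1', 'c2'])

# Same idea as the loop: map the first element of each tuple, keep the second as-is,
# and carry the level names over to the rebuilt index.
data.index = pd.MultiIndex.from_tuples(
    [(codes[first], second) for first, second in data.index],
    names=data.index.names,
)
print(data.index.get_level_values('first').tolist())
```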

End Edit

asked Dec 11 '13 at 20:12 by tnknepp


2 Answers

Use the set_levels method (new in version 0.13.0):

# set_levels returns a new MultiIndex (the inplace=True option was removed in pandas 2.0)
data.index = data.index.set_levels([['cat', 'dog', 'foo', 'qux'],
                                    ['one', 'two']])

yields

                    c1        c2
first second                    
cat   one    -0.289649 -0.870716
      two    -0.062014 -0.410274
dog   one     0.030171 -1.091150
      two     0.505408  1.531108
foo   one     1.375653 -1.377876
      two    -1.478615  1.351428
qux   one     1.075802  0.532416
      two     0.865931 -0.765292
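If only one level needs new values, `set_levels` also accepts a `level` argument (an integer position or a level name), so the unchanged levels don't have to be repeated. A sketch, assuming a recent pandas version:

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=['first', 'second'])
data = pd.DataFrame(np.random.randn(8, 2), index=index, columns=['c1', 'c2'])

# Replace only the 'first' level. The new values line up position-by-position
# with the existing level values ['bar', 'baz', 'foo', 'qux'].
data.index = data.index.set_levels(['cat', 'dog', 'foo', 'qux'], level='first')
print(data.index.levels[0].tolist())   # ['cat', 'dog', 'foo', 'qux']
```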

To remap a level based on a dict, you could use a function such as this:

def map_level(df, dct, level=0):
    index = df.index
    # Map the values of the chosen level through dct (unmapped values stay as-is)
    df.index = index.set_levels([[dct.get(item, item) for item in names] if i == level else names
                                 for i, names in enumerate(index.levels)])

dct = {'bar':'cat', 'baz':'dog'}
map_level(data, dct, level=0)
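In later pandas versions, `DataFrame.rename` takes a `level` argument that does the same dict-based remapping without touching `levels` directly. A minimal sketch, assuming a recent pandas release:

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=['first', 'second'])
data = pd.DataFrame(np.random.randn(8, 2), index=index, columns=['c1', 'c2'])

dct = {'bar': 'cat', 'baz': 'dog'}
# Apply the dict only to labels in the 'first' level; labels missing
# from the dict ('foo', 'qux') are left unchanged.
renamed = data.rename(index=dct, level='first')
print(renamed.index.get_level_values('first').tolist())
```

Because `rename` looks up each label in the dict rather than replacing the stored level list wholesale, the order of the dict entries does not matter.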

Here's a runnable example:

import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = zip(*arrays)
index = pd.MultiIndex.from_tuples(tuples, names=['first','second'])
data = pd.DataFrame(np.random.randn(8,2),index=index,columns=['c1','c2'])
data2 = data.copy()

# set_levels returns a new MultiIndex (the inplace=True option was removed in pandas 2.0)
data.index = data.index.set_levels([['cat', 'dog', 'foo', 'qux'],
                                    ['one', 'two']])
print(data)
#                     c1        c2
# first second                    
# cat   one     0.939040 -0.748100
#       two    -0.497006 -1.185966
# dog   one    -0.368161  0.050339
#       two    -2.356879 -0.291206
# foo   one    -0.556261  0.474297
#       two     0.647973  0.755983
# qux   one    -0.017722  1.364244
#       two     1.007303  0.004337

def map_level(df, dct, level=0):
    index = df.index
    # Map the values of the chosen level through dct (unmapped values stay as-is)
    df.index = index.set_levels([[dct.get(item, item) for item in names] if i == level else names
                                 for i, names in enumerate(index.levels)])
dct = {'bar':'wolf', 'baz':'rabbit'}
map_level(data2, dct, level=0)
print(data2)
#                      c1        c2
# first  second                    
# wolf   one     0.939040 -0.748100
#        two    -0.497006 -1.185966
# rabbit one    -0.368161  0.050339
#        two    -2.356879 -0.291206
# foo    one    -0.556261  0.474297
#        two     0.647973  0.755983
# qux    one    -0.017722  1.364244
#        two     1.007303  0.004337
answered Sep 21 '22 at 23:09 by unutbu


The set_levels method was causing my new column names to be out of order. So I found a different solution that isn't very clean, but works well. The method is to print df.index (or equivalently df.columns) and then copy and paste the output with the desired values changed. For example:

print(data.index)

MultiIndex(levels=[['bar', 'baz', 'foo', 'qux'], ['one', 'two']], labels=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]], names=['first', 'second'])

# pandas >= 0.24 names this argument `codes`; older versions called it `labels`
data.index = MultiIndex(levels=[['new_bar', 'new_baz', 'new_foo', 'new_qux'],
                                ['new_one', 'new_two']],
                        codes=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]],
                        names=['first', 'second'])
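A likely cause of the ordering problem with `set_levels`: a MultiIndex stores each level's distinct values once, in the level's own (here sorted) order, and the integer codes point into that list by position. `set_levels` swaps the list wholesale, so new values passed in a different order end up attached to the wrong rows. A sketch that sidesteps this by building the replacement list from the existing level; the `mapping` dict is hypothetical:

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=['first', 'second'])
data = pd.DataFrame(np.random.randn(8, 2), index=index, columns=['c1', 'c2'])

old_first = data.index.levels[0]   # stored order: ['bar', 'baz', 'foo', 'qux']

# Hypothetical mapping. Walking the *existing* level keeps each replacement
# at the same position as the label it replaces, whatever the dict order.
mapping = {'qux': 'zebra', 'bar': 'ant'}
new_first = [mapping.get(label, label) for label in old_first]

data.index = data.index.set_levels(new_first, level=0)
print(data.index.get_level_values(0).tolist())
```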

You can get full control over the names by editing the labels (codes) as well. For example:

# pandas >= 0.24 names this argument `codes`; older versions called it `labels`
data.index = MultiIndex(levels=[['bar', 'baz', 'foo', 'qux'],
                                ['one', 'twooo', 'three', 'four',
                                 'five', 'six', 'seven', 'eight']],
                        codes=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 2, 3, 4, 5, 6, 7]],
                        names=['first', 'second'])

Note that these examples assume you have already done something like from pandas import MultiIndex or from pandas import *.
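Rather than copying the printed repr by hand (its format has changed across pandas versions), the same kind of index can be built directly from the existing one by reusing its `codes`. A minimal sketch, assuming pandas 0.24 or later:

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=['first', 'second'])
data = pd.DataFrame(np.random.randn(8, 2), index=index, columns=['c1', 'c2'])

old = data.index
# Keep the existing integer codes and names; swap in new level values.
# Each new list must follow the order of old.levels[i] ('bar', 'baz', ...).
data.index = pd.MultiIndex(
    levels=[['new_bar', 'new_baz', 'new_foo', 'new_qux'], ['new_one', 'new_two']],
    codes=old.codes,
    names=old.names,
)
print(data.index[:3].tolist())
```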

answered Sep 23 '22 at 23:09 by AlexG