
Rename the index of a pandas DataFrame

Tags: python, pandas

I have a pandas dataframe whose indices look like:

df.index
['a_1', 'b_2', 'c_3', ... ]

I want to rename these indices to:

['a', 'b', 'c', ... ]

How do I do this without specifying a dictionary with explicit keys for each index value?
I tried:

df.rename( index = lambda x: x.split( '_' )[0] )

but this raises an error:

AssertionError: New axis must be unique to rename
user1486457 asked May 16 '13 15:05





2 Answers

Perhaps you could get the best of both worlds by using a MultiIndex:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(8).reshape(4,2), index=['a_1', 'b_2', 'c_3', 'c_4'])
print(df)
#      0  1
# a_1  0  1
# b_2  2  3
# c_3  4  5
# c_4  6  7

index = pd.MultiIndex.from_tuples([item.split('_') for item in df.index])
df.index = index
print(df)
#      0  1
# a 1  0  1
# b 2  2  3
# c 3  4  5
#   4  6  7

This way, you can select rows by the first level of the index:

In [30]: df.loc['c']
Out[30]: 
   0  1
3  4  5
4  6  7

or according to both levels of the index:

In [31]: df.loc[('c','3')]
Out[31]: 
0    4
1    5
Name: (c, 3)

Moreover, DataFrame methods are built to work with a MultiIndex, so you lose very little by keeping both levels.
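As a rough illustration of working with both levels, here is a minimal sketch that aggregates and slices by the first level. The level names 'letter' and 'number' are my own labels for this example, not something from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(8).reshape(4, 2), index=['a_1', 'b_2', 'c_3', 'c_4'])

# Split each label into a (prefix, suffix) tuple; the level names are hypothetical.
df.index = pd.MultiIndex.from_tuples(
    [tuple(item.split('_')) for item in df.index],
    names=['letter', 'number'],
)

# Sum all rows that share the same first-level label.
totals = df.groupby(level='letter').sum()

# Take the cross-section for 'c'; the 'letter' level is dropped from the result.
c_rows = df.xs('c', level='letter')

print(totals)
print(c_rows)
```

Naming the levels is optional, but it lets you refer to them by name instead of by position.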

However, if you really want to drop the second level of the index, you could do this:

df.reset_index(level=1, drop=True, inplace=True)
print(df)
#    0  1
# a  0  1
# b  2  3
# c  4  5
# c  6  7
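
If you would rather not modify the DataFrame in place, one alternative (assuming a pandas version that provides DataFrame.droplevel, which I believe was added in 0.24) is to build a flattened copy and leave the MultiIndex version untouched:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(8).reshape(4, 2), index=['a_1', 'b_2', 'c_3', 'c_4'])
df.index = pd.MultiIndex.from_tuples([tuple(item.split('_')) for item in df.index])

# droplevel returns a new DataFrame; df itself keeps both index levels.
flat = df.droplevel(1)

print(flat.index.tolist())
print(df.index.nlevels)
```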
unutbu answered Sep 21 '22 01:09



That's the error you'd get if your function produced duplicate index values:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.random((4,3)),index="a_1 b_2 c_3 c_4".split())
>>> df
            0         1         2
a_1  0.854839  0.830317  0.046283
b_2  0.433805  0.629118  0.702179
c_3  0.390390  0.374232  0.040998
c_4  0.667013  0.368870  0.637276
>>> df.rename(index=lambda x: x.split("_")[0])
[...]
AssertionError: New axis must be unique to rename

If you really want duplicate index values, I'd use a list comprehension:

>>> df.index = [x.split("_")[0] for x in df.index]
>>> df
          0         1         2
a  0.854839  0.830317  0.046283
b  0.433805  0.629118  0.702179
c  0.390390  0.374232  0.040998
c  0.667013  0.368870  0.637276

but I'd think about whether a non-unique index is really the right direction.
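
For what it's worth, the same assignment can probably be written with the index's vectorized string methods instead of a Python loop. This is only a sketch, assuming all labels are strings, and it also shows what a lookup on a duplicated label gives you:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(8).reshape(4, 2), index=['a_1', 'b_2', 'c_3', 'c_4'])

# Split every label on '_' and keep the first piece.
df.index = df.index.str.split('_').str[0]

# The labels are no longer unique, so a lookup can return several rows.
print(df.index.is_unique)
print(df.loc['c'])
```

With a non-unique index, df.loc['c'] returns a DataFrame of every matching row rather than a single row, which is worth keeping in mind before going this way.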

DSM answered Sep 24 '22 01:09
