I have a pandas dataframe whose indices look like:
df.index
['a_1', 'b_2', 'c_3', ... ]
I want to rename these indices to:
['a', 'b', 'c', ... ]
How do I do this without specifying a dictionary with explicit keys for each index value?
I tried:
df.rename( index = lambda x: x.split( '_' )[0] )
but this throws up an error:
AssertionError: New axis must be unique to rename
The pandas rename() method changes row index labels and column labels. Pass a dictionary or a function through the index or columns argument and it is applied to each label. Syntax: dataframe.rename(mapper, index, columns, axis, copy, inplace, level, errors)
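As a minimal sketch of passing a function instead of a dictionary, the snippet below strips the suffix from each row label. The frame and its column name "x" are made up for illustration, and the labels are chosen so that the result stays unique.

```python
import pandas as pd

# Hypothetical example frame; the labels are unique after stripping the suffix.
df = pd.DataFrame({"x": [1, 2, 3]}, index=["a_1", "b_2", "c_3"])

# rename() calls the function once per row label and returns a new DataFrame.
renamed = df.rename(index=lambda label: label.split("_")[0])

print(list(renamed.index))  # ['a', 'b', 'c']
print(list(df.index))       # the original frame is left unchanged
```

Because no inplace=True is passed, df keeps its original labels and only renamed carries the new ones.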
The rename_axis method changes the name of an index level rather than its labels, for example setting the row index level name to 'names'. It can also change the column level names by changing the axis parameter. If you set the index from some of the columns, the column name becomes the new index level name. Columns can also be appended to the existing index as additional levels.
By default a DataFrame has no index name. Note that inplace=True tells pandas to modify the original DataFrame instead of returning a modified copy.
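A small sketch of the points above, assuming a throwaway frame with made-up column names "x" and "y" and a made-up column level name "fields": rename_axis sets level names, while inplace=True on rename changes the frame itself and returns None.

```python
import pandas as pd

# Hypothetical example frame.
df = pd.DataFrame({"x": [1, 2], "y": [3, 4]}, index=["a", "b"])
print(df.index.name)  # None: no index name yet

# rename_axis names the row index level, or the column level via axis.
by_rows = df.rename_axis("names")
by_cols = df.rename_axis("fields", axis="columns")
print(by_rows.index.name)    # names
print(by_cols.columns.name)  # fields

# With inplace=True, rename modifies df and returns None.
result = df.rename(index={"a": "A"}, inplace=True)
print(result)          # None
print(list(df.index))  # ['A', 'b']
```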
Perhaps you could get the best of both worlds by using a MultiIndex:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(8).reshape(4,2), index=['a_1', 'b_2', 'c_3', 'c_4'])
print(df)
#      0  1
# a_1  0  1
# b_2  2  3
# c_3  4  5
# c_4  6  7
index = pd.MultiIndex.from_tuples([item.split('_') for item in df.index])
df.index = index
print(df)
#      0  1
# a 1  0  1
# b 2  2  3
# c 3  4  5
#   4  6  7
This way, you can access things according to first level of the index:
In [30]: df.loc['c']
Out[30]:
   0  1
3  4  5
4  6  7
or according to both levels of the index:
In [31]: df.loc[('c', '3')]
Out[31]:
0    4
1    5
Name: (c, 3)
Moreover, all the DataFrame methods are built to work with DataFrames with MultiIndices, so you lose nothing.
However, if you really want to drop the second level of the index, you could do this:
df.reset_index(level=1, drop=True, inplace=True)
print(df)
#    0  1
# a  0  1
# b  2  3
# c  4  5
# c  6  7
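If you would rather keep the MultiIndex frame and get a flattened copy, a sketch using droplevel could look like this. It assumes a pandas version that provides DataFrame.droplevel, and the name flat is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(8).reshape(4, 2), index=["a_1", "b_2", "c_3", "c_4"])
df.index = pd.MultiIndex.from_tuples([tuple(item.split("_")) for item in df.index])

# droplevel returns a new frame without the second level; df keeps both levels.
flat = df.droplevel(1)

print(list(flat.index))  # ['a', 'b', 'c', 'c']
print(df.index.nlevels)  # 2
```

Note that the flattened index contains 'c' twice, so label-based lookups on flat return two rows for 'c'.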
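That's the error you'd get if your function produced duplicate index values (here 'c_3' and 'c_4' both become 'c'). Recent pandas versions accept duplicate labels from rename and do not raise: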
>>> df = pd.DataFrame(np.random.random((4,3)),index="a_1 b_2 c_3 c_4".split())
>>> df
            0         1         2
a_1  0.854839  0.830317  0.046283
b_2  0.433805  0.629118  0.702179
c_3  0.390390  0.374232  0.040998
c_4  0.667013  0.368870  0.637276
>>> df.rename(index=lambda x: x.split("_")[0])
[...]
AssertionError: New axis must be unique to rename
If you really want that, I'd use a list comprehension:
>>> df.index = [x.split("_")[0] for x in df.index]
>>> df
          0         1         2
a  0.854839  0.830317  0.046283
b  0.433805  0.629118  0.702179
c  0.390390  0.374232  0.040998
c  0.667013  0.368870  0.637276
but I'd think about whether that's really the right direction.
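One way to decide, sketched here under the assumption that all labels are strings: build the candidate index first with the vectorised .str accessor and only assign it if it is still unique. The names new_index and dupes are illustrative, and the data is made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(4, 3), index="a_1 b_2 c_3 c_4".split())

# Same result as the list comprehension, built with the .str accessor.
new_index = df.index.str.split("_").str[0]

# Labels that would occur more than once after renaming.
dupes = new_index[new_index.duplicated()].unique()

if new_index.is_unique:
    df.index = new_index
else:
    print("duplicate labels after renaming:", list(dupes))
```

With this data the check fails because 'c' appears twice, so df keeps its original labels.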