Remove last two characters from column names of all the columns in Dataframe - Pandas

Question

I am joining two DataFrames (a, b) that have identical column names, using the userId column as the key. To get the join to work, I had to supply a suffix. This is the command I used:

a.join(b, how='inner', on='userId', lsuffix="_1")

If I don't use this suffix, I get an error. But I don't want the column names to change, because that causes problems in later analysis. So I want to remove the "_1" suffix from the column names of the resulting DataFrame. Can anybody suggest an efficient way to remove the last two characters from all the column names of a Pandas DataFrame?

Thanks

Thtu · Accepted Answer

This snippet should get the job done:

df.columns = pd.Index(map(lambda x: str(x)[:-2], df.columns))

Edit: This is a better way to do it. Note that rename returns a new DataFrame, so assign the result back:

df = df.rename(columns=lambda x: str(x)[:-2])

In both cases, all we're doing is iterating over the column names and applying a function to each one. Here, the function converts the name to a string and drops its last two characters. Note that this trims every column name, including any that did not receive the suffix.

I'm sure there are a few other ways you could do this.
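As one of those other ways, here is a minimal sketch that only trims names that actually end in the suffix, so unsuffixed columns are left alone. The frame and the helper name strip_suffix are hypothetical, made up for illustration; they are not from the question.

```python
import pandas as pd

# Hypothetical joined result: two columns carry the "_1" suffix, one does not.
df = pd.DataFrame({"userId_1": [1, 2], "score_1": [10, 20], "age": [30, 40]})


def strip_suffix(name, suffix="_1"):
    """Return name without the trailing suffix, or unchanged if it is absent."""
    name = str(name)
    return name[: -len(suffix)] if name.endswith(suffix) else name


# rename returns a new DataFrame, so the result is assigned back.
df = df.rename(columns=strip_suffix)
print(list(df.columns))  # ['userId', 'score', 'age']
```

Compared with slicing off two characters unconditionally, the endswith check keeps a name like "age" intact.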

aydow · Answer

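You could use str.rstrip like so. Be aware that rstrip removes any trailing run of the characters '_' and '1' rather than the literal suffix '_1', so a name such as 'x1_1' would become 'x'.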

In [214]: import functools as ft
     ...: import numpy as np
     ...: import pandas as pd

In [215]: f = ft.partial(np.random.choice, 5, 3)

In [225]: df = pd.DataFrame({'a': f(), 'b': f(), 'c': f(), 'a_1': f(), 'b_1': f(), 'c_1': f()})

In [226]: df
Out[226]:
   a  b  c  a_1  b_1  c_1
0  4  2  0    2    3    2
1  0  0  3    2    1    1
2  4  0  4    4    4    3

In [227]: df.columns = df.columns.str.rstrip('_1')

In [228]: df
Out[228]:
   a  b  c  a  b  c
0  4  2  0  2  3  2
1  0  0  3  2  1  1
2  4  0  4  4  4  3
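One thing to watch for: str.rstrip treats its argument as a set of characters, not as a literal suffix. The sketch below uses made-up column names to compare it with an anchored regex replacement, which is one possible alternative when names may legitimately end in "1" or "_".

```python
import pandas as pd

# Hypothetical column names chosen to expose the difference.
cols = pd.Index(["a_1", "x1_1", "total_11", "b"])

# rstrip("_1") strips any trailing run of "_" and "1" characters.
stripped = list(cols.str.rstrip("_1"))
print(stripped)  # ['a', 'x', 'total', 'b']

# An anchored regex removes exactly one trailing "_1" and nothing else.
anchored = list(cols.str.replace(r"_1$", "", regex=True))
print(anchored)  # ['a', 'x1', 'total_11', 'b']
```

For names like a_1, b_1, c_1 both approaches give the same result, so the rstrip version is fine for the example above.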

However, if you need something more flexible (albeit probably a bit slower), you can use str.extract, which uses a regex to select the part of the column name you would like to keep:

In [216]: df = pd.DataFrame({f'{c}_{i}': f() for i in range(3) for c in 'abc'})

In [217]: df
Out[217]:
   a_0  b_0  c_0  a_1  b_1  c_1  a_2  b_2  c_2
0    0    1    0    2    2    4    0    0    3
1    0    0    3    1    4    2    4    3    2
2    2    0    1    0    0    2    2    2    1

In [223]: df.columns = df.columns.str.extract(r'(.*)_\d+')[0]

In [224]: df
Out[224]:
0  a  b  c  a  b  c  a  b  c
0  1  1  0  0  0  2  1  1  2
1  1  0  1  0  1  2  0  4  1
2  1  3  1  3  4  2  0  1  1
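A caveat with str.extract is that names which do not match the pattern come back as missing values. The sketch below, with made-up column names, falls back to the original name in that case. Converting to a plain list also avoids carrying over the Series name 0, which is the stray 0 at the left of the header in the output above.

```python
import pandas as pd

# Hypothetical names: "userId" has no numeric suffix and will not match.
cols = pd.Index(["userId", "a_0", "b_1", "c_12"])

# Capture everything before a trailing "_<digits>"; non-matches become NaN.
extracted = cols.str.extract(r"(.*)_\d+$")[0]

# Fall back to the original name wherever the pattern did not match.
cleaned = extracted.fillna(pd.Series(cols)).tolist()
print(cleaned)  # ['userId', 'a', 'b', 'c']
```

The result can then be assigned with df.columns = cleaned.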

Idea to use df.columns.str came from this answer

Tags: python, string, pandas, dataframe
