 

Remove last two characters from column names of all the columns in Dataframe - Pandas

I am joining two dataframes (a, b) that have identical column names, using userId as the key. For the join to work, I had to supply a suffix. This is the command I used:

a.join(b,how='inner', on='userId',lsuffix="_1")

If I don't use the suffix, I get an error. But I don't want the column names to change, because that causes problems when I run other analyses. So I want to remove the "_1" from all the column names of the resulting dataframe. Can anybody suggest an efficient way to remove the last two characters from the names of all the columns in a pandas dataframe?

Thanks

Observer asked May 05 '16 22:05





2 Answers

This snippet should get the job done:

df.columns = [str(x)[:-2] for x in df.columns]

Edit: This is a better way to do it (rename returns a new DataFrame, so assign the result back):

df = df.rename(columns=lambda x: str(x)[:-2])

In both cases, all we're doing is iterating over the column names and applying a function to each one. Here, the function converts the name to a string and keeps everything except the last two characters.

I'm sure there are a few other ways you could do this.
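
One caveat worth sketching: slicing off the last two characters trims every column name, including any that never received the suffix (for example the join key). A minimal sketch, assuming a hypothetical frame named joined whose column names are made up for illustration, that only strips names actually ending in "_1":

```python
import pandas as pd

# Hypothetical result of a join: only some columns carry the "_1" suffix.
joined = pd.DataFrame({"userId": [1, 2], "score_1": [0.5, 0.7], "rank_1": [3, 1]})

def strip_suffix(name, suffix="_1"):
    """Drop `suffix` from the end of `name`; leave other names untouched."""
    name = str(name)
    return name[: -len(suffix)] if name.endswith(suffix) else name

# Blind slicing also shortens the key column.
blind = joined.rename(columns=lambda x: str(x)[:-2])

# rename returns a new DataFrame, so the result has to be assigned.
cleaned = joined.rename(columns=strip_suffix)

print(list(blind.columns))    # ['user', 'score', 'rank']
print(list(cleaned.columns))  # ['userId', 'score', 'rank']
```

The helper strip_suffix is not part of pandas; it is just a plain function passed to rename, so any other suffix can be handled by changing its default argument.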

Thtu answered Sep 29 '22 00:09



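You could use str.rstrip like so. Note that rstrip('_1') removes any trailing run of the characters '_' and '1' rather than the literal suffix "_1", so it is only safe when the base names do not themselves end in those characters, as is the case here.

In [212]: import numpy as np

In [213]: import pandas as pd

In [214]: import functools as ft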

In [215]: f = ft.partial(np.random.choice, *[5, 3])

In [225]: df = pd.DataFrame({'a': f(), 'b': f(), 'c': f(), 'a_1': f(), 'b_1': f(), 'c_1': f()})

In [226]: df
Out[226]:
   a  b  c  a_1  b_1  c_1
0  4  2  0    2    3    2
1  0  0  3    2    1    1
2  4  0  4    4    4    3

In [227]: df.columns = df.columns.str.rstrip('_1')

In [228]: df
Out[228]:
   a  b  c  a  b  c
0  4  2  0  2  3  2
1  0  0  3  2  1  1
2  4  0  4  4  4  3
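
Because rstrip works on a set of characters, it can remove more than the suffix when a base name happens to end in '_' or '1'. A minimal sketch of the difference, using made-up column names and an anchored regex as one possible alternative (this is a swapped-in technique, not the approach shown above):

```python
import pandas as pd

# Hypothetical column names chosen to expose the difference.
cols = pd.Index(["a_1", "b1_1", "total_11", "c"])

# rstrip treats "_1" as a set of characters and strips any trailing run of them.
stripped = cols.str.rstrip("_1")

# An anchored regex removes exactly one literal "_1" at the end of the name.
anchored = cols.str.replace(r"_1$", "", regex=True)

print(list(stripped))  # ['a', 'b', 'total', 'c']
print(list(anchored))  # ['a', 'b1', 'total_11', 'c']
```

Recent pandas releases also provide str.removesuffix, which should behave like the anchored regex here; check that your installed version has it before relying on it.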

However, if you need something more flexible (albeit probably a bit slower), you can use str.extract, which uses a regex to select the part of each column name you would like to keep. By default str.extract returns a DataFrame with one column per capture group, hence the [0] below.

In [216]: df = pd.DataFrame({f'{c}_{i}': f() for i in range(3) for c in 'abc'})

In [217]: df
Out[217]:
   a_0  b_0  c_0  a_1  b_1  c_1  a_2  b_2  c_2
0    0    1    0    2    2    4    0    0    3
1    0    0    3    1    4    2    4    3    2
2    2    0    1    0    0    2    2    2    1

In [223]: df.columns = df.columns.str.extract(r'(.*)_\d+')[0]

In [224]: df
Out[224]:
0  a  b  c  a  b  c  a  b  c
0  0  1  0  2  2  4  0  0  3
1  0  0  3  1  4  2  4  3  2
2  2  0  1  0  0  2  2  2  1
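
One thing to watch with str.extract: a column name that does not match the pattern comes back as NaN rather than unchanged. A minimal sketch, assuming a hypothetical frame that also contains an unsuffixed key column, comparing extract with a regex substitution that leaves non-matching names alone:

```python
import pandas as pd

# Hypothetical frame: three numbered columns plus a key with no numeric suffix.
df = pd.DataFrame([[1, 2, 3, 4]], columns=["userId", "a_0", "b_1", "c_12"])

# expand=False returns an Index instead of a DataFrame, so no [0] lookup is
# needed, but a name that does not match the pattern becomes NaN.
extracted = df.columns.str.extract(r"(.*)_\d+", expand=False)

# Substituting the suffix away leaves non-matching names as they are.
df.columns = df.columns.str.replace(r"_\d+$", "", regex=True)

print(list(extracted))
print(list(df.columns))  # ['userId', 'a', 'b', 'c']
```

Either way, stripping the suffix from several numbered groups produces duplicate column names, which some later operations (such as selecting df['a']) will treat as multiple columns.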

The idea to use df.columns.str came from this answer.

aydow answered Sep 29 '22 00:09
