I am joining two dataframes (a, b) that have identical column names, using the userId key. For the join to succeed, I had to supply a suffix. This is the command I used:
a.join(b,how='inner', on='userId',lsuffix="_1")
If I don't use the suffix, I get an error. But I don't want the column names to change, because that causes problems when I run other analyses. So I want to remove the "_1" from all the column names of the resulting dataframe. Can anybody suggest an efficient way to remove the last two characters from the names of all the columns in a pandas dataframe?
Thanks
This snippet should get the job done:
df.columns = pd.Index(map(lambda x : str(x)[:-2], df.columns))
Edit: this is a better way to do it. Note that rename returns a new dataframe, so assign the result:
df = df.rename(columns=lambda x: str(x)[:-2])
In both cases, all we're doing is iterating through the columns and applying a function. Here, the function converts each name to a string and keeps everything except the last two characters.
I'm sure there are a few other ways you could do this.
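One thing to watch with slicing is that it trims every column name, including any that never received the suffix. Here is a minimal sketch of an alternative that removes "_1" only where it actually ends a name. The frame and its column names are made up for illustration, and the regex approach is my suggestion rather than part of the snippets above.

```python
import re

import pandas as pd

# Hypothetical frame: two suffixed columns and one that was never renamed.
df = pd.DataFrame({"userId": [1, 2], "score_1": [0.5, 0.7], "rank_1": [3, 1]})

# Slicing cuts two characters off every name, so "userId" is damaged too.
sliced = [str(c)[:-2] for c in df.columns]

# Anchoring the pattern with $ removes "_1" only at the very end of a name.
cleaned = df.rename(columns=lambda c: re.sub(r"_1$", "", str(c)))

print(sliced)                 # ['user', 'score', 'rank']
print(list(cleaned.columns))  # ['userId', 'score', 'rank']
```

If every column is known to carry the suffix, plain slicing is fine. The anchored regex is just the safer choice when the frame mixes suffixed and unsuffixed names.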
You could use str.rstrip, like so:
In [212]: import numpy as np
In [213]: import pandas as pd
In [214]: import functools as ft
In [215]: f = ft.partial(np.random.choice, *[5, 3])
In [225]: df = pd.DataFrame({'a': f(), 'b': f(), 'c': f(), 'a_1': f(), 'b_1': f(), 'c_1': f()})
In [226]: df
Out[226]:
a b c a_1 b_1 c_1
0 4 2 0 2 3 2
1 0 0 3 2 1 1
2 4 0 4 4 4 3
In [227]: df.columns = df.columns.str.rstrip('_1')
In [228]: df
Out[228]:
a b c a b c
0 4 2 0 2 3 2
1 0 0 3 2 1 1
2 4 0 4 4 4 3
However, if you need something more flexible (albeit probably a bit slower), you can use str.extract, which, with the power of regexes, lets you select which part of the column name you would like to keep:
In [216]: df = pd.DataFrame({f'{c}_{i}': f() for i in range(3) for c in 'abc'})
In [217]: df
Out[217]:
a_0 b_0 c_0 a_1 b_1 c_1 a_2 b_2 c_2
0 0 1 0 2 2 4 0 0 3
1 0 0 3 1 4 2 4 3 2
2 2 0 1 0 0 2 2 2 1
In [223]: df.columns = df.columns.str.extract(r'(.*)_\d+')[0]
In [224]: df
Out[224]:
0 a b c a b c a b c
0 1 1 0 0 0 2 1 1 2
1 1 0 1 0 1 2 0 4 1
2 1 3 1 3 4 2 0 1 1
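Two details of str.extract are easy to miss. Passing expand=False with a single capture group returns an Index directly, so the [0] lookup is not needed and the stray "0" label does not end up on the columns. Also, any name that does not match the pattern becomes NaN. The sketch below uses hypothetical column names and shows str.replace as a fallback that leaves non-matching names alone.

```python
import pandas as pd

# Hypothetical names: "userId" has no numeric suffix.
cols = pd.Index(["a_0", "b_1", "userId"])

# expand=False with one capture group returns an Index, not a DataFrame.
extracted = cols.str.extract(r"(.*)_\d+", expand=False)

# Names that fail to match come back as NaN, which is rarely what you want.
print(extracted.isna().tolist())  # [False, False, True]

# Replacing the suffix instead keeps non-matching names unchanged.
replaced = cols.str.replace(r"_\d+$", "", regex=True)
print(list(replaced))  # ['a', 'b', 'userId']
```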
The idea to use df.columns.str came from this answer.