 

Remove prefix (or suffix) substring from column headers in pandas

Tags:

python

pandas

I'm trying to remove the substring _x from the end of some of my DataFrame's column names.

Sample df code:

import pandas as pd

d = {'W_x': ['abcde','abcde','abcde']}
df = pd.DataFrame(data=d)

df['First_x']=[0,0,0]
df['Last_x']=[1,2,3]
df['Slice']=['abFC=0.01#%sdadf','12fdak*4%FC=-0.035faf,dd43','FC=0.5fasff']

Output:

     W_x  First_x  Last_x                       Slice
0  abcde        0       1                   abFC=0.01
1  abcde        0       2  12fdak*4%FC=-0.035faf,dd43
2  abcde        0       3                 FC=0.5fasff

Desired output:

       W  First  Last                       Slice
0  abcde      0     1                   abFC=0.01
1  abcde      0     2  12fdak*4%FC=-0.035faf,dd43
2  abcde      0     3                 FC=0.5fasff
user9185511 asked Apr 14 '19 19:04





2 Answers

python < 3.9, pandas < 1.4

Use str.strip/rstrip:

# df.columns = df.columns.str.strip('_x')
# Or,
df.columns = df.columns.str.rstrip('_x')  # strips trailing '_' and 'x' characters (right end only)
df.columns
# Index(['W', 'First', 'Last', 'Slice'], dtype='object')

To avoid the issue highlighted in the comments:

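Beware of strip() if any column name starts or ends with either _ or x beyond the suffix: strip('_x') removes any run of those characters rather than the literal substring _x, so a column named Max_x would become Ma.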
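To make the caveat concrete, here is a small sketch using made-up column names (Max_x, Index_x and xylo_x are hypothetical, not from the question). It compares strip/rstrip, which treat '_x' as a set of characters, with an anchored regex that removes only the literal trailing _x.

```python
import pandas as pd

# Hypothetical column names chosen to trigger the pitfall.
cols = pd.Index(["Max_x", "Index_x", "xylo_x"])

# strip/rstrip remove any run of '_' or 'x' characters, not the substring "_x".
both_ends = cols.str.strip("_x")    # also eats the leading 'x' of 'xylo_x'
right_end = cols.str.rstrip("_x")   # eats the final 'x' of 'Max' and 'Index'

# An anchored regex removes exactly one literal "_x" at the end of each name.
suffix_only = cols.str.replace(r"_x$", "", regex=True)

print(list(both_ends))    # ['Ma', 'Inde', 'ylo']
print(list(right_end))    # ['Ma', 'Inde', 'xylo']
print(list(suffix_only))  # ['Max', 'Index', 'xylo']
```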

You could use str.replace with a regex anchored to the end of the name instead:

df.columns = df.columns.str.replace(r'_x$', '', regex=True)  # regex=True is required in pandas >= 2.0
df.columns
# Index(['W', 'First', 'Last', 'Slice'], dtype='object')
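A related option, not part of the original answer, is DataFrame.rename with a callable, which pandas applies to every column label. This sketch pairs it with re.sub and the same anchored pattern, rebuilding the question's sample frame first.

```python
import re

import pandas as pd

# Rebuild the question's sample DataFrame.
df = pd.DataFrame({"W_x": ["abcde", "abcde", "abcde"]})
df["First_x"] = [0, 0, 0]
df["Last_x"] = [1, 2, 3]
df["Slice"] = ["abFC=0.01#%sdadf", "12fdak*4%FC=-0.035faf,dd43", "FC=0.5fasff"]

# rename() calls the function once per column label and returns a new DataFrame.
# The $ anchor means only a trailing "_x" is removed.
df = df.rename(columns=lambda name: re.sub(r"_x$", "", name))

print(list(df.columns))  # ['W', 'First', 'Last', 'Slice']
```

Because rename returns a new DataFrame, it can also be chained with other method calls instead of assigning to df.columns.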

Update: python >= 3.9, pandas >= 1.4

From pandas 1.4 onward, you can use str.removeprefix/str.removesuffix, which remove a literal prefix or suffix only when it is present.

Examples:

s = pd.Series(["str_foo", "str_bar", "no_prefix"])
s
0    str_foo
1    str_bar
2    no_prefix
dtype: object

s.str.removeprefix("str_")
0    foo
1    bar
2    no_prefix
dtype: object
s = pd.Series(["foo_str", "bar_str", "no_suffix"])
s
0    foo_str
1    bar_str
2    no_suffix
dtype: object

s.str.removesuffix("_str")
0    foo
1    bar
2    no_suffix
dtype: object

Note that when this answer was first written, 1.4 had not been released and trying this feature required a development build of pandas; it is available in every release from 1.4 onward.
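If the same code has to run on pandas versions both before and after 1.4, one possible approach is a small wrapper that uses the built-in methods when they exist and otherwise falls back to anchored regexes. strip_affixes below is a hypothetical helper, and it assumes every column label is a string, since the .str accessor requires that.

```python
import re

import pandas as pd


def strip_affixes(columns: pd.Index, prefix: str = "", suffix: str = "") -> pd.Index:
    """Hypothetical helper: drop a literal prefix and/or suffix from string labels.

    Labels that do not carry the prefix/suffix are returned unchanged.
    """
    if hasattr(columns.str, "removeprefix"):
        # pandas >= 1.4 provides str.removeprefix / str.removesuffix.
        if prefix:
            columns = columns.str.removeprefix(prefix)
        if suffix:
            columns = columns.str.removesuffix(suffix)
    else:
        # Older pandas: anchored regexes; re.escape makes the affix literal text.
        if prefix:
            columns = columns.str.replace("^" + re.escape(prefix), "", regex=True)
        if suffix:
            columns = columns.str.replace(re.escape(suffix) + "$", "", regex=True)
    return columns


df = pd.DataFrame(columns=["W_x", "First_x", "Last_x", "Slice"])
df.columns = strip_affixes(df.columns, suffix="_x")
print(list(df.columns))  # ['W', 'First', 'Last', 'Slice']

s = pd.Index(["str_foo", "str_bar", "no_prefix"])
print(list(strip_affixes(s, prefix="str_")))  # ['foo', 'bar', 'no_prefix']
```

Both branches remove at most one occurrence of the affix, so a label such as a_x_x becomes a_x rather than a.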

cs95 answered Sep 21 '22 11:09



df.columns = [col[:-2] if col[-2:] == '_x' else col for col in df.columns]  # the if/else must come before "for"

or

df.columns = [col.replace('_x', '') for col in df.columns]  # removes '_x' anywhere in the name, not only at the end
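Here is a plain-Python sketch of how the two comprehensions differ, plus Python 3.9's built-in str.removesuffix as a third variant (not part of the original answer). The label max_x_val_x is hypothetical and contains _x both in the middle and at the end.

```python
# Hypothetical labels: 'max_x_val_x' has "_x" mid-name as well as at the end.
cols = ["W_x", "First_x", "Slice", "max_x_val_x"]

# str.replace removes every occurrence of "_x", including the one mid-name.
replaced = [col.replace("_x", "") for col in cols]

# Conditional expression with slicing: only a trailing "_x" is dropped.
sliced = [col[:-2] if col.endswith("_x") else col for col in cols]

# Python 3.9+: str.removesuffix does the same without manual slicing.
removed = [col.removesuffix("_x") for col in cols]

print(replaced)  # ['W', 'First', 'Slice', 'max_val']
print(sliced)    # ['W', 'First', 'Slice', 'max_x_val']
print(removed)   # ['W', 'First', 'Slice', 'max_x_val']
```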
Quang Hoang answered Sep 21 '22 11:09
