
Replace string in pandas df column name

Tags: python, regex

I have a pandas DataFrame whose column names have the form "string_string". I'm trying to rename the columns by removing the "_" and everything after it. For example, I want to change "12527_AC9E5" to "12527". I've tried various replace options, and I can replace a specific part of the string (e.g., all the "_"), but when I introduce wildcards I don't get the desired result.

Below are some of the things I thought would work but don't. If I remove the wildcards, they work (i.e., they replace the "_").

df = df.rename(columns=lambda x: x.sub('_.+', ''))

df.columns = df.columns.str.replace('_.+','')

Any help appreciated

abissett asked Nov 05 '15 11:11



1 Answer

Just split on '_' and take the first element. You can use a dictionary comprehension to build the mapping:

df = df.rename(columns={col: col.split('_')[0] for col in df.columns})
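
For anyone who wants to try this end to end, here is a minimal sketch on a made-up frame (the sample column names and the variable names by_split, by_regex and by_re_sub are assumptions for illustration, not from the question). It also shows two regex variants that should give the same result, assuming a pandas version recent enough to accept the regex keyword on str.replace:

```python
import re

import pandas as pd

# Hypothetical sample frame; the column names mimic the
# "string_string" pattern described in the question.
df = pd.DataFrame([[1, 2, 3]], columns=["12527_AC9E5", "98341_B7", "plain"])

# Split approach: keep everything before the first underscore.
by_split = df.rename(columns={col: col.split("_")[0] for col in df.columns})

# Regex approach on the column Index. regex=True is passed explicitly
# because the default for str.replace differs between pandas versions.
by_regex = df.copy()
by_regex.columns = by_regex.columns.str.replace(r"_.+", "", regex=True)

# Plain-Python regex approach: str objects have no .sub method,
# so call re.sub on each column name instead.
by_re_sub = df.rename(columns=lambda name: re.sub(r"_.+", "", name))

print(list(by_split.columns))  # ['12527', '98341', 'plain']
```

Note that if two columns share the same prefix before the underscore, all of these approaches will leave you with duplicate column names, so it may be worth checking for that first.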
DeepSpace answered Oct 31 '22 20:10
