Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

pandas dataframe column name: remove special character

Some joker made a Lotus database/applet thingy for tracking engineering issues in our company. The joke is that the key piece of information was named with a special character... a number sign (hash tag, pound sign, \u0023).

abbreviated sample:

KA#         Issue Date      Current Position
27144       1/9/2014        Accounting
27194       12/20/2012      Engineering
32474       4/21/2008       Engineering
32623-HOLD  4/25/2016       Engineering
32745       11/13/2012      SEPE
32812       10/30/2013      Engineering
32817       12/7/2012       Purchasing
32839       1/8/2013        SEPE

I output this table (4K rows, 15 columns) to a csv file and process in python3 as a pandas dataframe.

I generate various outputs. If I use something like:

df.iloc[:,[0,3,1,8,9,10]]

I get appropriate output and the key column shows up as "KA#". (When I say "key column", I mean "most important"... NOT "index". I keep a serial index)

Unfortunately, people sometimes mess with the column order in Lotus between my exports to csv so I can not guarantee that "KA#" will be any particular column number. I would like to use column names:

df.loc[:,["KA#","Issue Date","Current Position"]]

But the "KA#" column is filled with NaN's.

Thanks for any help you can offer.

Finally, if I try to rename "KA#" to simply "KA":

df['KA#'].name = 'KA'

throws a KeyError and

df = df.rename(columns={"KA#": "ka"})

is completely ignored. The column shows up as "KA#".

Can anyone think of a way to get rid of or handle that symbol? I'd even settle for a regex at this point.

like image 568
Paul Podbielski Avatar asked Jun 21 '16 19:06

Paul Podbielski


People also ask

How do I remove special characters from a DataFrame column in Python?

Add df = df. astype(float) after the replace and you've got it. I'd skip inplace and just do df = df. replace('\*', '', regex=True).

How do I remove a character from a column in Pandas?

To remove characters from columns in Pandas DataFrame, use the replace(~) method. Here, [ab] is regex and matches any character that is a or b .

How do I rename a column label in Pandas?

We can use pandas DataFrame rename() function to rename columns and indexes. It supports the following parameters. mapper: dictionary or a function to apply on the columns and indexes.


1 Answers

use str.replace:
df.columns=df.columns.str.replace('#','')

You can check this in the documentation.

like image 99
shivsn Avatar answered Sep 27 '22 23:09

shivsn