
Cleaning headers in imported pandas dataframe

Tags:

python

pandas

I've imported a series of CSV and XLS files, using the header row in each file. I've noticed that these headers aren't clean, so when I access a column as an attribute I get an error saying there's no such attribute. What I'm looking to do is something like this:

Create a list of the imported headers:

currentheaders = list(df.columns.values)

Clean that list (this is the part I'm stuck on):

cleanedheaders = str.strip or regex equivalent

Apply that list as the new headers:

df.columns = cleanedheaders

str.strip doesn't work on lists, and the regex functions seem to want a DataFrame. Is there an equivalent function for a list?

asked Apr 23 '16 at 21:04 by mapping dom


3 Answers

A compact and quick way would be:

df.columns = [c.strip() for c in df.columns.values.tolist()]
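A minimal sketch of this approach on a small made-up frame (the labels " city " and "  score" are hypothetical), showing that attribute access only works once the whitespace is gone:

```python
import pandas as pd

# Hypothetical frame with whitespace around its header labels,
# similar to what a messy CSV/XLS header row can produce.
df = pd.DataFrame({" city ": ["Oslo", "Lima"], "  score": [1, 2]})
print(hasattr(df, "city"))  # False: the real label is " city "

# Rebuild the header list with every label stripped.
df.columns = [c.strip() for c in df.columns.values.tolist()]

print(list(df.columns))  # ['city', 'score']
print(df.city.tolist())  # ['Oslo', 'Lima']
```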

If you want to use DataFrame.rename() instead, you would need to call it like this:

df.rename(columns={c: c.strip() for c in df.columns.values.tolist()}, inplace=True) 
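A small sketch of the rename() variant on a hypothetical frame. The dictionary keys must be the original, unstripped labels, because rename() matches them exactly and, by default, silently ignores keys it cannot find. This sketch assigns the returned frame instead of using inplace=True; both work.

```python
import pandas as pd

# Hypothetical frame with padded header labels.
df = pd.DataFrame({" city ": ["Oslo"], "population  ": [709000]})

# Map each original (padded) label to its stripped form.
mapping = {c: c.strip() for c in df.columns}
df = df.rename(columns=mapping)  # returns a new frame

print(list(df.columns))  # ['city', 'population']
```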

Or you can use the equally compact and quick version (borrowed from MaxU):

df.columns = df.columns.str.strip()
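Since the question also asks for a regex equivalent: the same Index .str accessor exposes replace(), so no DataFrame round-trip is needed. A sketch, assuming you want labels usable as attributes; the rule "runs of non-word characters become _" is only an example choice:

```python
import pandas as pd

# Hypothetical empty frame, only the header labels matter here.
df = pd.DataFrame(columns=[" First Name ", "Last-Name", "  Age"])

# Vectorised strip via the Index .str accessor.
df.columns = df.columns.str.strip()

# Regex cleanup on the same Index: collapse non-word runs into "_".
df.columns = df.columns.str.replace(r"\W+", "_", regex=True)

print(list(df.columns))  # ['First_Name', 'Last_Name', 'Age']
```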

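Keep in mind that the comprehension-based solutions above raise an AttributeError if ANY of the column names is not a string (an integer, for example), and df.columns.str.strip() either raises or returns NaN for such names, depending on the column types.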

If any of the column names is not a string, ideally you would convert them all to strings. This would work:

df.columns = [str(i) for i in df.columns.values.tolist()]
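A sketch of both the failure and the fix, on a hypothetical frame that mixes a string label with an integer one (for example a year column). Converting with str() first and then stripping cleans every label in one pass:

```python
import pandas as pd

# Hypothetical frame mixing a padded string label and an integer label.
df = pd.DataFrame([[1, 2]], columns=[" region ", 2016])

try:
    df.columns = [c.strip() for c in df.columns.values.tolist()]
except AttributeError as exc:
    # The integer label has no .strip(), so the list is never assigned.
    print("strip failed:", exc)

# Convert every label to str first, then strip.
df.columns = [str(c).strip() for c in df.columns.values.tolist()]
print(list(df.columns))  # ['region', '2016']
```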

Or, if you don't want to turn the column names into strings (for some good reason, I hope), you can strip only the names that are strings:

df.rename(columns={c: c.strip() for c in df.columns.values.tolist()
                   if isinstance(c, str)}, inplace=True)
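rename() also accepts a function instead of a dictionary, which gives a compact way to strip string labels while leaving any other label untouched. A sketch on a hypothetical mixed-label frame:

```python
import pandas as pd

# Hypothetical frame with one padded string label and one integer label.
df = pd.DataFrame([[1, 2]], columns=[" region ", 2016])

# The function is called once per label: strings are stripped,
# anything else (here the int 2016) is returned unchanged.
df = df.rename(columns=lambda c: c.strip() if isinstance(c, str) else c)

print(list(df.columns))  # ['region', 2016]
```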
answered Sep 30 '22 at 09:09 by Thanos


Try this:

columns = {c: c.strip() for c in df.columns}  # or any cleaning
df.rename(columns=columns, inplace=True)  # must be passed as columns=, see below
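Why the keyword matters: passed positionally, the mapping is applied to the row index (the default axis), so the headers are left as they were. A sketch on a hypothetical frame, also showing that a plain function such as str.strip can be passed as columns=:

```python
import pandas as pd

# Hypothetical frame with padded header labels.
df = pd.DataFrame({" a ": [1], " b ": [2]})
mapping = {c: c.strip() for c in df.columns}

# Positional mapper targets the row index, so columns are unchanged.
unchanged = df.rename(mapping)
print(list(unchanged.columns))  # [' a ', ' b ']

# Passing it as columns= renames the headers.
fixed = df.rename(columns=mapping)
print(list(fixed.columns))  # ['a', 'b']

# A function works too: str.strip is called on each label.
via_func = df.rename(columns=str.strip)
print(list(via_func.columns))  # ['a', 'b']
```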
answered Sep 30 '22 at 10:09 by elyase


This solution will strip all elements in a list:

headers = [' test1', '   test2  ']  # avoid shadowing the built-in list
print([h.strip() for h in headers])

Result:

['test1', 'test2']
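For the "regex equivalent for a list" part of the question: re.sub() works on one string at a time, so it goes inside the same kind of comprehension. A sketch with hypothetical header strings; map(str.strip, ...) is shown as an equivalent to the comprehension, and the "_" substitution rule is only an example:

```python
import re

headers = [" Sales Total ", "unit-price", "  qty "]

# Plain strip on a list: a comprehension, or map() with str.strip.
stripped = [h.strip() for h in headers]
also_stripped = list(map(str.strip, headers))

# Regex per element: strip, then turn runs of non-word characters into "_".
cleaned = [re.sub(r"\W+", "_", h.strip()) for h in headers]

print(stripped)  # ['Sales Total', 'unit-price', 'qty']
print(cleaned)   # ['Sales_Total', 'unit_price', 'qty']
```

The resulting list can then be assigned directly with df.columns = cleaned.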

answered Sep 30 '22 at 09:09 by tfv