I've imported a series of CSV and XLS files, using the header in each file. I've noticed that these headers aren't clean, so when I call them I get an error saying there's no such attribute. What I'm looking to do is something similar to this:
Use the built-in function to create a list of the imported headers:
currentheaders = list(df.columns.values)
Clean that list (this is the part I'm stuck on):
cleanedheaders = ...  # str.strip or a regex equivalent, applied to the list
Apply that list as the new headers:
df.columns = cleanedheaders
str.strip doesn't work on lists, and the regex approach seems to want a DataFrame. Is there an equivalent function for a list?
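The plan above works once step 2 is applied to each header individually, because `str.strip` and `re.sub` operate on one string at a time. A minimal sketch, assuming made-up headers and an underscore-for-whitespace convention chosen only for illustration:

```python
import re

import pandas as pd

# Hypothetical DataFrame whose headers carry stray whitespace
df = pd.DataFrame([[1, 2, 3]], columns=['  Name ', 'Order\tDate', 'Total  Amount\n'])

# Step 1: the current headers as a plain list
currentheaders = list(df.columns.values)

# Step 2: clean each header on its own. strip() trims the ends, and the
# regex collapses any internal run of whitespace into a single underscore.
cleanedheaders = [re.sub(r'\s+', '_', h.strip()) for h in currentheaders]

# Step 3: assign the list object itself (not a quoted name)
df.columns = cleanedheaders
print(df.columns.tolist())
```

After this, attribute access such as `df.Order_Date` works because every name is a valid identifier.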
A compact and quick way would be:
df.columns = [c.strip() for c in df.columns.values.tolist()]
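To see why this fixes the "no such attribute" error, here is a small sketch with a hypothetical padded column name, the kind a CSV header like `id, price` produces:

```python
import pandas as pd

# Hypothetical frame: the second header kept its surrounding spaces
df = pd.DataFrame({'id': [1, 2], ' price ': [9.5, 3.0]})

# Attribute access fails here because the real name is ' price ', not 'price'
print(hasattr(df, 'price'))  # False

# Strip every column name with a list comprehension
df.columns = [c.strip() for c in df.columns.values.tolist()]

# Now the cleaned name is reachable as an attribute
print(df.price.sum())
```

Iterating `df.columns` directly yields the same names, so the `.values.tolist()` step is optional.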
If you wanted to use DataFrame.rename(), you would need to call it like this:
df.rename(columns={c: c.strip() for c in df.columns.values.tolist()}, inplace=True)
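`DataFrame.rename` also accepts a function for `columns` (it is called once per label), so the dict comprehension can be replaced by `str.strip` itself. A short sketch with made-up column names:

```python
import pandas as pd

df = pd.DataFrame({' a ': [1], 'b ': [2]})

# Pass the function directly instead of building a mapping
renamed = df.rename(columns=str.strip)

# Without inplace=True, rename returns a new DataFrame and leaves df as it was
print(renamed.columns.tolist())
print(df.columns.tolist())
```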
Or you can use the equally compact and quick version (borrowed from MaxU):
df.columns = df.columns.str.strip()
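The `.str` accessor on the column Index also covers the "regex equivalent" part of the question, since its methods can be chained and `str.replace` accepts a regular expression. A sketch; the lowercase-with-underscores naming is an assumption, not something the thread prescribes:

```python
import pandas as pd

# Hypothetical messy headers on an empty frame
df = pd.DataFrame(columns=[' First Name ', 'Last  Name', 'E-mail'])

# Chain vectorised string methods on the Index.
# regex=True makes str.replace treat the pattern as a regular expression;
# here every run of non-alphanumeric characters becomes one underscore.
df.columns = (
    df.columns.str.strip()
    .str.lower()
    .str.replace(r'[^0-9a-z]+', '_', regex=True)
)
print(df.columns.tolist())
```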
Keep in mind that none of the above solutions will work if any of the column names is not a string.
If any of the column names is not a string, ideally you would convert them all to strings; this would work:
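A quick sketch of the failure, using a hypothetical frame that mixes a string header with an integer one (e.g. a year):

```python
import pandas as pd

# Hypothetical frame with one string header and one int header
df = pd.DataFrame({' name ': ['x'], 2020: [1]})

try:
    df.columns = [c.strip() for c in df.columns]
except AttributeError as err:
    # ints have no .strip(), so the comprehension fails on the 2020 column
    error_message = str(err)

print(error_message)
```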
df.columns = [str(i) for i in df.columns.values.tolist()]
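The conversion and the stripping can also be done in one vectorised chain with `Index.astype(str)`. A sketch on the same kind of hypothetical mixed frame:

```python
import pandas as pd

df = pd.DataFrame({' name ': ['x'], 2020: [1]})

# Convert every label to str first (the int 2020 becomes '2020'), then strip
df.columns = df.columns.astype(str).str.strip()
print(df.columns.tolist())
```

Note that afterwards the year column must be accessed as `df['2020']`, not `df[2020]`.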
Or, if you don't want to convert the column names to strings (for some good reason, I hope), you would have to skip the non-string ones:
df.rename(columns={c: c.strip() for c in df.columns.values.tolist()
                   if isinstance(c, str)}, inplace=True)
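The same skip-non-strings logic can be written as a function passed to `rename`. In this sketch `strip_if_str` is a hypothetical helper name, and the frame is made up:

```python
import pandas as pd

df = pd.DataFrame({' name ': ['x'], 2020: [1]})


def strip_if_str(label):
    """Hypothetical helper: strip string labels, return anything else unchanged."""
    return label.strip() if isinstance(label, str) else label


df = df.rename(columns=strip_if_str)
print(df.columns.tolist())
```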
Try this:
columns = {c: c.strip() for c in df.columns} # or any cleaning
df.rename(columns=columns, inplace=True)
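Passing the mapping positionally, as in `df.rename(columns, inplace=True)`, targets the row index (axis 0) rather than the columns. A sketch with a hypothetical frame whose index shares a padded label with a column, to make the difference visible:

```python
import pandas as pd

# Hypothetical frame: the index and a column share the label ' a '
df = pd.DataFrame({' a ': [1, 2]}, index=[' a ', 'b'])
mapping = {c: c.strip() for c in df.columns}

# Positional mapping: renames the INDEX label, the column keeps its spaces
wrong = df.rename(mapping)

# Either pass columns=... or name the axis explicitly
right = df.rename(columns=mapping)
also_right = df.rename(mapping, axis='columns')

print(wrong.columns.tolist(), wrong.index.tolist())
print(right.columns.tolist(), also_right.columns.tolist())
```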
This will strip every element of a plain list:
headers = [' test1', ' test2 ']
print([h.strip() for h in headers])
Result:
['test1', 'test2']
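The same result without a comprehension, using the built-in `map` with the unbound method `str.strip`:

```python
headers = [' test1', ' test2 ']

# map() calls str.strip on each element; list() materialises the result
stripped = list(map(str.strip, headers))
print(stripped)
```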