
How can I strip the whitespace from Pandas DataFrame headers?

I am parsing data from an Excel file that has extra white space in some of the column headings.

When I check the columns of the resulting dataframe, with df.columns, I see:

Index(['Year', 'Month ', 'Value'])
                     ^
#                    Note the unwanted trailing space on 'Month '

Consequently, I can't do:

df["Month"]

Because it will tell me the column is not found, as I asked for "Month", not "Month ".

My question, then, is how can I strip out the unwanted white space from the column headings?

Spike Williams asked Feb 06 '14 15:02




2 Answers

You can pass a function to the rename method. The str.strip() method should do what you want:

In [5]: df
Out[5]:
   Year  Month   Value
0     1       2      3

[1 rows x 3 columns]

In [6]: df.rename(columns=lambda x: x.strip())
Out[6]:
   Year  Month  Value
0     1      2      3

[1 rows x 3 columns]

Note that this returns a new DataFrame object, which is shown as output on screen, but the changes are not actually applied to your columns. To keep the changes, either use this in a method chain or re-assign the df variable:

df = df.rename(columns=lambda x: x.strip()) 
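
To make the point about re-assignment concrete, here is a minimal sketch. The sample values and the name `renamed` are made up for illustration; only the column labels come from the question.

```python
import pandas as pd

# Hypothetical frame mirroring the question: 'Month ' has a trailing space.
df = pd.DataFrame([[2014, 2, 3]], columns=["Year", "Month ", "Value"])

# rename() returns a new DataFrame; df itself is left untouched.
renamed = df.rename(columns=lambda name: name.strip())

print(df.columns.tolist())       # original labels, trailing space still present
print(renamed.columns.tolist())  # stripped labels
print(renamed["Month"].tolist()) # the lookup now works on the renamed frame
```

Assigning the result back to `df` (as in the line above) is what makes the stripped labels stick.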
TomAugspurger answered Oct 13 '22 00:10


Since version 0.16.1 you can just call .str.strip on the columns:

df.columns = df.columns.str.strip() 

Here is a small example:

In [5]:
df = pd.DataFrame(columns=['Year', 'Month ', 'Value'])
print(df.columns.tolist())
df.columns = df.columns.str.strip()
df.columns.tolist()

['Year', 'Month ', 'Value']
Out[5]:
['Year', 'Month', 'Value']
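
For the situation in the question, where the headers come from a file, the same line can be applied right after loading. The sketch below is only an illustration: it uses an in-memory CSV as a stand-in for the Excel file (an assumption, so that it runs without any file on disk), and the data values are invented.

```python
import io

import pandas as pd

# Stand-in for the Excel file from the question: the 'Month ' header
# carries a trailing space. The values are made up.
raw = io.StringIO("Year,Month ,Value\n2014,2,3\n")
df = pd.read_csv(raw)

# Strip leading and trailing whitespace from every column label in place.
df.columns = df.columns.str.strip()

# The lookup that failed in the question now succeeds.
month = df["Month"]
print(df.columns.tolist())
print(month.tolist())
```

With a real spreadsheet, the same assignment would go right after the pd.read_excel call.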

Timings

In [26]:
df = pd.DataFrame(columns=[' year', ' month ', ' day', ' asdas ', ' asdas', 'as ', '  sa', ' asdas '])
df

Out[26]:
Empty DataFrame
Columns: [ year,  month ,  day,  asdas ,  asdas, as ,   sa,  asdas ]

%timeit df.rename(columns=lambda x: x.strip())
%timeit df.columns.str.strip()
1000 loops, best of 3: 293 µs per loop
10000 loops, best of 3: 143 µs per loop

So str.strip is about 2x faster in this test, and I expect it to scale better for larger DataFrames.
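
If you want to repeat the comparison outside IPython, a rough sketch with the standard timeit module could look like the following. The repetition count `n` is an arbitrary choice, and the absolute numbers will differ by machine and pandas version. Keep in mind that rename builds a whole new DataFrame while .str.strip only builds a new Index, so the two calls are not doing exactly the same amount of work.

```python
import timeit

import pandas as pd

# Same column labels as in the timing run above.
columns = [" year", " month ", " day", " asdas ", " asdas", "as ", "  sa", " asdas "]
df = pd.DataFrame(columns=columns)

n = 1000  # arbitrary number of repetitions

# Total seconds for n calls of each approach.
t_rename = timeit.timeit(lambda: df.rename(columns=lambda x: x.strip()), number=n)
t_str = timeit.timeit(lambda: df.columns.str.strip(), number=n)

print(f"rename:    {t_rename / n * 1e6:.1f} µs per call")
print(f"str.strip: {t_str / n * 1e6:.1f} µs per call")
```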

EdChum answered Oct 13 '22 01:10