How can I strip the whitespace from Pandas DataFrame headers?

Tags: python, pandas, whitespace

I am parsing data from an Excel file that has extra white space in some of the column headings.

When I check the columns of the resulting dataframe, with df.columns, I see:

    Index(['Year', 'Month ', 'Value'])
                         ^
    # Note the unwanted trailing space on 'Month '

Consequently, I can't do:

df["Month"]

Because it will tell me the column is not found, as I asked for "Month", not "Month ".

My question, then, is how can I strip out the unwanted white space from the column headings?

979

asked Feb 06 '14 15:02

Spike Williams

2 Answers

You can give functions to the rename method. The str.strip() method should do what you want:

    In [5]: df
    Out[5]:
       Year  Month   Value
    0     1       2      3

    [1 rows x 3 columns]

    In [6]: df.rename(columns=lambda x: x.strip())
    Out[6]:
       Year  Month  Value
    0     1      2      3

    [1 rows x 3 columns]

Note that this returns a new DataFrame, which is shown as output on screen, but the changes are not actually applied to your original columns. To keep the changes, either use this in a method chain or re-assign the df variable:

df = df.rename(columns=lambda x: x.strip())
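For anyone who wants to try this end to end, here is a minimal sketch of the re-assignment. The one-row frame is made-up data standing in for the parsed Excel sheet, and the `strip_if_str` helper is a hypothetical addition, not part of the answer above: it only guards against non-string labels (integer headers, for example), which have no `.strip()` method.

```python
import pandas as pd

# Made-up data standing in for the parsed Excel sheet.
# Note the trailing space in 'Month '.
df = pd.DataFrame({"Year": [1], "Month ": [2], "Value": [3]})


def strip_if_str(label):
    """Strip surrounding whitespace from string labels; leave others as-is."""
    return label.strip() if isinstance(label, str) else label


# rename returns a new DataFrame, so assign the result back to df.
df = df.rename(columns=strip_if_str)

print(df.columns.tolist())
print(df["Month"].tolist())
```

If all of your headers are known to be strings, the plain `lambda x: x.strip()` from the answer is enough.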

162

answered Oct 13 '22 00:10

TomAugspurger

Since version 0.16.1 you can just call .str.strip on the columns:

df.columns = df.columns.str.strip()

Here is a small example:

    In [5]:
    df = pd.DataFrame(columns=['Year', 'Month ', 'Value'])
    print(df.columns.tolist())
    df.columns = df.columns.str.strip()
    df.columns.tolist()

    ['Year', 'Month ', 'Value']
    Out[5]:
    ['Year', 'Month', 'Value']

Timings

    In [26]:
    df = pd.DataFrame(columns=[' year', ' month ', ' day', ' asdas ', ' asdas', 'as ', '  sa', ' asdas '])
    df
    Out[26]:
    Empty DataFrame
    Columns: [ year,  month ,  day,  asdas ,  asdas, as ,   sa,  asdas ]

    %timeit df.rename(columns=lambda x: x.strip())
    %timeit df.columns.str.strip()
    1000 loops, best of 3: 293 µs per loop
    10000 loops, best of 3: 143 µs per loop

So str.strip is about 2x faster in this test, and I expect it to scale better for larger DataFrames.
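If you want to check the timing on your own machine outside IPython, the comparison can be sketched with the standard `timeit` module. The 1000 generated column names and the repeat count of 100 are arbitrary assumptions for illustration. The absolute numbers will vary with hardware and pandas version, so no expected timings are shown. Keep in mind that the two calls do different amounts of work: `rename` builds a new DataFrame, while `.str.strip()` only produces a new Index.

```python
import timeit

import pandas as pd

# Arbitrary, made-up headers padded with whitespace on both sides.
labels = [f"  col{i} " for i in range(1000)]
df = pd.DataFrame(columns=labels)

# Time both approaches over the same number of repetitions.
t_rename = timeit.timeit(
    lambda: df.rename(columns=lambda x: x.strip()), number=100
)
t_str = timeit.timeit(lambda: df.columns.str.strip(), number=100)

# Both approaches should produce the same cleaned labels.
renamed_cols = df.rename(columns=lambda x: x.strip()).columns.tolist()
stripped_cols = df.columns.str.strip().tolist()

print(f"rename:     {t_rename:.4f} s for 100 runs")
print(f".str.strip: {t_str:.4f} s for 100 runs")
print("same result:", renamed_cols == stripped_cols)
```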

answered Oct 13 '22 01:10

EdChum
