I'm reading a CSV file into a DataFrame. I need to strip whitespace from all the stringlike cells, leaving the other cells unchanged in Python 2.7.
Here is what I'm doing:
def remove_whitespace( x ): if isinstance( x, basestring ): return x.strip() else: return x my_data = my_data.applymap( remove_whitespace )
Is there a better or more idiomatic to Pandas way to do this?
Is there a more efficient way (perhaps by doing things column wise)?
I've tried searching for a definitive answer, but most questions on this topic seem to be how to strip whitespace from the column names themselves, or presume the cells are all strings.
Pandas provide 3 methods to handle white spaces(including New line) in any text data. As it can be seen in the name, str. lstrip() is used to remove spaces from the left side of string, str. rstrip() to remove spaces from right side of the string and str.
Series. str. strip()” to remove the whitespace from the string. Using strip function we can easily remove extra whitespace from leading and trailing whitespace from staring.
You can use DataFrame. select_dtypes to select string columns and then apply function str. strip .
To strip whitespace from columns in Pandas we can use the str. strip(~) method or the str. replace(~) method.
Stumbled onto this question while looking for a quick and minimalistic snippet I could use. Had to assemble one myself from posts above. Maybe someone will find it useful:
data_frame_trimmed = data_frame.apply(lambda x: x.str.strip() if x.dtype == "object" else x)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With