str. strip()” to remove the whitespace from the string. Using strip function we can easily remove extra whitespace from leading and trailing whitespace from staring. It returns a series or index of an object.
lstrip() is used to remove spaces from the left side of string, str. rstrip() to remove spaces from right side of the string and str. strip() removes spaces from both sides. Since these are pandas function with same name as Python's default functions, .
To strip whitespace from columns in Pandas we can use the str. strip(~) method or the str. replace(~) method.
You can use DataFrame.select_dtypes
to select string
columns and then apply
function str.strip
.
Notice: Values cannot be types
like dicts
or lists
, because their dtypes
is object
.
df_obj = df.select_dtypes(['object'])
print (df_obj)
0 a
1 c
df[df_obj.columns] = df_obj.apply(lambda x: x.str.strip())
print (df)
0 1
0 a 10
1 c 5
But if there are only a few columns use str.strip
:
df[0] = df[0].str.strip()
Here's a compact version of using applymap
with a straightforward lambda expression to call strip
only when the value is of a string type:
df.applymap(lambda x: x.strip() if isinstance(x, str) else x)
A more complete example:
import pandas as pd
def trim_all_columns(df):
"""
Trim whitespace from ends of each value across all series in dataframe
"""
trim_strings = lambda x: x.strip() if isinstance(x, str) else x
return df.applymap(trim_strings)
# simple example of trimming whitespace from data elements
df = pd.DataFrame([[' a ', 10], [' c ', 5]])
df = trim_all_columns(df)
print(df)
>>>
0 1
0 a 10
1 c 5
Here's a working example hosted by trinket: https://trinket.io/python3/e6ab7fb4ab
You can try:
df[0] = df[0].str.strip()
or more specifically for all string columns
non_numeric_columns = list(set(df.columns)-set(df._get_numeric_data().columns))
df[non_numeric_columns] = df[non_numeric_columns].apply(lambda x : str(x).strip())
If you really want to use regex, then
>>> df.replace('(^\s+|\s+$)', '', regex=True, inplace=True)
>>> df
0 1
0 a 10
1 c 5
But it should be faster to do it like this:
>>> df[0] = df[0].str.strip()
You can use the apply
function of the Series
object:
>>> df = pd.DataFrame([[' a ', 10], [' c ', 5]])
>>> df[0][0]
' a '
>>> df[0] = df[0].apply(lambda x: x.strip())
>>> df[0][0]
'a'
Note the usage of
strip
and not theregex
which is much faster
Another option - use the apply
function of the DataFrame object:
>>> df = pd.DataFrame([[' a ', 10], [' c ', 5]])
>>> df.apply(lambda x: x.apply(lambda y: y.strip() if type(y) == type('') else y), axis=0)
0 1
0 a 10
1 c 5
how about (for string columns)
df[col] = df[col].str.replace(" ","")
never fails
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With