Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Strip / trim all strings of a dataframe

People also ask

How do I strip an entire DataFrame?

str. strip()” to remove the whitespace from the string. Using strip function we can easily remove extra whitespace from leading and trailing whitespace from staring. It returns a series or index of an object.

How do you trim strings in pandas?

lstrip() is used to remove spaces from the left side of string, str. rstrip() to remove spaces from right side of the string and str. strip() removes spaces from both sides. Since these are pandas function with same name as Python's default functions, .

How do I strip multiple columns in pandas?

To strip whitespace from columns in Pandas we can use the str. strip(~) method or the str. replace(~) method.


You can use DataFrame.select_dtypes to select string columns and then apply function str.strip.

Notice: Values cannot be types like dicts or lists, because their dtypes is object.

df_obj = df.select_dtypes(['object'])
print (df_obj)
0    a  
1    c  

df[df_obj.columns] = df_obj.apply(lambda x: x.str.strip())
print (df)

   0   1
0  a  10
1  c   5

But if there are only a few columns use str.strip:

df[0] = df[0].str.strip()

Money Shot

Here's a compact version of using applymap with a straightforward lambda expression to call strip only when the value is of a string type:

df.applymap(lambda x: x.strip() if isinstance(x, str) else x)

Full Example

A more complete example:

import pandas as pd


def trim_all_columns(df):
    """
    Trim whitespace from ends of each value across all series in dataframe
    """
    trim_strings = lambda x: x.strip() if isinstance(x, str) else x
    return df.applymap(trim_strings)


# simple example of trimming whitespace from data elements
df = pd.DataFrame([['  a  ', 10], ['  c  ', 5]])
df = trim_all_columns(df)
print(df)


>>>
   0   1
0  a  10
1  c   5

Working Example

Here's a working example hosted by trinket: https://trinket.io/python3/e6ab7fb4ab


You can try:

df[0] = df[0].str.strip()

or more specifically for all string columns

non_numeric_columns = list(set(df.columns)-set(df._get_numeric_data().columns))
df[non_numeric_columns] = df[non_numeric_columns].apply(lambda x : str(x).strip())

If you really want to use regex, then

>>> df.replace('(^\s+|\s+$)', '', regex=True, inplace=True)
>>> df
   0   1
0  a  10
1  c   5

But it should be faster to do it like this:

>>> df[0] = df[0].str.strip()

You can use the apply function of the Series object:

>>> df = pd.DataFrame([['  a  ', 10], ['  c  ', 5]])
>>> df[0][0]
'  a  '
>>> df[0] = df[0].apply(lambda x: x.strip())
>>> df[0][0]
'a'

Note the usage of strip and not the regex which is much faster

Another option - use the apply function of the DataFrame object:

>>> df = pd.DataFrame([['  a  ', 10], ['  c  ', 5]])
>>> df.apply(lambda x: x.apply(lambda y: y.strip() if type(y) == type('') else y), axis=0)

   0   1
0  a  10
1  c   5

how about (for string columns)

df[col] = df[col].str.replace(" ","")

never fails