
Pandas trim leading & trailing white space in a dataframe

Tags: python, pandas

I need to develop a function that trims leading and trailing whitespace from every string in a DataFrame.

Here is a simple sample; the real file contains far more complex rows and columns.

import numpy as np
import pandas as pd

df = pd.DataFrame([["A b ", 2, 3], [np.nan, 2, 3],
                   [" random", 43, 4], [" any txt is possible ", " 2 1", 22],
                   ["", 23, 99], [" help ", 23, np.nan]], columns=['A', 'B', 'C'])

The result should eliminate all leading and trailing whitespace but retain the spaces in between the text:

df = pd.DataFrame([["A b", 2, 3], [np.nan, 2, 3],
                   ["random", 43, 4], ["any txt is possible", "2 1", 22],
                   ["", 23, 99], ["help", 23, np.nan]], columns=['A', 'B', 'C'])

Note that the function needs to cover all possible situations. Thank you.

S.Gu asked Mar 29 '18 08:03



1 Answer

I think you need to check whether each value is a string, because the columns hold mixed values (numbers alongside strings), and call strip only on the strings:

df = df.applymap(lambda x: x.strip() if isinstance(x, str) else x)
print (df)
                     A    B     C
0                  A b    2   3.0
1                  NaN    2   3.0
2               random   43   4.0
3  any txt is possible  2 1  22.0
4                        23  99.0
5                 help   23   NaN
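
The one-liner above can be wrapped in a reusable function. The following is only a sketch: the names strip_strings and strip_cell are made up for illustration, and the hasattr check is there because pandas 2.1 introduced DataFrame.map as the new name for the element-wise applymap, which is deprecated from that release on.

```python
import numpy as np
import pandas as pd


def strip_strings(df):
    """Return a new DataFrame with whitespace stripped from every str cell."""
    def strip_cell(value):
        # Only str objects are stripped; numbers and NaN pass through untouched.
        return value.strip() if isinstance(value, str) else value

    # pandas 2.1+ exposes the element-wise method as DataFrame.map;
    # older releases only have DataFrame.applymap.
    if hasattr(pd.DataFrame, "map"):
        return df.map(strip_cell)
    return df.applymap(strip_cell)


# Sample frame from the question.
df = pd.DataFrame(
    [["A b ", 2, 3], [np.nan, 2, 3], [" random", 43, 4],
     [" any txt is possible ", " 2 1", 22], ["", 23, 99],
     [" help ", 23, np.nan]],
    columns=["A", "B", "C"],
)
cleaned = strip_strings(df)
print(cleaned)
```

Because the function returns a new frame, the original df keeps its untrimmed values unless you reassign it.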

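If every value in a column is a string, you can use the vectorized .str.strip() instead. With mixed columns it returns NaN for the non-string values, as happens to the numeric values in column B of your sample: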

cols = df.select_dtypes(['object']).columns
df[cols] = df[cols].apply(lambda x: x.str.strip())
print (df)
                     A    B     C
0                  A b  NaN   3.0
1                  NaN  NaN   3.0
2               random  NaN   4.0
3  any txt is possible  2 1  22.0
4                       NaN  99.0
5                 help  NaN   NaN
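
If you want the column-wise approach without losing the numbers in a mixed column, one option is to fall back to the original value wherever .str.strip() returned NaN. This is a sketch, not part of the answer above: strip_text_columns is a hypothetical helper, and it assumes unique column names and that each text-like column contains at least one string or missing value (the .str accessor raises on an object column holding only numbers).

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def strip_text_columns(df):
    """Strip strings column by column, keeping non-string cells unchanged."""
    out = df.copy()
    for name in out.columns:
        col = out[name]
        if not (is_object_dtype(col) or is_string_dtype(col)):
            continue  # numeric and other non-text columns have nothing to strip
        stripped = col.str.strip()  # non-string cells become NaN here
        # Keep the stripped text where it exists, otherwise the original cell.
        out[name] = stripped.where(stripped.notna(), col)
    return out


# Sample frame from the question.
df = pd.DataFrame(
    [["A b ", 2, 3], [np.nan, 2, 3], [" random", 43, 4],
     [" any txt is possible ", " 2 1", 22], ["", 23, 99],
     [" help ", 23, np.nan]],
    columns=["A", "B", "C"],
)
result = strip_text_columns(df)
print(result)
```

Cells that were NaN to begin with stay NaN, since the fallback simply copies the original value back.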

jezrael answered Nov 14 '22 23:11