Develop a function that trims leading and trailing whitespace.
Here is a simple sample, but the real file contains far more complex rows and columns.
import numpy as np
import pandas as pd

df = pd.DataFrame([["A b ", 2, 3], [np.nan, 2, 3],
                   [" random", 43, 4], [" any txt is possible ", " 2 1", 22],
                   ["", 23, 99], [" help ", 23, np.nan]], columns=['A', 'B', 'C'])
The result should eliminate all leading and trailing whitespace, but retain the spaces in between the text.
df = pd.DataFrame([["A b", 2, 3], [np.nan, 2, 3],
                   ["random", 43, 4], ["any txt is possible", "2 1", 22],
                   ["", 23, 99], ["help", 23, np.nan]], columns=['A', 'B', 'C'])
Mind that the function needs to cover all possible situations. Thank you.
Use Series.str.strip() to remove whitespace from strings. It strips leading and trailing whitespace from each string in a Series.
More generally, str.strip() removes leading and trailing characters: whitespace (including newlines) by default, or a specified set of characters, from the left and right sides of each string in the Series/Index.
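As a small illustration of the behaviour described above, here is a sketch on a made-up Series (the sample values are assumptions, not taken from the question's data). It strips once with the default whitespace behaviour and once with an explicit character set.

```python
import pandas as pd

# Hypothetical sample values, chosen only to illustrate the behaviour.
s = pd.Series(["  left", "right  ", "\n both \t", "xxhixx"])

# Default: strip whitespace (spaces, tabs, newlines) from both ends.
stripped = s.str.strip()

# With an argument: strip only the given characters from both ends.
custom = s.str.strip("x")

print(stripped.tolist())
print(custom.tolist())
```

Note that whitespace inside a string is never touched; only the two ends are trimmed.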
The truncate() function, by contrast, truncates a Series or DataFrame before and after some index value. It is a useful shorthand for boolean indexing based on index values above or below certain thresholds, but it works on the index, not on the contents of strings, so it does not trim whitespace.
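To make the difference from strip() concrete, here is a minimal sketch of truncate() on a made-up DataFrame (the column name and index values are assumptions for illustration). It keeps only the rows whose index labels fall inside the given bounds.

```python
import pandas as pd

# Hypothetical frame with a sorted integer index.
df_idx = pd.DataFrame({"value": [10, 20, 30, 40, 50]}, index=[1, 2, 3, 4, 5])

# Keep rows with index labels from 2 to 4, both ends inclusive.
middle = df_idx.truncate(before=2, after=4)

print(middle)
```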
I think you need to check whether each value is a string, because the columns hold mixed values (numbers together with strings), and call strip only on the strings:
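# Strip only string cells; numbers and NaN pass through unchanged.
# Note: applymap is deprecated since pandas 2.1, where DataFrame.map replaces it.
df = df.applymap(lambda x: x.strip() if isinstance(x, str) else x)
print(df)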
                     A    B     C
0                  A b    2   3.0
1                  NaN    2   3.0
2               random   43   4.0
3  any txt is possible  2 1  22.0
4                        23  99.0
5                 help   23   NaN
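If every value in a column has the same type, you can use the .str accessor on the object columns instead. With mixed columns it returns NaN for the non-string values, as happens in your sample for the numeric values in column B: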
cols = df.select_dtypes(['object']).columns
df[cols] = df[cols].apply(lambda x: x.str.strip())
print(df)
                     A    B     C
0                  A b  NaN   3.0
1                  NaN  NaN   3.0
2               random  NaN   4.0
3  any txt is possible  2 1  22.0
4                       NaN  99.0
5                 help  NaN   NaN
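Since the question asks for a function, the first approach can be wrapped up as in the sketch below. The name trim_whitespace is hypothetical, and the sketch assumes unique column names. It skips numeric columns and checks each remaining cell with isinstance, so numbers in mixed columns are kept rather than turned into NaN.

```python
import numpy as np
import pandas as pd


def trim_whitespace(frame):
    """Return a copy with leading/trailing whitespace stripped from string cells."""
    out = frame.copy()
    # Purely numeric columns cannot contain strings, so skip them.
    for col in out.select_dtypes(exclude=["number"]).columns:
        # Strip only real strings; numbers and NaN pass through unchanged.
        out[col] = out[col].map(lambda v: v.strip() if isinstance(v, str) else v)
    return out


# The sample data from the question.
df = pd.DataFrame([["A b ", 2, 3], [np.nan, 2, 3],
                   [" random", 43, 4], [" any txt is possible ", " 2 1", 22],
                   ["", 23, 99], [" help ", 23, np.nan]], columns=["A", "B", "C"])

result = trim_whitespace(df)
print(result)
```

Because it uses Series.map on each column, this sketch does not depend on applymap, and it leaves the input frame unmodified.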