Pandas trim leading & trailing white space in a dataframe

Tags: python, pandas

I need to develop a function that trims leading and trailing whitespace.

Here is a simple sample; the real file contains far more complex rows and columns:

import numpy as np
import pandas as pd

df = pd.DataFrame([["A b ", 2, 3], [np.nan, 2, 3],
                   [" random", 43, 4], [" any txt is possible ", " 2 1", 22],
                   ["", 23, 99], [" help ", 23, np.nan]], columns=['A', 'B', 'C'])

The result should remove all leading and trailing whitespace but keep the spaces between words:

df = pd.DataFrame([["A b", 2, 3], [np.nan, 2, 3],
                   ["random", 43, 4], ["any txt is possible", "2 1", 22],
                   ["", 23, 99], ["help", 23, np.nan]], columns=['A', 'B', 'C'])

Note that the function needs to handle every possible case. Thank you.
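For a single value, Python's built-in str.strip() with no arguments already behaves as required: it removes whitespace at both ends (spaces, tabs, newlines and other Unicode whitespace such as the non-breaking space) and keeps the spaces inside the text. It exists only on strings, which is why any DataFrame-wide version has to skip NaN and numbers. A small sketch on the sample strings; the tab/newline and non-breaking-space values are extra cases assumed for illustration, not part of the original data:

```python
samples = ["A b ", " random", " any txt is possible ", " 2 1", "", " help "]

# str.strip() with no argument removes whitespace only at both ends
stripped = [s.strip() for s in samples]
print(stripped)
# -> ['A b', 'random', 'any txt is possible', '2 1', '', 'help']

# Extra cases assumed for illustration (not in the original sample):
# tabs, newlines and non-breaking spaces (U+00A0) also count as whitespace
extra = ["\tA b\n", "\u00a0help\u00a0"]
print([s.strip() for s in extra])
# -> ['A b', 'help']

# NaN (a float) and ints have no strip() method, so they must be skipped
for value in [float("nan"), 2]:
    print(type(value).__name__, hasattr(value, "strip"))
# -> float False
# -> int False
```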


asked Mar 29 '18 08:03

S.Gu

1 Answer

I think you need to check whether each value is a string, because the columns contain mixed values (numbers together with strings), and call strip only on the strings:

# on pandas >= 2.1 the same call is spelled df.map(...); applymap is deprecated there
df = df.applymap(lambda x: x.strip() if isinstance(x, str) else x)
print(df)
                     A    B     C
0                  A b    2   3.0
1                  NaN    2   3.0
2               random   43   4.0
3  any txt is possible  2 1  22.0
4                        23  99.0
5                 help   23   NaN
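Since the question asks for a function, the same approach can be wrapped in one. A minimal sketch; the name trim_strings is hypothetical (not from the original post), and it assumes you want a new frame back rather than in-place changes. It calls DataFrame.map on pandas 2.1 and later, where applymap was deprecated under that new name, and falls back to applymap on older versions:

```python
import numpy as np
import pandas as pd


def trim_strings(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with leading/trailing
    whitespace removed from every str cell; other cells are unchanged."""

    def strip_if_str(value):
        # Only strings have .strip(); NaN, ints and floats pass through
        return value.strip() if isinstance(value, str) else value

    # DataFrame.map is the pandas >= 2.1 name of the elementwise applymap
    if hasattr(pd.DataFrame, "map"):
        return df.map(strip_if_str)
    return df.applymap(strip_if_str)


df = pd.DataFrame([["A b ", 2, 3], [np.nan, 2, 3],
                   [" random", 43, 4], [" any txt is possible ", " 2 1", 22],
                   ["", 23, 99], [" help ", 23, np.nan]], columns=['A', 'B', 'C'])
print(trim_strings(df))
```

The original df is left untouched; assign the result back (df = trim_strings(df)) to replace it.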

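If all values in the object columns are strings, you can use the vectorized str.strip on those columns instead. With mixed columns like B in your sample, however, the numeric values become NaN: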

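# select only the object-dtype columns (A and B here)
cols = df.select_dtypes(['object']).columns
# .str.strip() strips strings but returns NaN for non-string values such as ints
df[cols] = df[cols].apply(lambda x: x.str.strip())
print(df)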
                     A    B     C
0                  A b  NaN   3.0
1                  NaN  NaN   3.0
2               random  NaN   4.0
3  any txt is possible  2 1  22.0
4                       NaN  99.0
5                 help  NaN   NaN
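If you want the .str.strip() route but need to keep the numbers in mixed columns like B, one option is to apply it only to the cells that actually hold strings, selected with a boolean mask. A sketch under that assumption; the helper name strip_string_cells is hypothetical, and because the mask itself is built with a Python-level isinstance check, this is about preserving values rather than about speed:

```python
import numpy as np
import pandas as pd


def strip_string_cells(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: strip whitespace from str cells only.

    Numbers and NaN are kept as they are, even inside mixed object columns.
    """
    out = frame.copy()
    for col in out.columns:
        # Boolean mask: True where the cell holds a Python str
        is_str = out[col].map(lambda v: isinstance(v, str)).astype(bool)
        if is_str.any():
            # .str.strip() only sees the string cells, so numeric cells
            # in mixed columns never turn into NaN
            out.loc[is_str, col] = out.loc[is_str, col].str.strip()
    return out


df = pd.DataFrame([["A b ", 2, 3], [np.nan, 2, 3],
                   [" random", 43, 4], [" any txt is possible ", " 2 1", 22],
                   ["", 23, 99], [" help ", 23, np.nan]], columns=['A', 'B', 'C'])
print(strip_string_cells(df))
```

The is_str.any() guard skips columns with no strings at all (such as the float column C), where the .str accessor would otherwise raise.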


answered Nov 14 '22 23:11

jezrael
