Given an example dataframe with the 2nd and 3rd columns of free text, e.g.
>>> import pandas as pd
>>> lol = [[1,2,'abc','foo\nbar'], [3,1, 'def\nhaha', 'love it\n']]
>>> pd.DataFrame(lol)
   0  1          2          3
0  1  2        abc   foo\nbar
1  3  1  def\nhaha  love it\n
The goal is to replace the \n to  (whitespace) and strip the string in column 2 and 3 to achieve:
>>> pd.DataFrame(lol)
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it
How to replace newlines with spaces for specific columns through pandas dataframe?
I have tried this:
>>> import pandas as pd
>>> lol = [[1,2,'abc','foo\nbar'], [3,1, 'def\nhaha', 'love it\n']]
>>> replace_and_strip = lambda x: x.replace('\n', ' ').strip()
>>> lol2 = [[replace_and_strip(col) if type(col) == str else col for col in list(row)] for idx, row in pd.DataFrame(lol).iterrows()]
>>> pd.DataFrame(lol2)
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it
But there must be a better/simpler way.
You can replace all values or selected values in a column of pandas DataFrame based on condition by using DataFrame. loc[] , np. where() and DataFrame. mask() methods.
You can refer to column names that contain spaces or operators by surrounding them in backticks. This way you can also escape names that start with a digit, or those that are a Python keyword. Basically when it is not valid Python identifier. See notes down for more details.
strip() function is used to remove or strip the leading and trailing space of the column in pandas dataframe.
Use replace - first first and last strip and then replace \n:
df = df.replace({r'\s+$': '', r'^\s+': ''}, regex=True).replace(r'\n',  ' ', regex=True)
print (df)
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it
                        You can select_dtypes to select columns of type object and use applymap on those columns.
Because there is no inplace argument for these functions, this would be a workaround to make change to the dataframe:
strs = lol.select_dtypes(include=['object']).applymap(lambda x: x.replace('\n', ' ').strip())
lol[strs.columns] = strs
lol
#   0  1         2        3
#0  1  2       abc  foo bar
#1  3  1  def haha  love it
                        Adding to the other nice answers, this is a vectorized version of your initial idea:
columns = [2,3] 
df.iloc[:, columns] = [df.iloc[:,col].str.strip().str.replace('\n',' ') 
                       for col in columns] 
Details:
In [49]: df.iloc[:, columns] = [df.iloc[:,col].str.strip().str.replace('\n',' ') 
                                 for col in columns]  
In [50]: df
Out[50]: 
   0  1        2         3
0  1  2      abc  def haha
1  3  1  foo bar   love it
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With