Replacing newlines with spaces for str columns through pandas dataframe

Given an example dataframe with free text in columns 2 and 3, e.g.

>>> import pandas as pd
>>> lol = [[1,2,'abc','foo\nbar'], [3,1, 'def\nhaha', 'love it\n']]
>>> pd.DataFrame(lol)
   0  1          2          3
0  1  2        abc   foo\nbar
1  3  1  def\nhaha  love it\n

The goal is to replace each \n with a space and strip the strings in columns 2 and 3 to achieve:

>>> pd.DataFrame(lol)
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it

How can I replace newlines with spaces in specific columns of a pandas dataframe?

I have tried this:

>>> import pandas as pd
>>> lol = [[1,2,'abc','foo\nbar'], [3,1, 'def\nhaha', 'love it\n']]

>>> replace_and_strip = lambda x: x.replace('\n', ' ').strip()

>>> lol2 = [[replace_and_strip(col) if type(col) == str else col for col in list(row)] for idx, row in pd.DataFrame(lol).iterrows()]

>>> pd.DataFrame(lol2)
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it

But there must be a better/simpler way.

How do you replace column values based on conditions in pandas?

You can replace all or selected values in a column of a pandas DataFrame based on a condition by using DataFrame.loc[], np.where(), or DataFrame.mask().
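A minimal sketch of the three approaches named above. The frame, the column name `score`, and the threshold of 50 are made up for illustration and are not from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical data, not from the question.
df = pd.DataFrame({"score": [10, 55, 80]})

# 1. DataFrame.loc: assign only to the rows where the condition holds.
via_loc = df.copy()
via_loc.loc[via_loc["score"] < 50, "score"] = 0

# 2. np.where: choose element-wise between a replacement and the original.
via_where = df.copy()
via_where["score"] = np.where(via_where["score"] < 50, 0, via_where["score"])

# 3. mask: replace the values where the condition is True.
via_mask = df.copy()
via_mask["score"] = via_mask["score"].mask(via_mask["score"] < 50, 0)
```

All three leave the rows that fail the condition unchanged, so which one to use is mostly a matter of taste.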

Can pandas column names have spaces?

In DataFrame.query() expressions, you can refer to column names that contain spaces or operators by surrounding them in backticks. The same works for names that start with a digit or that are Python keywords, that is, any name that is not a valid Python identifier. The notes in the DataFrame.query() documentation give more details.
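As a small illustration of the backtick syntax, assuming a made-up frame whose column names (`unit price`, `class`) are not valid Python identifiers:

```python
import pandas as pd

# Hypothetical data: one name contains a space, the other is a Python keyword.
df = pd.DataFrame({"unit price": [5, 20, 12], "class": ["a", "b", "a"]})

# Backticks let query() refer to both columns.
cheap = df.query("`unit price` < 15")
only_a = df.query("`class` == 'a'")
```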

How do you remove leading and trailing spaces in pandas?

Series.str.strip() removes leading and trailing whitespace, including newlines, from every string in a column of a pandas dataframe.
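A quick sketch of the strip family on a made-up Series (the values are illustrative only):

```python
import pandas as pd

# Hypothetical strings with leading and trailing whitespace.
s = pd.Series(["  abc ", "love it\n", "\tfoo"])

both = s.str.strip()    # trims both ends
left = s.str.lstrip()   # trims the start only
right = s.str.rstrip()  # trims the end only
```

Note that strip() only touches the ends of each string, so a newline in the middle of a value (as in 'foo\nbar') still needs a separate replace.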

Use replace: first strip the leading and trailing whitespace, then replace the remaining \n with a space:

df = df.replace({r'\s+$': '', r'^\s+': ''}, regex=True).replace(r'\n',  ' ', regex=True)
print (df)
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it
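The call above runs over the whole dataframe. If you only want to touch specific columns, as the question asks, one way is to apply the same regex replacement to a column subset. This is a sketch, with `text_cols` as a hypothetical name for the free-text columns:

```python
import pandas as pd

lol = [[1, 2, 'abc', 'foo\nbar'], [3, 1, 'def\nhaha', 'love it\n']]
df = pd.DataFrame(lol)

# Assumed: columns 2 and 3 are the free-text columns from the question.
text_cols = [2, 3]

# Strip both ends first, then turn the remaining newlines into spaces.
df[text_cols] = (
    df[text_cols]
    .replace({r'^\s+': '', r'\s+$': ''}, regex=True)
    .replace(r'\n', ' ', regex=True)
)
```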

You can use select_dtypes to select the columns of type object and then use applymap on those columns (in pandas 2.1 and later, DataFrame.map is the preferred name for applymap).

Neither function has an inplace argument, so assign the result back to the dataframe:

# df is the dataframe built from lol, i.e. df = pd.DataFrame(lol)
strs = df.select_dtypes(include=['object']).applymap(lambda x: x.replace('\n', ' ').strip())
df[strs.columns] = strs
df
#   0  1         2        3
#0  1  2       abc  foo bar
#1  3  1  def haha  love it

Adding to the other nice answers, this is a vectorized version of your initial idea:

columns = [2, 3]
# apply() returns a dataframe, so each cleaned Series stays in its own column
df[columns] = df[columns].apply(lambda col: col.str.strip().str.replace('\n', ' '))

Details:

In [49]: df[columns] = df[columns].apply(lambda col: col.str.strip().str.replace('\n', ' '))

In [50]: df
Out[50]: 
   0  1         2        3
0  1  2       abc  foo bar
1  3  1  def haha  love it
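If you need this in more than one place, the .str approach can be wrapped in a small helper. This is only a sketch: `clean_text_columns` is a hypothetical name, and it assumes every listed column holds strings.

```python
import pandas as pd


def clean_text_columns(df, columns):
    """Return a copy of df with newlines replaced by spaces and the
    ends stripped in the given columns (assumed to hold strings)."""
    out = df.copy()
    for col in columns:
        # regex=False treats '\n' as a literal newline, not a pattern.
        out[col] = out[col].str.replace('\n', ' ', regex=False).str.strip()
    return out


lol = [[1, 2, 'abc', 'foo\nbar'], [3, 1, 'def\nhaha', 'love it\n']]
df = pd.DataFrame(lol)
cleaned = clean_text_columns(df, [2, 3])
```

Working on a copy leaves the original dataframe untouched, which makes it easier to compare before and after.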

Tags: python, string, replace, pandas, strip

Asked by alvas

3 Answers: jezrael, zipa, Mohamed Ali JAMAOUI
