
Remove '\n' in text in pandas python

This is the code I currently use to remove \n from the 'text' column:

df = pd.read_csv('file1.csv')

df['text'].replace('\s+', ' ', regex=True, inplace=True) # remove extra whitespace
df['text'].replace('\n',' ', regex=True) # remove \n in text

header = ["text", "word_length", "author"]

df_out = df.to_csv('sn_file1.csv', columns = header, sep=',', encoding='utf-8')

I have also tried these suggestions:

df['text'].replace('\n', '')
df['text'] = df['text'].str.replace('\n', '').str.replace('\s+', ' ').str.strip()

Output: ' What a smartass! \nLike he knows anything about real estate deals too...'

The whitespace removal works, but the \n is not removed. Can anyone help me with this? Thanks.

I have also tried the suggestions from the linked question "removing newlines from messy strings in pandas dataframe cells?", but it is still not working.

Solved:

df['text'].replace(r'\s+|\\n', ' ', regex=True, inplace=True) 
Lily asked Sep 10 '18 08:09




1 Answer

Since the changes should apply to the 'text' column, select that column with

df['text']

Then one might use pandas.Series.replace, since df['text'] is a Series (pandas.DataFrame.replace behaves the same way on a whole DataFrame).

This lets one pass regular expressions: with regex=True, the pattern is interpreted as a regex and applied inside each string, instead of being compared directly against the whole cell value.

Picking up on @Wiktor Stribiżew's suggestion, the following will do the job:
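To illustrate what regex=True changes, here is a minimal sketch on a made-up Series (the sample strings are assumptions, not data from the question). By default, replace only swaps cells whose entire value equals the given string; with regex=True, the pattern is matched anywhere inside each string.

```python
import pandas as pd

# Hypothetical sample data: one cell containing a newline, one cell that IS a newline
s = pd.Series(["a\nb", "\n"])

# Default behaviour: only cells whose whole value equals "\n" are replaced
whole_value = s.replace("\n", " ")

# regex=True: the pattern is applied as a regular expression inside each string
substring = s.replace("\n", " ", regex=True)

print(whole_value.tolist())
print(substring.tolist())
```

This would also explain why a plain df['text'].replace('\n', '') appears to do nothing on cells that contain more than just the newline.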

df['text'] = df['text'].replace(r'\s+|\\n', ' ', regex=True) 

This regular expression syntax reference may be of help.
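The \\n part of the pattern suggests that the cells hold a backslash followed by the letter n (two characters) rather than a real newline; that is an inference from the fix, not something the question states. A minimal sketch with a made-up one-row DataFrame built from the question's output string shows the difference. The last pattern, (?:\s|\\n)+, is an assumed variant and not part of the answer; it might be preferable when a single space is wanted.

```python
import pandas as pd

# Hypothetical data: "\\n" in Python source is a literal backslash followed by n
df = pd.DataFrame({"text": [
    " What a smartass! \\nLike he knows anything about real estate deals too..."
]})

# A pattern for a real newline leaves the literal backslash + n untouched
newline_only = df["text"].replace(r"\n", " ", regex=True)

# The answer's pattern: whitespace runs OR a literal backslash + n become one space each
cleaned = df["text"].replace(r"\s+|\\n", " ", regex=True).str.strip()

# Assumed variant: treat adjacent whitespace and literal \n as one run
collapsed = df["text"].replace(r"(?:\s|\\n)+", " ", regex=True).str.strip()

print(repr(cleaned[0]))
print(repr(collapsed[0]))
```

With the answer's pattern, the space before the literal \n and the \n itself are each replaced, so two spaces remain between "smartass!" and "Like"; the grouped variant collapses them into one.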

Gonçalo Peres answered Oct 16 '22 10:10