removing newlines from messy strings in pandas dataframe cells?

Question

I've tried multiple ways of splitting and stripping the strings in my pandas dataframe to remove all the '\n' characters, but for some reason the ones attached to other words are never deleted, even after I split the strings. I have a pandas dataframe with a column that captures text from web pages using BeautifulSoup. BeautifulSoup has already cleaned the text a bit, but it failed to remove the newlines attached to other characters. My strings look a bit like this:

"hands-on development of games. We will study a variety of software technologies relevant to games including programming languages, scripting languages, operating systems, file systems, networks, simulation engines, and multi-media design systems. We will also study some of the underlying scientific concepts from computer science and related fields including"

Is there an easy Python way to remove these "\n" characters?

Thanks in advance!

jezrael · Accepted Answer

EDIT: the correct answer to this is:

df = df.replace(r'\n',' ', regex=True)

I think you need replace:

df = df.replace('\n','', regex=True)

Or:

df = df.replace('\n',' ', regex=True)

Or:

df = df.replace(r'\n',' ', regex=True)

Sample:

import pandas as pd

text = '''hands-on\ndev nologies\nrelevant scripting\nlang '''
df = pd.DataFrame({'A':[text]})
print (df)
                                                   A
0  hands-on\ndev nologies\nrelevant scripting\nla...

# replace every newline with a single space
df = df.replace('\n',' ', regex=True)
print (df)
                                                A
0  hands-on dev nologies relevant scripting lang
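One thing that may explain why stripping seems to do nothing: a cell can hold a real newline character, or it can hold the two visible characters backslash and n (for example when the text was escaped before it was stored). The sketch below uses a made-up two-row column `A` to show how the two regex patterns differ. Which case applies to your data is an assumption you would need to check yourself.

```python
import pandas as pd

# Hypothetical data: one cell with a real newline, one with a literal backslash + n
df = pd.DataFrame({
    "A": [
        "hands-on\ndevelopment",    # real newline character
        "hands-on\\ndevelopment",   # two characters: backslash followed by 'n'
    ]
})

# The regex \n matches a real newline only
real = df.replace(r"\n", " ", regex=True)

# The regex \\n matches the two-character sequence backslash + n only
literal = df.replace(r"\\n", " ", regex=True)

# One pattern that covers both cases
both = df.replace(r"\\n|\n", " ", regex=True)

print(both)
```

If only `literal` changes your strings, the "newlines" in the data are escaped text rather than real line breaks.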

LinuxUser · Answer

df.replace(to_replace=[r"\t|\n|\r", "\t|\n|\r"], value=["",""], regex=True, inplace=True)

worked for me.
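Note that replacing with an empty string glues neighbouring words together when the newline was the only separator. As a possible alternative, here is a minimal sketch that works on a single column and collapses every run of whitespace (tabs, newlines, carriage returns, spaces) into one space. The column name `text` and the sample string are made up for illustration.

```python
import pandas as pd

# Hypothetical column with mixed whitespace debris
df = pd.DataFrame({"text": ["hands-on\r\n\tdevelopment \n of games\n"]})

# Collapse each run of whitespace into a single space, then trim both ends
df["text"] = df["text"].str.replace(r"\s+", " ", regex=True).str.strip()

print(df["text"].tolist())
```

Because `Series.str.replace` only touches the selected column, other columns in the dataframe are left as they are.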

Source:

https://gist.github.com/smram/d6ded3c9028272360eb65bcab564a18a

Tags: python, string, split, pandas

Calvin
