Python regex to replace all Windows newlines with spaces

Tags: python, regex

I did this:

from urllib import urlopen
import nltk
url = "http://myurl.com"
html = urlopen(url).read()
cleanhtml = nltk.clean_html(html)

I now have a long string in Python that is full of text interrupted periodically by Windows newlines (\r\n), and I simply want to remove all occurrences of \r\n from the string using a regular expression. As a first step, I want to replace each one with a space, so I did this:

import re
textspaced = re.sub("'\r\n'", r"' '", cleanhtml)

...it didn't work. So what am I doing wrong?


asked Jun 29 '11 16:06

magnetar

2 Answers

There's no need for regular expressions here; just use str.replace:

htmlspaced = html.replace('\r\n', ' ')

If you also need to match UNIX (\n) and old Mac (\r) newlines, use a regular expression:

import re
htmlspaced = re.sub(r'\r\n|\r|\n', ' ', html)
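
To see how the two approaches differ, here is a minimal sketch on a made-up sample string (the `sample` text and variable names are illustrative, not from the question). It also shows why `\r\n` probably should come first in the alternation: with only `\r|\n`, each Windows newline would turn into two spaces.

```python
import re

# Hypothetical sample mixing Windows (\r\n), UNIX (\n) and old Mac (\r) newlines
sample = "first line\r\nsecond line\nthird line\rfourth line"

# str.replace only touches the exact two-character Windows sequence
windows_only = sample.replace("\r\n", " ")

# The alternation handles all three styles; \r\n is tried first,
# so a Windows newline collapses into a single space
all_newlines = re.sub(r"\r\n|\r|\n", " ", sample)

# Without the \r\n alternative, \r and \n are replaced separately
double_space = re.sub(r"\r|\n", " ", "a\r\nb")

print(repr(windows_only))
print(repr(all_newlines))
print(repr(double_space))
```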


answered Oct 05 '22 07:10

phihag

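Just a small syntax error: the single quotes inside your string literals become part of the pattern and the replacement, so the regex only matches \r\n when it is wrapped in literal ' characters. Without them,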

htmlspaced = re.sub(r"\r\n", " ", html)

should work.
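
As a rough illustration of what went wrong in the original call, the sketch below runs both versions on a short made-up string (`text` and the other names are assumptions for the demo). The quoted pattern leaves the input untouched unless the newline really is surrounded by single quotes.

```python
import re

# Hypothetical input containing one Windows newline
text = "one\r\ntwo"

# Original attempt: the inner single quotes are literal characters in the pattern
broken = re.sub("'\r\n'", r"' '", text)

# Corrected call: match just \r\n and replace it with a single space
fixed = re.sub(r"\r\n", " ", text)

# The original pattern does match when the quotes are actually present
quoted = re.sub("'\r\n'", r"' '", "one'\r\n'two")

print(repr(broken))
print(repr(fixed))
print(repr(quoted))
```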

answered Oct 05 '22 08:10

Tim Pietzcker
