I am trying to fetch the code from this website: http://netherkingdom.netai.net/pycake.html. My Python script parses out all the code inside the HTML div tags and then writes the text from between the div tags to a file. The problem is that it adds a bunch of \r and \n to the file. How can I either avoid this or remove the \r and \n? Here is my code:
import urllib.request
from html.parser import HTMLParser
import re
page = urllib.request.urlopen('http://netherkingdom.netai.net/pycake.html')
t = page.read()
class MyHTMLParser(HTMLParser):
    def handle_data(self, data):
        print(data)
        f = open('/Users/austinhitt/Desktop/Test.py', 'r')
        t = f.read()
        f = open('/Users/austinhitt/Desktop/Test.py', 'w')
        f.write(t + '\n' + data)
        f.close()
parser = MyHTMLParser()
t = t.decode()
parser.feed(t)
And here is the resulting file it makes:
b'
import time as t\r\n
from os import path\r\n
import os\r\n
\r\n
\r\n
\r\n
\r\n
\r\n'
Preferably I would also like to have the beginning b' and last ' removed. I am using Python 3.5.1 on a Mac.
Use the str.rstrip() method to remove a trailing \r\n from a string in Python, e.g. result = my_str.rstrip(). Called with no arguments, it also removes any other trailing whitespace.
Use the str.replace() method to remove all line breaks from a string, e.g. my_str.replace('\r', '').replace('\n', ''), or re.sub(r'[\r\n]', '', my_str) with the re module. Either form removes every line break by replacing it with an empty string.
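In Python, removing every line break can be sketched as follows. The variable `sample` is a made-up string used only for illustration.

```python
import re

# Hypothetical sample text with Windows-style line endings.
sample = "import os\r\nimport sys\r\n"

# Chain str.replace() calls to drop every carriage return and line feed.
no_breaks = sample.replace("\r", "").replace("\n", "")

# Equivalent form using a regular expression from the re module.
no_breaks_re = re.sub(r"[\r\n]", "", sample)

print(repr(no_breaks))
print(repr(no_breaks_re))
```

Note that this joins all lines together, which is rarely what you want for source code; replacing only '\r' keeps the line structure intact.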
Use the strip() method to remove newline characters from both ends of a string in Python. strip() removes leading and trailing newlines from the string it is called on, along with any other whitespace on both sides.
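A small sketch of how the stripping methods differ, using a made-up string `raw` with whitespace and line breaks on both sides:

```python
# Hypothetical input: spaces and \r\n on both ends.
raw = "  \r\nimport os\r\n  "

both_ends = raw.strip()    # removes whitespace on the left and the right
right_only = raw.rstrip()  # removes whitespace on the right only
left_only = raw.lstrip()   # removes whitespace on the left only

# Passing characters restricts what is stripped: here only \r and \n.
only_breaks = "import os  \r\n".rstrip("\r\n")

for value in (both_ends, right_only, left_only, only_breaks):
    print(repr(value))
```

The last form is useful when trailing spaces are meaningful and only the line ending should go.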
A simple solution is to strip trailing whitespace:
with open('gash.txt', 'r') as var:
    for line in var:
        line = line.rstrip()
        print(line)
The advantage of rstrip() over using a [:-2] slice is that this is safe for UNIX style files as well.
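To illustrate why the slice is fragile, here is a minimal comparison on two made-up lines, one with a Windows ending and one with a UNIX ending:

```python
# Hypothetical lines with different line endings.
windows_line = "import os\r\n"
unix_line = "import os\n"

# A fixed [:-2] slice assumes the ending is always two characters long.
sliced_windows = windows_line[:-2]
sliced_unix = unix_line[:-2]      # also cuts off the final "s"

# rstrip() removes whatever trailing whitespace is actually present.
stripped_windows = windows_line.rstrip()
stripped_unix = unix_line.rstrip()

print(repr(sliced_unix))
print(repr(stripped_unix))
```

The slice silently damages the UNIX line, while rstrip() gives the same result for both.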
However, if you only want to get rid of \r and they might not be at the end of the line, then str.replace() is your friend:
line = line.replace('\r', '')
If you have a bytes object (that's the leading b') then you can convert it to a native Python 3 string using:
line = line.decode()
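As a hedged illustration of where a literal b'...' wrapper can come from: calling str() on a bytes object produces its printable representation, with \r\n written out as backslash escapes, whereas decode() yields the actual text. The bytes value `raw` below is a made-up stand-in for downloaded data, and UTF-8 is an assumed encoding.

```python
# Hypothetical stand-in for the bytes returned by page.read().
raw = b"import time as t\r\nfrom os import path\r\n"

# str() gives the repr of the bytes: it starts with b' and contains
# the two-character sequences backslash-r and backslash-n.
as_literal = str(raw)

# decode() gives real text with real line-ending characters.
as_text = raw.decode("utf-8")

# Normalize Windows line endings to plain newlines.
cleaned = as_text.replace("\r\n", "\n")

print(as_literal)
print(repr(cleaned))
```

If a file ends up containing b' and visible \r\n sequences, it is worth checking whether str() was applied to bytes somewhere before writing.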
To remove carriage returns:
line = line.replace('\r', '')
To remove tabs:
line = line.replace('\t', '')
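Putting the pieces together, one possible sketch collects the text inside div tags, decodes it, and strips carriage returns in a single place. The class name `DivTextParser`, the inline HTML in `raw`, and the UTF-8 encoding are assumptions made for illustration; the real page may be structured differently.

```python
from html.parser import HTMLParser


class DivTextParser(HTMLParser):
    """Collect the text found inside <div> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # number of <div> tags currently open
        self.chunks = []  # text pieces seen while inside a <div>

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


# Made-up stand-in for page.read(): bytes with Windows line endings.
raw = b"<html><body><div>import time as t\r\nimport os\r\n</div></body></html>"

parser = DivTextParser()
parser.feed(raw.decode("utf-8"))

# Join the collected pieces and drop the carriage returns.
code = "".join(parser.chunks).replace("\r", "")
print(repr(code))
```

With the text gathered in `code`, the file can be opened once in 'w' mode and written in a single call, instead of being reopened for every piece of data.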