I wanted to try some basic web scraping but ran into a problem. I am used to simple td tags, but in this case the webpage has the following pre tag with all the text inside it, which makes it a bit trickier to scrape:
<pre style="word-wrap: break-word; white-space: pre-wrap;">
11111111
11111112
11111113
11111114
11111115
</pre>
Any suggestions on how to scrape each row?
Thanks
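If that is exactly what you want to parse, you can call the string method
splitlines() on the text of the pre tag to get a list of rows, or use
split("\n") instead. The text ends with a newline (and may start with one),
so strip() it first if you do not want empty strings in the list.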
from bs4 import BeautifulSoup

content = """
<pre style="word-wrap: break-word; white-space: pre-wrap;">
11111111
11111112
11111113
11111114
11111115
</pre>"""  # This is your content

soup = BeautifulSoup(content, "html.parser")
stuff = soup.find("pre").text
# strip() drops the newlines around the rows so the list has no empty strings
lines = stuff.strip().splitlines()  # or stuff.strip().split("\n")
# lines is now ["11111111", "11111112", "11111113", "11111114", "11111115"]
for line in lines:
    print(line)  # prints each row separately
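To see how split("\n") and splitlines() differ on this kind of text, here is a small sketch using only plain strings. The sample string is an assumption: it mimics what the text of the pre tag could look like when it has a newline before the first row and after the last one. The variable names are only for illustration.

```python
# Assumed sample: the rows with a newline before the first and after the last one.
text = "\n11111111\n11111112\n11111113\n11111114\n11111115\n"

# split("\n") keeps an empty string for the leading AND the trailing newline.
by_split = text.split("\n")

# splitlines() ignores the final newline but still keeps the leading empty string.
by_splitlines = text.splitlines()

# Stripping first removes the surrounding newlines, leaving only the rows.
rows = text.strip().splitlines()

print(len(by_split), len(by_splitlines), len(rows))
for row in rows:
    print(row)
```

So whichever of the two you pick, calling strip() first is the simplest way to end up with just the five rows.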