Separating text inside a <pre> tag

Question

I wanted to try some basic web scraping but ran into a problem. I am used to simple td tags, but in this case the webpage had the following pre tag with all the text inside it, which makes it a bit trickier to scrape.

<pre style="word-wrap: break-word; white-space: pre-wrap;">
11111111
11111112
11111113
11111114
11111115
</pre>

Any suggestions on how to scrape each row?

Thanks

0xInfection · Accepted Answer

If that is exactly what you want to parse, you can use the splitlines() method to get a list of rows, or you can use split() with a newline separator like this.

from bs4 import BeautifulSoup

content = """
<pre style="word-wrap: break-word; white-space: pre-wrap;">
11111111 
11111112 
11111113
11111114
11111115 
</pre>""" # This is your content

soup = BeautifulSoup(content, "html.parser")
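stuff = soup.find('pre').text
# strip() removes the newlines around the block and the trailing spaces on each row
lines = [line.strip() for line in stuff.strip().split("\n")] # or replace this by stuff.strip().splitlines()
# print(lines) gives ['11111111', '11111112', '11111113', '11111114', '11111115']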
for line in lines:
    print(line)
# prints each row separately.
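
The two calls mentioned above do not behave identically on every input, which can matter for text taken from a real page. The following is a minimal sketch, assuming the text has already been extracted from the pre tag; the sample string `raw` is made up for illustration and is not from the original page.

```python
# Hypothetical text as it might come out of a <pre> tag: leading and trailing
# newlines, trailing spaces, and one Windows-style "\r\n" line ending.
raw = "\n11111111 \r\n11111112 \n11111113\n"

# split("\n") keeps the empty strings at both ends and leaves "\r" in place.
by_split = raw.split("\n")

# splitlines() treats "\r\n" as one line break and adds no trailing empty string.
by_splitlines = raw.splitlines()

# Stripping each row and dropping blank ones gives clean values either way.
rows = [line.strip() for line in raw.splitlines() if line.strip()]

print(by_split)
print(by_splitlines)
print(rows)
```

If the rows are meant to be numbers, each entry in `rows` can then be passed to int() without tripping over stray whitespace.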

Tags: python, beautifulsoup, screen-scraping

Blueprov