Separating text inside a <pre> tag

I wanted to try some basic web scraping but ran into a problem. I am used to simple td tags, but in this case the webpage had the following pre tag with all of the text inside it, which makes it a bit trickier to scrape.

<pre style="word-wrap: break-word; white-space: pre-wrap;">
11111111
11111112
11111113
11111114
11111115
</pre>

Any suggestions on how to scrape each row?

Thanks

Blueprov asked Mar 04 '23 18:03
1 Answer

If that is exactly what you want to parse, get the text of the <pre> tag and split it into a list of rows with splitlines() (or split("\n")), like this:

from bs4 import BeautifulSoup

content = """
<pre style="word-wrap: break-word; white-space: pre-wrap;">
11111111 
11111112 
11111113
11111114
11111115 
</pre>""" # This is your content

soup = BeautifulSoup(content, "html.parser")
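stuff = soup.find("pre").text  # all of the text inside the <pre> tag

# Split into rows, strip stray whitespace, and skip blank lines.
# A bare stuff.split("\n") would keep empty strings and trailing spaces.
lines = [line.strip() for line in stuff.splitlines() if line.strip()]
# print(lines) gives ["11111111", "11111112", "11111113", "11111114", "11111115"]
for line in lines:
    print(line)  # prints each row separately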
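
To see how split("\n") and splitlines() differ on text like this, here is a minimal sketch. It assumes the text of the <pre> tag comes back with a leading and trailing newline and a few trailing spaces, as in the sample; the variable names are only illustrative.

```python
# Assumed sample of the text inside the <pre> tag (note the stray spaces)
pre_text = "\n11111111 \n11111112 \n11111113\n11111114\n11111115 \n"

# split("\n") keeps the empty strings created by the leading and trailing newlines
raw = pre_text.split("\n")

# splitlines() drops the trailing empty entry but still keeps the leading one
by_lines = pre_text.splitlines()

# Stripping each line and discarding blanks leaves only the row values
rows = [line.strip() for line in pre_text.splitlines() if line.strip()]

print(raw)
print(by_lines)
print(rows)
```

Filtering on line.strip() is a design choice that also throws away genuinely blank rows, so drop that condition if empty rows are meaningful in your data.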
0xInfection answered Mar 15 '23 04:03