from BeautifulSoup import BeautifulSoup
html = '''<div class="thisText">
Poem <a href="http://famouspoetsandpoems.com/poets/edgar_allan_poe/poems/18848">The Raven</a>Once upon a midnight dreary, while I pondered, weak and weary... </div>
<div class="thisText">
In the greenest of our valleys By good angels tenanted..., part of<a href="http://famouspoetsandpoems.com/poets/edgar_allan_poe/poems/18848">The Haunted Palace</a>
</div>'''
soup = BeautifulSoup(html)
all_poems = soup.findAll("div", {"class": "thisText"})
for poems in all_poems:
    print(poems.text)
I have this sample code and I can't find how to add spaces around the removed tags, so that when the text inside the <a href...> tags is extracted it stays readable and doesn't display like this:
PoemThe RavenOnce upon a midnight dreary, while I pondered, weak and weary...
In the greenest of our valleys By good angels tenanted..., part ofThe Haunted Palace
get_text() in beautifulsoup4 has an optional argument called separator. You can use it as follows:
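from bs4 import BeautifulSoup  # beautifulsoup4 package, not the old BeautifulSoup 3 module
soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly to avoid the bs4 warning
text = soup.get_text(separator=' ')  # join the text fragments with a space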
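If you want one readable line per div, as in the loop from the question, a minimal sketch could look like the following. It assumes beautifulsoup4 is installed and uses the built-in "html.parser". The strip=True argument and the variable name lines are my additions, not part of the answer above: strip=True trims each text fragment before joining, so the separator does not produce doubled spaces or stray newlines.

```python
from bs4 import BeautifulSoup

html = '''<div class="thisText">
Poem <a href="http://famouspoetsandpoems.com/poets/edgar_allan_poe/poems/18848">The Raven</a>Once upon a midnight dreary, while I pondered, weak and weary... </div>
<div class="thisText">
In the greenest of our valleys By good angels tenanted..., part of<a href="http://famouspoetsandpoems.com/poets/edgar_allan_poe/poems/18848">The Haunted Palace</a>
</div>'''

soup = BeautifulSoup(html, "html.parser")

# Extract the text of each div separately. The separator is placed between
# text fragments; strip=True trims each fragment and drops empty ones.
lines = [
    div.get_text(separator=" ", strip=True)
    for div in soup.find_all("div", class_="thisText")
]

for line in lines:
    print(line)
```

With the sample HTML this should print "Poem The Raven Once upon a midnight dreary, while I pondered, weak and weary..." followed by "In the greenest of our valleys By good angels tenanted..., part of The Haunted Palace".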