I want the BeautifulSoup equivalent of this jQuery question.
I'd like to find a particular regex match in BeautifulSoup text and then replace that segment of text with a wrapped version. I can do this with plaintext wrapping:
# replace all words ending in "ug" wrapped in quotes,
# with "ug" replaced with "ook"
>>> soup = BeautifulSoup("Snug as a bug in a rug")
>>> soup
<html><body><p>Snug as a bug in a rug</p></body></html>
>>> for text in soup.findAll(text=True):
...     if re.search(r'ug\b',text):
...         text.replaceWith(re.sub(r'(\w*)ug\b',r'"\1ook"',text))
...
u'Snug as a bug in a rug'
>>> soup
<html><body><p>"Snook" as a "book" in a "rook"</p></body></html>
But what if I want boldface rather than quotes? e.g. desired result =
<html><body><p><b>Snook</b> as a <b>book</b> in a <b>rook</b></p></body></html>
for text in soup.findAll(text=True):
    if re.search(r'ug\b',text):
        text.replaceWith(BeautifulSoup(re.sub(r'(\w*)ug\b',r'<b>\1ook</b>',text),'html.parser'))
soup
Out[117]: <html><body><p><b>Snook</b> as a <b>book</b> in a <b>rook</b></p></body></html>
The idea here is that we're replacing a text node with a fully formed parse tree. The easiest way to do that is to call BeautifulSoup on the regex-substituted string.
The somewhat magical 'html.parser' argument to the inner BeautifulSoup call prevents it from adding <html><body><p> tags around the fragment, as bs4 (well, lxml really) normally does.
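For anyone on Python 3 and a recent bs4, here is a minimal self-contained sketch of the same idea. The function name bold_ug_words is made up for illustration, and string=True is the newer spelling of text=True. It assumes the text nodes contain no literal < or & characters, since each text node is re-parsed as HTML.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical helper name; the pattern is the one from the question.
UG_WORD = re.compile(r"(\w*)ug\b")

def bold_ug_words(html):
    # Use html.parser for the outer document too, so nothing gets wrapped
    # in extra <html><body> tags.
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns a list up front, so replacing nodes while looping is safe.
    for text in soup.find_all(string=True):
        if UG_WORD.search(text):
            # Re-parse the substituted text as an HTML fragment and swap it in
            # for the original text node.
            fragment = BeautifulSoup(UG_WORD.sub(r"<b>\1ook</b>", text), "html.parser")
            text.replace_with(fragment)
    return str(soup)

print(bold_ug_words("<p>Snug as a bug in a rug</p>"))
```

Text nodes that don't match the pattern are left untouched, so the rest of the document passes through as parsed.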
So here is one way to do it. You could use regex to create new HTML with the words surrounded by boldface, throw that into the BeautifulSoup constructor, and replace the entire parent p with the new p tag.
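import bs4
import re

# No parser is named here, so bs4 picks the best one installed. The lxml parser
# wraps the bare text in <html><body><p>; the loop below assumes that <p> exists.
soup = bs4.BeautifulSoup("Snug as a bug in a rug")
print(soup)
for text in soup.findAll(text=True):
    if re.search(r'ug\b', text):
        # Build a replacement <p> and swap it in for the text node's parent.
        new_html = "<p>" + re.sub(r'(\w*)ug\b', r'<b>\1ook</b>', text) + "</p>"
        new_soup = bs4.BeautifulSoup(new_html)
        text.parent.replace_with(new_soup.p)
print(soup)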
Another option would be to use the soup.new_tag method, but that might require a nested for loop, which won't be as elegant. I'll see if I can write it up and post it here later.
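One possible shape for the soup.new_tag variant, as a rough sketch rather than the code promised above: walk the regex matches in each text node, build a <b> tag per match, and insert the pieces in place of the original node. The function name bold_with_new_tag is hypothetical. Since nothing is re-parsed, characters like < in the text are not reinterpreted as markup.

```python
import re
from bs4 import BeautifulSoup, NavigableString

UG_WORD = re.compile(r"(\w*)ug\b")

def bold_with_new_tag(html):
    soup = BeautifulSoup(html, "html.parser")
    for text in soup.find_all(string=True):
        plain = str(text)
        if not UG_WORD.search(plain):
            continue
        pieces = []
        last = 0
        for match in UG_WORD.finditer(plain):
            # Keep the unmatched text that sits before this match.
            if match.start() > last:
                pieces.append(NavigableString(plain[last:match.start()]))
            # Wrap the rewritten word in a freshly created <b> tag.
            bold = soup.new_tag("b")
            bold.string = match.group(1) + "ook"
            pieces.append(bold)
            last = match.end()
        # Keep any trailing text after the final match.
        if last < len(plain):
            pieces.append(NavigableString(plain[last:]))
        # Put the pieces in front of the old node, in order, then drop it.
        for piece in pieces:
            text.insert_before(piece)
        text.extract()
    return str(soup)

print(bold_with_new_tag("<p>Snug as a bug in a rug</p>"))
```

The nested loop is the price of skipping the second parse; whether that is less elegant than the regex-and-reparse version is a matter of taste.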