I'm trying to mark up an HTML file (literally wrapping strings in "mark" tags) using Python and BeautifulSoup. The problem is basically as follows.
Say I have my original html document:
test = "<h1>oh hey</h1><div>here is some <b>SILLY</b> text</div>"
I want to do a case-insensitive search for a string in this document (ignoring HTML) and wrap it in "mark" tags. So let's say I want to find "here is some silly text" in the html (ignoring the bold tags). I'd like to take the matching html and wrap it in "mark" tags.
For example, if I want to search for "here is some silly text" in test, the desired output is:
"<h1>oh hey</h1><div><mark>here is some <b>SILLY</b> text</mark></div>"
Any ideas? If it's more appropriate to use lxml or regular expressions, I'm open to those solutions as well.
>>> import bs4
>>> soup = bs4.BeautifulSoup(test)
>>> matches = soup.find_all(lambda x: x.text.lower() == 'here is some silly text')
>>> for match in matches:
...     match.wrap(soup.new_tag('mark'))
>>> soup
<html><body><h1>oh hey</h1><mark><div>here is some <b>SILLY</b> text</div></mark></body></html>
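The reason I had to pass a function as the name argument to find_all that compares x.text.lower(), instead of just using the text argument with a function that compares x.lower(), is that the text argument is matched against the individual strings in the tree. Here the b tag splits the div's content into three strings ("here is some ", "SILLY", and " text"), so no single string equals the search phrase and a text-based search will not find the content you apparently want.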
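To make the difference concrete, here is a minimal sketch comparing the two searches on the document from the question. It assumes the "html.parser" parser and uses the string keyword (the name recent BeautifulSoup versions use for the text argument); the variable names are mine.

```python
from bs4 import BeautifulSoup

test = "<h1>oh hey</h1><div>here is some <b>SILLY</b> text</div>"
target = "here is some silly text"
soup = BeautifulSoup(test, "html.parser")  # parser choice is an assumption

# string= filters individual text nodes, so the phrase split by <b> never matches.
by_string = soup.find_all(string=lambda s: s is not None and s.lower() == target)

# A function passed as the first argument receives each tag, so it can
# compare the tag's combined text instead.
by_tag = soup.find_all(lambda tag: tag.get_text().lower() == target)

print(len(by_string))              # 0
print([t.name for t in by_tag])    # ['div']
```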
The wrap function may not work this way in some cases. If it doesn't, you will instead have to loop over enumerate(matches) and set matches[i] = match.wrap(soup.new_tag('mark')). (You can't use replace_with to replace a tag with a new tag that references itself.)
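Note that wrapping the matched tag puts the mark around the div, while the question asked for the mark inside the div. One possible way to get that, sketched here under the assumption that the whole text of a single element equals the search phrase, is to move the element's children into a new mark tag. The "html.parser" parser and the variable names are my choices, and this does not handle a phrase that covers only part of an element.

```python
from bs4 import BeautifulSoup

test = "<h1>oh hey</h1><div>here is some <b>SILLY</b> text</div>"
target = "here is some silly text"
soup = BeautifulSoup(test, "html.parser")  # parser choice is an assumption

for tag in soup.find_all(lambda t: t.get_text().lower() == target):
    mark = soup.new_tag("mark")
    # Copy the child list first, because moving children mutates tag.contents.
    for child in list(tag.contents):
        mark.append(child.extract())
    # The tag is now empty; put the filled mark back inside it.
    tag.append(mark)

print(soup)
# <h1>oh hey</h1><div><mark>here is some <b>SILLY</b> text</mark></div>
```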
Also note that if your intended use case allows any non-ASCII string to ever match 'here is some silly text' (or if you want to broaden the code to handle non-ASCII search strings), the code above using lower() may be incorrect. You may want to call str.casefold() and/or locale.strxfrm(s), and/or use locale.strcoll(s, t) instead of using ==, but you'll have to understand what you want and how to get it to pick the right answer.
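As a small illustration of why lower() can fall short, the sketch below compares it with str.casefold() on a German example. The helper name same_ignoring_case is hypothetical, and whether casefold() alone is enough depends on your data (it does not deal with locale-specific collation).

```python
def same_ignoring_case(a: str, b: str) -> bool:
    """Hypothetical helper: caseless comparison using str.casefold()."""
    return a.casefold() == b.casefold()

# lower() leaves the sharp s alone, so the two spellings differ.
print("straße".lower() == "STRASSE".lower())     # False

# casefold() maps the sharp s to "ss", so they compare equal.
print(same_ignoring_case("straße", "STRASSE"))   # True
```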