I would like to wrap some words that are not already links with anchor links in BeautifulSoup. I use this to achieve it:
from bs4 import BeautifulSoup
import re

text = ''' replace this string '''
soup = BeautifulSoup(text)
pattern = 'replace'

for txt in soup.findAll(text=True):
    if re.search(pattern, txt, re.I) and txt.parent.name != 'a':
        newtext = re.sub(r'(%s)' % pattern,
                         r'<a href="#\1">\1</a>',
                         txt)
        txt.replaceWith(newtext)

print(soup)
Which unfortunately returns
<html><body><p>&lt;a href="#replace"&gt;replace&lt;/a&gt; this string </p></body></html>
Whereas I am looking for:
<html><body><p><a href="#replace">replace</a> this string </p></body></html>
Is there a way in which I can tell BeautifulSoup not to escape the link elements?
A plain regex substitution over the whole document will not do here, because I will eventually have several patterns to replace, not just one. This is why I decided to use BeautifulSoup: it lets me exclude everything that is already a link.
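You need to create a new tag using new_tag, then use insert_after to insert the remaining part of your text after the newly created a tag. A string passed to replace_with is treated as plain text, which is why the markup gets escaped.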
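for txt in soup.find_all(text=True):
    if re.search(pattern, txt, re.I) and txt.parent.name != 'a':
        # Build a real <a> tag instead of inserting markup as a string
        newtag = soup.new_tag('a')
        newtag.attrs['href'] = "#{}".format(pattern)
        newtag.string = pattern
        txt.replace_with(newtag)
        # Put the rest of the text back after the link
        # (this assumes the match sits at the start of the text node)
        newtag.insert_after(txt.replace(pattern, ""))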
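Since the question mentions that several patterns will eventually be needed, here is a rough sketch of how the same new_tag idea could be extended to handle more than one pattern and matches anywhere in the text. The helper name link_patterns and the choice of using the matched text as the href fragment are assumptions made for illustration, not part of the original answer, and html.parser is used only to keep the example free of extra dependencies.

```python
import re
from bs4 import BeautifulSoup


def link_patterns(soup, patterns):
    """Wrap every match of any pattern in an <a> tag, skipping existing links.

    Hypothetical helper: the name and the href scheme are illustrative only.
    """
    if not patterns:
        return soup
    # One alternation, so all patterns are handled in a single pass.
    regex = re.compile("|".join(re.escape(p) for p in patterns), re.I)

    for node in soup.find_all(string=True):
        # Skip text that already lives inside a link, or has no match.
        if node.find_parent("a") is not None or not regex.search(node):
            continue

        pieces = []
        pos = 0
        for match in regex.finditer(node):
            if match.start() > pos:
                pieces.append(node[pos:match.start()])  # text before the match
            link = soup.new_tag("a", href="#" + match.group(0))
            link.string = match.group(0)
            pieces.append(link)
            pos = match.end()
        if pos < len(node):
            pieces.append(node[pos:])  # trailing text after the last match

        # Insert the pieces in order in front of the old node, then drop it.
        for piece in pieces:
            node.insert_before(piece)
        node.extract()
    return soup


html = '<p>please replace this string, <a href="#x">replace</a> too</p>'
soup = BeautifulSoup(html, "html.parser")
link_patterns(soup, ["replace", "string"])
print(soup)
```

Because the text node is split into plain strings and real Tag objects, nothing is escaped, and text before a match stays in front of the link instead of being moved behind it.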