BeautifulSoup replaceWith() method adding escaped html, want it unescaped

I have a Python method (thanks to this snippet) that takes some HTML and wraps <a> tags around ONLY unformatted links, using BeautifulSoup and Django's urlize:

from django.utils.html import urlize
from bs4 import BeautifulSoup

def html_urlize(self, text):
    soup = BeautifulSoup(text, "html.parser")

    print(soup)

    textNodes = soup.findAll(text=True)
    for textNode in textNodes:
        if textNode.parent and getattr(textNode.parent, 'name') == 'a':
            continue  # skip already formatted links
        urlizedText = urlize(textNode)
        textNode.replaceWith(urlizedText)

    print(soup)

    return str(soup)

Sample input text (as output by the first print statement) is this:

this is a formatted link <a href="http://google.ca">http://google.ca</a>, this one is unformatted and should become formatted: http://google.ca

The resulting return text (as output by the second print statement) is this:

this is a formatted link <a href="http://google.ca">http://google.ca</a>, this one is unformatted and should become formatted: &lt;a href="http://google.ca"&gt;http://google.ca&lt;/a&gt;

As you can see, it is formatting the link, but with escaped HTML, so when I print it in a template with {{ my.html|safe }} it doesn't render as HTML.

So how can I get the tags added by urlize to be unescaped, so they render properly as HTML? I suspect this has something to do with me using it as a method instead of a template filter. I can't actually find the docs on this method; it doesn't appear in django.utils.html.

Edit: It appears the escaping actually happens in this line: textNode.replaceWith(urlizedText).
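
A minimal reproduction of that behaviour, assuming only bs4 is installed (the markup and the replacement string are made up for illustration, and Django is not involved at all):

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>see http://google.ca</p>", "html.parser")
text_node = soup.p.string  # the NavigableString inside <p>

# A plain str passed to replace_with() is inserted as a text node,
# so its angle brackets are escaped when the tree is serialized.
text_node.replace_with('see <a href="http://google.ca">http://google.ca</a>')

escaped = str(soup)
print(escaped)
```

The tree still contains no <a> element after the replacement, only text that happens to look like one.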

asked Oct 04 '15 18:10

43Tesseracts

1 Answer

You can turn your urlizedText string into a new BeautifulSoup object, and it will be treated as a tag in its own right rather than as text within one (which is escaped, as you'd expect):

from django.utils.html import urlize
from bs4 import BeautifulSoup

def html_urlize(self, text):
    soup = BeautifulSoup(text, "html.parser")

    print(soup)

    textNodes = soup.findAll(text=True)
    for textNode in textNodes:
        if textNode.parent and getattr(textNode.parent, 'name') == 'a':
            continue  # skip already formatted links
        urlizedText = urlize(textNode)
        # Parse the urlized string so the <a> is inserted as a real tag, not as escaped text
        textNode.replaceWith(BeautifulSoup(urlizedText, "html.parser"))

    print(soup)

    return str(soup)
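
For trying the same idea without a Django project, here is a rough, self-contained sketch. The names URL_RE, linkify and html_linkify are hypothetical, and linkify is only a crude regex stand-in for urlize, not its real behaviour. It also escapes the surrounding text before re-parsing, so that text such as "1 < 2" is not mistaken for markup; with Django available, urlize(textNode, autoescape=True) should play a similar role.

```python
import re
from html import escape

from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Crude URL pattern, for illustration only (it keeps trailing punctuation).
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def linkify(text):
    """Hypothetical stand-in for urlize: escape text, wrap bare URLs in <a>."""
    parts = []
    last = 0
    for match in URL_RE.finditer(text):
        parts.append(escape(text[last:match.start()]))
        url = escape(match.group(0))
        parts.append('<a href="{0}">{0}</a>'.format(url))
        last = match.end()
    parts.append(escape(text[last:]))
    return "".join(parts)


def html_linkify(markup):
    soup = BeautifulSoup(markup, "html.parser")
    for node in soup.find_all(string=True):
        if type(node) is not NavigableString:
            continue  # leave comments, doctypes and similar nodes alone
        if node.parent is not None and node.parent.name == "a":
            continue  # skip already formatted links
        # Re-parse the generated HTML so <a> becomes a tag, not escaped text.
        fragment = BeautifulSoup(linkify(str(node)), "html.parser")
        node.replace_with(fragment)
    return str(soup)


sample = ('<p>formatted <a href="http://google.ca">http://google.ca</a>, '
          'bare: http://google.ca</p>')
print(html_linkify(sample))
```

The list returned by find_all() is built before the loop starts, so replacing nodes while iterating over it does not disturb the traversal.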

answered Oct 01 '22 06:10

Oli


Tags: python, beautifulsoup, django
