Append markup string to a tag in BeautifulSoup

Is it possible to set markup as tag content (akin to setting innerHTML in JavaScript)?

For the sake of example, let's say I want to add 10 <a> elements to a <div>, but have them separated with a comma:

soup = BeautifulSoup(<<some document here>>)

a_tags = ["<a>1</a>", "<a>2</a>", ...] # list of strings
div = soup.new_tag("div")
a_str = ",".join(a_tags)

Using div.append(a_str) escapes < and > into &lt; and &gt;, so I end up with

<div> &lt;a&gt; 1 &lt;/a&gt; ... </div>

BeautifulSoup(a_str) wraps this in <html>, and I see getting the tree out of it as an inelegant hack.

What to do?

Konrad asked Nov 18 '14 01:11



1 Answer

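Parse the HTML string containing the links into its own BeautifulSoup object and append that object to the div. Passing 'html.parser' avoids the <html> wrapper, because that parser does not add missing <html> or <body> tags: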

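from bs4 import BeautifulSoup

# name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup("", 'html.parser')
div = soup.new_tag('div')

a_tags = ["<a>1</a>", "<a>2</a>", "<a>3</a>", "<a>4</a>", "<a>5</a>"]
a_str = ",".join(a_tags)

# parse the fragment and attach the resulting nodes to the div
div.append(BeautifulSoup(a_str, 'html.parser'))

soup.append(div)
print(soup)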

Prints:

<div><a>1</a>,<a>2</a>,<a>3</a>,<a>4</a>,<a>5</a></div>
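To get closer to the innerHTML behaviour asked about, the same idea can be wrapped in a small helper. This is only a sketch: set_inner_html is a hypothetical name, not part of the BeautifulSoup API, and it assumes 'html.parser' is an acceptable parser for the fragment. It empties the tag first and then moves the parsed nodes in one by one, which does not rely on how a given bs4 version handles appending a whole BeautifulSoup object.

```python
from bs4 import BeautifulSoup


def set_inner_html(tag, markup):
    """Hypothetical helper: replace the children of `tag` with nodes parsed from `markup`."""
    tag.clear()  # drop the existing children, like assigning to innerHTML
    fragment = BeautifulSoup(markup, "html.parser")
    # Copy the child list first: append() moves each node out of `fragment`.
    for child in list(fragment.contents):
        tag.append(child)
    return tag


soup = BeautifulSoup("<div>old</div>", "html.parser")
div = soup.div
set_inner_html(div, "<a>1</a>,<a>2</a>")
html = str(div)
print(html)
```

The old text node is gone afterwards and the two links are real Tag objects, so div.find_all("a") can see them.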

Alternative solution:

For each link, create a Tag and append it to the div. Also append a comma after each link except the last:

from bs4 import BeautifulSoup

# name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup("", 'html.parser')
div = soup.new_tag('div')

last = 5
for x in range(1, last + 1):
    link = soup.new_tag('a')
    link.string = str(x)
    div.append(link)

    # do not append a comma after the last element
    if x != last:
        div.append(",")

soup.append(div)

print(soup)

Prints:

<div><a>1</a>,<a>2</a>,<a>3</a>,<a>4</a>,<a>5</a></div>
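If the links come from a list rather than a fixed numeric range, one way to avoid tracking the last index is to put the separator before every item except the first. The following is a minimal sketch; append_links and its parameters are hypothetical names chosen for illustration, not part of bs4.

```python
from bs4 import BeautifulSoup


def append_links(soup, parent, labels, separator=","):
    """Hypothetical helper: append one <a> per label, separated by plain text."""
    for index, label in enumerate(labels):
        if index > 0:
            # Plain strings are added as text nodes, not parsed as markup.
            parent.append(separator)
        link = soup.new_tag("a")
        link.string = str(label)
        parent.append(link)
    return parent


soup = BeautifulSoup("", "html.parser")
div = soup.new_tag("div")
append_links(soup, div, ["1", "2", "3"])
soup.append(div)
result = str(soup)
print(result)
```

Because the separator is appended as text, it works for an empty or single-item list without a trailing comma.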
alecxe answered Sep 22 '22 07:09
