How to use BeautifulSoup to wrap body contents with div container

How can I wrap <div data-role="content"></div> around the contents of the HTML body with BeautifulSoup?

I tried to start with the following but haven't been able to make any progress:

    from bs4 import BeautifulSoup
    soup = BeautifulSoup(u"%s" % response)
    wrapper = soup.new_tag('div', **{"data-role":"content"})
    soup.body.append(wrapper)
    for content in soup.body.contents:
        wrapper.append(content)

I also tried using body.children, but had no luck.

This appends the wrapper to the body, but doesn't wrap the body contents like I need.

-- edit --

I've gotten to here, but now I end up with duplicate body elements, like this: <body><div data-role="content"><body>content here</body></div></body>

    from bs4 import BeautifulSoup
    soup = BeautifulSoup(u"%s" % response)
    wrapper = soup.new_tag('div', **{"data-role":"content"})
    new_body = soup.new_tag('body')
    contents = soup.body.replace_with(new_body)
    wrapper.append(contents)
    new_body.append(wrapper)

user319862 asked Dec 26 '13 19:12


2 Answers

How about this?

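    from bs4 import BeautifulSoup

    # `response` holds the HTML markup, as in the question; naming a
    # parser avoids bs4's "no parser was explicitly specified" warning
    soup = BeautifulSoup(response, "html.parser")
    wrapper = soup.new_tag('div', **{"data-role": "content"})
    # Copy the children first: the live list changes as nodes are moved
    body_children = list(soup.body.children)
    soup.body.clear()
    soup.body.append(wrapper)
    for child in body_children:
        wrapper.append(child)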
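
The list(...) copy in the snippet above is what makes it work where the loop in the question did not: appending a node to the wrapper removes it from body.contents, so looping over the live list skips elements. Here is a minimal sketch of that effect, assuming the html.parser backend and some made-up sample markup (neither comes from the question):

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; no whitespace between tags, so <body>
# has exactly four children.
soup = BeautifulSoup("<body><p>1</p><p>2</p><p>3</p><p>4</p></body>", "html.parser")
wrapper = soup.new_tag("div", **{"data-role": "content"})

# Iterate the live list on purpose. Each append() detaches the child from
# body.contents, so the next child slides into the slot the loop has
# already passed and is never visited.
for child in soup.body.contents:
    wrapper.append(child)

moved = [p.get_text() for p in wrapper.find_all("p")]
left_behind = [p.get_text() for p in soup.body.find_all("p")]
print("moved into wrapper:", moved)
print("left in body:", left_behind)
```

In this sketch only every other paragraph should end up in the wrapper. Iterating over a snapshot such as list(soup.body.children) avoids the problem because the snapshot does not shrink as nodes are moved.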

hiroshi answered Oct 15 '22 21:10


I recently ran into this same situation, and I'm not content with any of the other answers here. Iterating through a massive list and rebuilding the DOM doesn't seem acceptable to me performance-wise, and the other solution wraps the body itself, not the body's contents. Here's my solution:

    soup.body.wrap(soup.new_tag("div", **{"data-role": "content"})).wrap(soup.new_tag("body"))
    soup.body.body.unwrap()

Very simply, this approach wraps the body twice: first with the new div, then with a second body. BeautifulSoup's unwrap method then removes the original (now innermost) body while keeping its contents in place.
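
To see the double wrap and unwrap end to end, here is a small self-contained sketch. The sample markup and the html.parser choice are assumptions for illustration, not part of the original answer:

```python
from bs4 import BeautifulSoup

# Hypothetical sample document
html = "<html><body><h1>Title</h1><p>Text</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# Step 1: wrap the original <body> in the div, then wrap that div in a
# fresh <body>. wrap() returns the wrapper, so the calls can be chained.
wrapper = soup.new_tag("div", **{"data-role": "content"})
soup.body.wrap(wrapper).wrap(soup.new_tag("body"))

# Step 2: soup.body is now the new outer <body>; soup.body.body is the
# original one nested inside the div. unwrap() replaces it with its children.
soup.body.body.unwrap()

result = str(soup)
print(result)
```

One thing to watch for: because the original body tag is the one that gets discarded, any attributes it carried (a class or id, say) are not copied to the new body, so they would need to be transferred by hand if they matter.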

Nathan Hazzard answered Oct 15 '22 20:10