Get contents of div by id with BeautifulSoup

I am using Python 2.7.6, urllib2, and BeautifulSoup to extract HTML from a website and store it in a variable.

How can I show just the HTML contents of a div with a given id using BeautifulSoup? For example,

<div id='theDiv'>
<p>div content</p>
<p>div stuff</p>
<p>div thing</p>
</div>

would become

<p>div content</p>
<p>div stuff</p>
<p>div thing</p>
user8028 asked Sep 02 '14 01:09



2 Answers

Join the string forms of the elements in the div tag's .contents:

from bs4 import BeautifulSoup

data = """
<div id='theDiv'>
    <p>div content</p>
    <p>div stuff</p>
    <p>div thing</p>
</div>
"""

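soup = BeautifulSoup(data, "html.parser")  # name the parser explicitly
div = soup.find('div', id='theDiv')
# .contents lists the div's direct children; join their string forms
print(''.join(map(str, div.contents)))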

Prints:

<p>div content</p>
<p>div stuff</p>
<p>div thing</p>
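
Note that find() returns None when no element has the requested id, so div.contents would raise an AttributeError on a page that lacks the div. A minimal sketch of a guarded version follows; the helper name inner_html is hypothetical (it is not part of BeautifulSoup), and the built-in "html.parser" is assumed as the parser:

```python
from bs4 import BeautifulSoup


def inner_html(html, div_id):
    """Return the markup inside the div with the given id, or None if absent.

    inner_html is a hypothetical helper name, not a BeautifulSoup function.
    """
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", id=div_id)
    if div is None:
        # No div with that id in the document.
        return None
    # Join the string form of each direct child of the div.
    return "".join(str(child) for child in div.contents)


data = """
<div id='theDiv'>
    <p>div content</p>
    <p>div stuff</p>
    <p>div thing</p>
</div>
"""

result = inner_html(data, "theDiv")
print(result)
```

The returned string keeps the whitespace between the paragraphs, because the newlines inside the div are text nodes in .contents as well.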
alecxe answered Sep 27 '22 19:09


Since version 4.0.1, tags have a decode_contents() method that returns their inner markup as a string:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup("""
<div id='theDiv'>
<p>div content</p>
<p>div stuff</p>
<p>div thing</p>
</div>
""", "html.parser")

>>> print(soup.div.decode_contents())

<p>div content</p>
<p>div stuff</p>
<p>div thing</p>

More details in a solution to this question: https://stackoverflow.com/a/18602241/237105
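
One caveat: soup.div refers to the first div in the document, which is only the right one when the page has a single div. A short sketch of looking the element up by id before calling decode_contents() is given here; the second div with id 'other' is made up for illustration, and "html.parser" is assumed as the parser:

```python
from bs4 import BeautifulSoup

# Illustrative markup: the 'other' div is an assumption added to show the pitfall.
html = """
<div id='other'><p>unrelated</p></div>
<div id='theDiv'><p>div content</p><p>div stuff</p></div>
"""

soup = BeautifulSoup(html, "html.parser")

# soup.div is the first <div> in the document, which here is not the one we want.
first = soup.div.decode_contents()

# Look the element up by id instead, either with find() or with a CSS selector.
by_find = soup.find(id="theDiv").decode_contents()
by_css = soup.select_one("#theDiv").decode_contents()

print(by_find)
```

Both lookups return the same tag, so which one to use is mostly a matter of taste; the CSS selector form is convenient when the id is only part of a longer selector.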

Antony Hatchkins answered Sep 27 '22 21:09