I am using Python 2.7.6, urllib2, and BeautifulSoup
to extract HTML from a website and store it in a variable.
How can I show just the HTML contents of a div
with a given id using BeautifulSoup? For example,
<div id='theDiv'>
<p>div content</p>
<p>div stuff</p>
<p>div thing</p>
</div>
should output:
<p>div content</p>
<p>div stuff</p>
<p>div thing</p>
To extract elements by id in Beautiful Soup, either use the find_all() method with the id argument, or use the select(css_selector) method with an id selector such as '#theDiv'.
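The two lookup styles mentioned above can be sketched as follows. This is only an illustration: the sample markup and variable names are made up, the 'html.parser' backend is an assumption chosen because it ships with Python, and select_one() assumes a reasonably recent Beautiful Soup 4 release.

```python
from bs4 import BeautifulSoup

# Made-up sample markup with two divs so the id lookup has something to filter
html = """
<div id='theDiv'><p>div content</p></div>
<div id='other'><p>something else</p></div>
"""

# Assumption: use the parser bundled with Python so no extra install is needed
soup = BeautifulSoup(html, "html.parser")

# Keyword-argument style: find() returns the first match (or None),
# find_all() returns a list of every match
by_find = soup.find(id="theDiv")
by_find_all = soup.find_all(id="theDiv")

# CSS-selector style: select() returns a list, select_one() the first match (or None)
by_select = soup.select("#theDiv")
by_select_one = soup.select_one("#theDiv")

print(by_find["id"])
print(len(by_find_all), len(by_select))
```

Since an id should be unique within a page, find() or select_one() is usually the more convenient call, because it hands back a single tag rather than a list.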
Join the elements of the div tag's .contents:
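from bs4 import BeautifulSoup

data = """
<div id='theDiv'>
<p>div content</p>
<p>div stuff</p>
<p>div thing</p>
</div>
"""
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(data, 'html.parser')
div = soup.find('div', id='theDiv')
# .contents is a list of child nodes; convert each to a string and concatenate them
print(''.join(map(str, div.contents)))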
Prints:
<p>div content</p>
<p>div stuff</p>
<p>div thing</p>
Since Beautiful Soup 4.0.1 there is a decode_contents() method that returns a tag's inner HTML as a string:
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup("""
... <div id='theDiv'>
... <p>div content</p>
... <p>div stuff</p>
... <p>div thing</p>
... </div>
... """, 'html.parser')
>>> print(soup.div.decode_contents())
<p>div content</p>
<p>div stuff</p>
<p>div thing</p>
More details in a solution to this question: https://stackoverflow.com/a/18602241/237105
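Putting the id lookup and decode_contents() together, a small helper might look like the sketch below. The function name inner_html_by_id and the sample page string are hypothetical, and 'html.parser' is again an assumed parser choice. The None check matters because find() returns None when no element has the requested id.

```python
from bs4 import BeautifulSoup


def inner_html_by_id(html, element_id):
    """Return the inner HTML of the element with the given id, or None if absent.

    Hypothetical helper for illustration; not part of Beautiful Soup itself.
    """
    soup = BeautifulSoup(html, "html.parser")
    element = soup.find(id=element_id)
    if element is None:
        # No element carries this id, so there is nothing to decode
        return None
    # decode_contents() renders the children only, without the enclosing tag
    return element.decode_contents()


page = "<div id='theDiv'><p>div content</p><p>div stuff</p></div>"
print(inner_html_by_id(page, "theDiv"))   # <p>div content</p><p>div stuff</p>
print(inner_html_by_id(page, "missing"))  # None
```

In the original setup, the html argument would be the string already read from the site with urllib2.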