Get contents of div by id with BeautifulSoup

Tags:

python

html

html-parsing

beautifulsoup

python-2.7

I am using Python 2.7.6, urllib2, and BeautifulSoup to extract HTML from a website and store it in a variable.

How can I show just the HTML contents of a div with a given id using BeautifulSoup?

<div id='theDiv'>
<p>div content</p>
<p>div stuff</p>
<p>div thing</p>
</div>

the result would be

<p>div content</p>
<p>div stuff</p>
<p>div thing</p>

681

asked Sep 02 '14 01:09

user8028

People also ask

How do I find the element by id in Beautiful Soup Python?

To extract elements by id in Beautiful Soup, either call find() or find_all() with the id argument, or call select() with a CSS selector such as #theDiv.
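
As a rough illustration of those lookup styles, the sketch below runs each of them against a small made-up document. The sample markup and variable names are assumptions for the example, not part of the original question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup with two divs, only one of which has the id we want
html = """
<div id="other"><p>unrelated</p></div>
<div id="theDiv"><p>div content</p></div>
"""

soup = BeautifulSoup(html, "html.parser")

by_find = soup.find(id="theDiv")              # first match, or None
by_find_all = soup.find_all(id="theDiv")      # list-like result of all matches
by_select_one = soup.select_one("#theDiv")    # first match for a CSS selector, or None
by_select = soup.select("div#theDiv")         # list of matches for a CSS selector

# Each approach should land on the same <div id="theDiv"> tag
for tag in (by_find, by_find_all[0], by_select_one, by_select[0]):
    print(tag["id"])
```

find() and select_one() are convenient when exactly one element is expected, since they return the tag itself rather than a list.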

2 Answers

Join the elements of the div tag's .contents:

from bs4 import BeautifulSoup

data = """
<div id='theDiv'>
    <p>div content</p>
    <p>div stuff</p>
    <p>div thing</p>
</div>
"""

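# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(data, 'html.parser')
div = soup.find('div', id='theDiv')
# .contents holds the div's direct children (tags and whitespace strings)
print(''.join(map(str, div.contents)))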

Prints:

<p>div content</p>
<p>div stuff</p>
<p>div thing</p>
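
When the id may be absent from the page, find() returns None and accessing .contents on it raises an AttributeError. A minimal sketch of a guard, assuming a hypothetical helper named inner_html (not part of BeautifulSoup), could look like this:

```python
from bs4 import BeautifulSoup


def inner_html(html, element_id):
    """Return the markup inside the element with the given id, or None if absent.

    inner_html is a hypothetical helper for this example, not a bs4 function.
    """
    soup = BeautifulSoup(html, "html.parser")
    element = soup.find(id=element_id)
    if element is None:
        return None
    # Children are tags and text nodes; str() renders each back to markup/text
    return "".join(str(child) for child in element.contents)


data = """
<div id='theDiv'>
    <p>div content</p>
    <p>div stuff</p>
    <p>div thing</p>
</div>
"""

result = inner_html(data, "theDiv")
missing = inner_html(data, "noSuchId")

# The joined contents keep the whitespace between tags, so strip the ends for display
print(result.strip())
print(missing)
```

The whitespace between the paragraphs belongs to the div's contents, which is why the joined string starts and ends with a newline before stripping.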

67

answered Sep 27 '22 19:09

alecxe

Since version 4.0.1, tags have a decode_contents() method:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup("""
... <div id='theDiv'>
... <p>div content</p>
... <p>div stuff</p>
... <p>div thing</p>
... </div>
... """, 'html.parser')
>>> print(soup.div.decode_contents())

<p>div content</p>
<p>div stuff</p>
<p>div thing</p>

More details in a solution to this question: https://stackoverflow.com/a/18602241/237105
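
Note that soup.div returns the first div in the document, whatever its id. On a page with several divs, a sketch along these lines (the sample markup is made up for illustration) selects the div by id before calling decode_contents():

```python
from bs4 import BeautifulSoup

# Hypothetical page where the target div is not the first div
html = """
<div id="header"><p>site header</p></div>
<div id="theDiv"><p>div content</p><p>div stuff</p></div>
"""

soup = BeautifulSoup(html, "html.parser")

# soup.div is shorthand for the first <div> in the document
first_div = soup.div.decode_contents()

# Looking the div up by id targets the intended element
target_div = soup.find("div", id="theDiv").decode_contents()

print(first_div)   # <p>site header</p>
print(target_div)  # <p>div content</p><p>div stuff</p>
```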

41

answered Sep 27 '22 21:09

Antony Hatchkins
