Beautiful Soup 4: Remove comment tag and its content

The page that I'm scraping contains the HTML below. How do I remove the comment (<!-- ... -->) along with its content using bs4?

<div class="foo">
cat dog sheep goat
<!-- 
<p>NewPP limit report
Preprocessor node count: 478/300000
Post‐expand include size: 4852/2097152 bytes
Template argument size: 870/2097152 bytes
Expensive parser function count: 2/100
ExtLoops count: 6/100
</p>
-->
</div>

asked Apr 25 '14 17:04

Flint

1 Answer

You can use extract() (solution is based on this answer):

PageElement.extract() removes a tag or string from the tree. It returns the tag or string that was extracted.

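from bs4 import BeautifulSoup, Comment

data = """<div class="foo">
cat dog sheep goat
<!--
<p>test</p>
-->
</div>"""

# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(data, "html.parser")

div = soup.find('div', class_='foo')
# Comment is a NavigableString subclass, so filter the div's strings by type.
for element in div.find_all(string=lambda text: isinstance(text, Comment)):
    element.extract()

print(soup.prettify())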

As a result you get your div without comments:

<div class="foo">
 cat dog sheep goat
</div>
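
If comments need to be stripped from the whole document rather than a single div, the same idea can be wrapped in a helper. This is only a sketch: the function name strip_comments and the sample markup are made up for illustration, and it assumes the built-in "html.parser" backend is acceptable for your pages. It also collects the values returned by extract(), in case the removed comments are worth logging.

```python
from bs4 import BeautifulSoup, Comment


def strip_comments(html):
    """Hypothetical helper: return (cleaned_html, removed_comments)."""
    soup = BeautifulSoup(html, "html.parser")
    removed = []
    # Search the whole tree for strings that are Comment instances.
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        # extract() detaches the node from the tree and returns it.
        removed.append(comment.extract())
    return str(soup), removed


cleaned, removed = strip_comments(
    '<div class="foo">cat <!-- <p>test</p> --> dog</div>'
)
print(cleaned)
print(removed)
```

If the removed comments are never needed, decompose() could be used in place of extract(); both detach the node from the tree.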

answered Sep 23 '22 04:09

alecxe
