How can I remove a NavigableString from the tree?

Tags: python, beautifulsoup

I am a bit confused: every tag has a decompose() method that removes the tag from the tree in place. But what if I want to remove a NavigableString? It has no such method:

>>> from bs4 import BeautifulSoup
>>> b = BeautifulSoup('<p>aaaa <span> bbbbb </span> ccccc</p>', 'html.parser')
>>> b.p.contents[0]
'aaaa '
>>> type(b.p.contents[0])
<class 'bs4.element.NavigableString'>
>>> b.p.contents[0].decompose()
Traceback (most recent call last):
...
AttributeError: 'NavigableString' object has no attribute 'decompose'

I did manage to partly remove the NavigableString from the tree by popping it from the parent's contents list:

>>> b.p.contents.pop(0)
'aaaa '
>>> b
<p><span> bbbbb </span> ccccc</p>

The problem is that the string still shows up when I iterate over strings:

>>> list(b.strings)
['aaaa ', ' bbbbb ', ' ccccc']

This shows that popping from contents is the wrong way to do it. Besides, my code relies on strings, so this hacky solution is not acceptable, alas.

So the question is: how can I remove a specific NavigableString object from the tree?

asked Sep 20 '19 10:09

Dany

1 Answer

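Use extract() instead of decompose().

extract() removes a tag or string from the tree and returns it.

decompose() removes a tag from the tree and destroys it. It is defined only on tags, which is why a NavigableString has no such method.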

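from bs4 import BeautifulSoup

b = BeautifulSoup('<p>aaaa <span> bbbbb </span> ccccc</p>', 'html.parser')
b.p.contents[0].extract()  # detach the leading 'aaaa ' string from the tree
print(b)  # <p><span> bbbbb </span> ccccc</p>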
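
To check that this also fixes the strings problem from the question, here is a minimal sketch comparing the two approaches. The variable names (html, soup, stale, removed, popped) are only illustrative, and the behaviour of the stale copy is what the question reported rather than something the documentation promises.

```python
from bs4 import BeautifulSoup

html = '<p>aaaa <span> bbbbb </span> ccccc</p>'

# extract() detaches the string and returns it.
soup = BeautifulSoup(html, 'html.parser')
removed = soup.p.contents[0].extract()
print(repr(removed))       # 'aaaa '
print(removed.parent)      # None: the string no longer belongs to the tree
print(soup)                # <p><span> bbbbb </span> ccccc</p>
print(list(soup.strings))  # [' bbbbb ', ' ccccc']

# Popping from contents only edits the parent's list; the string still
# points back at its old parent, so tree traversal can still reach it.
stale = BeautifulSoup(html, 'html.parser')
popped = stale.p.contents.pop(0)
print(popped.parent is stale.p)  # True
print(list(stale.strings))       # the question saw 'aaaa ' still listed here
```

The difference is that extract() updates the links between the string and its neighbours, while contents.pop() leaves them untouched.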

For more details, see the BeautifulSoup documentation.
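
If several strings need to go, the same call can be applied in a loop. The sketch below assumes a bs4 version that accepts the string argument to find_all() (older releases called it text); remove_strings and should_remove are hypothetical names made up for this example, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

def remove_strings(root, should_remove):
    """Extract every string under root for which should_remove(text) is true.

    Hypothetical helper for illustration; returns the extracted strings.
    """
    removed = []
    # find_all() returns a list, so extracting while looping over it is safe.
    for node in root.find_all(string=True):
        if should_remove(node):
            removed.append(node.extract())
    return removed

soup = BeautifulSoup('<p>aaaa <span> bbbbb </span> ccccc</p>', 'html.parser')
gone = remove_strings(soup, lambda text: 'a' in text)
print(gone)                # ['aaaa ']
print(list(soup.strings))  # [' bbbbb ', ' ccccc']
```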

answered Sep 22 '22 13:09

KunduK
