
How can I remove a NavigableString from the tree?

I am a bit confused: every tag has a decompose() method that removes the tag from the tree in place. But what if I want to remove a NavigableString? It has no such method:

>>> from bs4 import BeautifulSoup
>>> b = BeautifulSoup('<p>aaaa <span> bbbbb </span> ccccc</p>', 'html.parser')
>>> b.p.contents[0]
'aaaa '
>>> type(b.p.contents[0])
<class 'bs4.element.NavigableString'>
>>> b.p.contents[0].decompose()
Traceback (most recent call last):
...
AttributeError: 'NavigableString' object has no attribute 'decompose'

I found a way to partly remove the NavigableString from the tree, by popping it from its parent's contents list:

>>> b.p.contents.pop(0)
'aaaa '
>>> b
<p><span> bbbbb </span> ccccc</p>

The problem is that the string still shows up when iterating over the strings generator:

>>> list(b.strings)
['aaaa ', ' bbbbb ', ' ccccc']

This shows that popping from contents is the wrong approach. Besides, my code relies on strings, so this hacky solution is not acceptable, alas.


So the question is: how can I remove a specific NavigableString object from the tree?

Dany asked Sep 20 '19 10:09


1 Answer

Use extract() instead of decompose().

extract() removes a tag or string from the tree and returns the removed element.

decompose() removes a tag from the tree and then destroys it along with its contents.

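from bs4 import BeautifulSoup

b = BeautifulSoup('<p>aaaa <span> bbbbb </span> ccccc</p>', 'html.parser')
b.p.contents[0].extract()  # detach the first NavigableString from the tree
print(b)  # <p><span> bbbbb </span> ccccc</p>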
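Since the question depends on strings, here is a minimal sketch that checks whether the extracted text also disappears from that generator. The helper name remove_first_string is hypothetical and only wraps the same extract() call for illustration.

```python
from bs4 import BeautifulSoup


def remove_first_string(html):
    """Extract the first child of <p> and report what is left in the tree."""
    soup = BeautifulSoup(html, "html.parser")
    # extract() detaches the node from the tree and returns it
    removed = soup.p.contents[0].extract()
    # convert to plain str so the results are easy to compare and print
    remaining = [str(s) for s in soup.strings]
    return str(removed), str(soup), remaining


removed, markup, remaining = remove_first_string(
    "<p>aaaa <span> bbbbb </span> ccccc</p>"
)
print(repr(removed))  # 'aaaa '
print(markup)         # <p><span> bbbbb </span> ccccc</p>
print(remaining)      # [' bbbbb ', ' ccccc']
```

Unlike contents.pop(0), the extracted string no longer appears when iterating over strings, and the returned object can be kept or reinserted elsewhere if needed.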

For more details, see the BeautifulSoup documentation.

KunduK answered Sep 22 '22 13:09