Exclude an unwanted tag in BeautifulSoup (Python)

<span>
  I Like
  <span class='unwanted'> to punch </span>
   your face
 </span>

How can I print "I Like your face" instead of "I Like to punch your face"?

I tried this:

lala = soup.find_all('span')
for p in lala:
    if not p.find(class_='unwanted'):
        print p.text

but it gives "TypeError: find() takes no keyword arguments".

masbro, asked Nov 23 '16 09:11




1 Answer

You can use extract() to remove the unwanted tag before you get the text.

However, the remaining text still contains all the '\n' characters and spaces, so you will need some extra work to remove them.

data = '''<span>
  I Like
  <span class='unwanted'> to punch </span>
   your face
 </span>'''

from bs4 import BeautifulSoup as BS

soup = BS(data, 'html.parser')

# outer <span> that holds the whole sentence
external_span = soup.find('span')

print("1 HTML:", external_span)
print("1 TEXT:", external_span.text.strip())

# find() searches descendants, so this returns the nested (unwanted) <span>;
# extract() removes it, together with its text, from the tree
unwanted = external_span.find('span')
unwanted.extract()

print("2 HTML:", external_span)
print("2 TEXT:", external_span.text.strip())

Result

1 HTML: <span>
  I Like
  <span class="unwanted"> to punch </span>
   your face
 </span>
1 TEXT: I Like
   to punch 
   your face
2 HTML: <span>
  I Like

   your face
 </span>
2 TEXT: I Like

   your face
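For the extra whitespace work mentioned above, here is a minimal sketch (an assumption, not part of the original answer) that removes only the spans with class "unwanted" and then collapses every run of whitespace with str.split() and " ".join(). It uses Tag.decompose() instead of extract(): decompose() removes the tag from the tree and destroys it, while extract() returns the removed tag in case you still need it.

```python
from bs4 import BeautifulSoup

data = '''<span>
  I Like
  <span class='unwanted'> to punch </span>
   your face
 </span>'''

soup = BeautifulSoup(data, 'html.parser')
external_span = soup.find('span')

# Remove every nested span with class "unwanted" and discard it.
# find_all() returns a list, so it is safe to modify the tree while looping.
for unwanted in external_span.find_all('span', class_='unwanted'):
    unwanted.decompose()

# split() with no arguments splits on any run of whitespace (spaces, '\n'),
# so joining with a single space collapses the leftover gaps.
text = " ".join(external_span.get_text().split())
print(text)  # I Like your face
```

Unlike the NavigableString approach, this keeps the text of any other nested tags and drops only the ones with class "unwanted".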

Alternatively, you can skip every Tag object inside the external span and keep only the NavigableString objects (the plain text that sits directly inside it).

data = '''<span>
  I Like
  <span class='unwanted'> to punch </span>
   your face
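 </span>'''

from bs4 import BeautifulSoup as BS
import bs4

soup = BS(data, 'html.parser')

external_span = soup.find('span')

text = []
# iterating over a Tag yields only its direct children
for x in external_span:
    # keep plain-text nodes; nested Tags (the unwanted <span>) are skipped
    if isinstance(x, bs4.element.NavigableString):
        text.append(x.strip())
print(" ".join(text))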

Result

I Like your face
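The same NavigableString idea can be wrapped in a small helper. This is a sketch, not the answer's code: the name direct_text is hypothetical, and it relies on find_all(string=True, recursive=False), which returns only the text nodes that are direct children of the tag (in older Beautiful Soup 4 releases the string argument is named text).

```python
from bs4 import BeautifulSoup


def direct_text(tag):
    """Hypothetical helper: join the text nodes that are direct children of tag."""
    # recursive=False limits the search to direct children;
    # string=True matches text nodes (NavigableString) instead of tags.
    parts = tag.find_all(string=True, recursive=False)
    # Strip each piece and drop whitespace-only pieces so the join has no double spaces.
    return " ".join(p.strip() for p in parts if p.strip())


data = '''<span>
  I Like
  <span class='unwanted'> to punch </span>
   your face
 </span>'''

soup = BeautifulSoup(data, 'html.parser')
print(direct_text(soup.find('span')))  # I Like your face
```

Like the loop above, this drops the text of every nested tag, not only the one with class "unwanted".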
furas, answered Sep 29 '22 14:09