

Exclude unwanted tag on Beautifulsoup Python

Tags:

python

html

beautifulsoup

web-scraping

<span>
  I Like
  <span class='unwanted'> to punch </span>
   your face
 </span>

How can I print "I Like your face" instead of "I Like to punch your face"?

I tried this:

lala = soup.find_all('span')
for p in lala:
 if not p.find(class_='unwanted'):
    print p.text

but it gives "TypeError: find() takes no keyword arguments".


asked Nov 23 '16 09:11

masbro

1 Answer

You can use extract() to remove the unwanted tag before you get the text.

The remaining text still keeps all the '\n' characters and spaces, so you will need some extra work to remove them.

data = '''<span>
  I Like
  <span class='unwanted'> to punch </span>
   your face
 </span>'''

from bs4 import BeautifulSoup as BS

soup = BS(data, 'html.parser')

# the first <span> in the document is the outer one
external_span = soup.find('span')

print("1 HTML:", external_span)
print("1 TEXT:", external_span.text.strip())

# find the nested <span> and remove it from the tree
unwanted = external_span.find('span')
unwanted.extract()

print("2 HTML:", external_span)
print("2 TEXT:", external_span.text.strip())

Result

1 HTML: <span>
  I Like
  <span class="unwanted"> to punch </span>
   your face
 </span>
1 TEXT: I Like
   to punch 
   your face
2 HTML: <span>
  I Like

   your face
 </span>
2 TEXT: I Like

   your face
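
The leftover newlines and spaces mentioned above can be collapsed in one step. This is only a minimal sketch: it reuses the HTML from the question, selects the nested tags by the class name 'unwanted' (taken from the question), and relies on str.split() with no arguments, which splits on any run of whitespace. The names html, outer and clean_text are just illustrative.

```python
from bs4 import BeautifulSoup

html = """<span>
  I Like
  <span class='unwanted'> to punch </span>
   your face
 </span>"""

soup = BeautifulSoup(html, "html.parser")
outer = soup.find("span")

# find_all() returns a list-like result, so removing tags while looping is safe
for tag in outer.find_all(class_="unwanted"):
    tag.extract()

# split() without arguments drops every run of spaces and newlines,
# and join() puts the remaining words back together with single spaces
clean_text = " ".join(outer.get_text().split())
print(clean_text)
```

If you do not need the removed tag afterwards, decompose() should work in place of extract() here; extract() returns the removed tag, decompose() destroys it.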

Alternatively, you can skip every Tag object inside the external span and keep only the NavigableString objects (the plain text in the HTML).

data = '''<span>
  I Like
  <span class='unwanted'> to punch </span>
   your face
 </span>'''

from bs4 import BeautifulSoup as BS
import bs4

soup = BS(data, 'html.parser')

external_span = soup.find('span')

text = []
for x in external_span:
    if isinstance(x, bs4.element.NavigableString):
        text.append(x.strip())
print(" ".join(text))

Result

I Like your face
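
The same idea can be wrapped in a small reusable helper. This is a sketch under the assumption that you only want the text that sits directly inside a tag and none of the text of its nested tags; direct_text is a hypothetical name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup, NavigableString


def direct_text(tag):
    """Join the text that is a direct child of `tag`, skipping nested tags.

    Hypothetical helper for illustration, not a BeautifulSoup function.
    """
    pieces = []
    for child in tag.children:
        # nested tags are Tag objects, plain text is NavigableString
        if isinstance(child, NavigableString):
            stripped = child.strip()
            if stripped:  # ignore whitespace-only pieces between tags
                pieces.append(stripped)
    return " ".join(pieces)


html = """<span>
  I Like
  <span class='unwanted'> to punch </span>
   your face
 </span>"""

soup = BeautifulSoup(html, "html.parser")
outer = soup.find("span")
print(direct_text(outer))
```

Unlike extract(), this leaves the parsed tree unchanged, so the nested span is still available afterwards if you need it.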

answered Sep 29 '22 14:09

furas
