
Get only the h1 text, without the span text, in Python

Tags:

python

I am trying to get only the text that belongs directly to the h1, which here is 'The Wire', but I am getting all of the text inside the h1, including the span text.

<h1 id="aiv-content-title" class="js-hide-on-play"> 
The Wire
    <span class="num-of-seasons">5 Seasons</span>
    <span class="release-year">2002</span>
</h1>

The output I am getting is: The Wire5 Seasons2002

heading=elm.find('h1',id='aiv-content-title')
print heading
seasons=elm.find('span',{'class':'num-of-seasons'})

# find() returns None (not the string 'None') when nothing matches
if seasons is None:
    print '1'
else:
    print seasons.text

release_year=elm.find('span',{'class':'release-year'})
print release_year.text
print 

When I run this code, I get the following:

The Wire5 Seasons2002 5 Seasons 2002

I am expecting something like this

The Wire 5 Seasons 2002

Fazeela Abu Zohra asked Feb 19 '26 09:02



1 Answer

You can read the span values first, then remove the spans from the h1 so that only its own text remains:

# Look up the h1 by its id attribute (the dict key must be the string 'id')
h1_element = elm.find('h1', {'id': 'aiv-content-title'})
num_seasons = h1_element.find('span', {'class': 'num-of-seasons'}).getText().strip()
release_year = h1_element.find('span', {'class': 'release-year'}).getText().strip()

# Remove the span elements from the h1 element so only its own text remains
while h1_element.find('span'):
    h1_element.find('span').extract()

print(h1_element.getText().strip())
print(num_seasons)
print(release_year)
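
If you would rather not modify the parsed tree, a possible alternative is to collect only the text nodes that are direct children of the h1. The following is a minimal sketch rather than a definitive solution. It assumes Python 3 and the bs4 package (Beautiful Soup 4) with the built-in html.parser. The helper names own_text and span_text and the HTML constant are hypothetical, introduced only for this illustration, and the default of '1' for a missing season count simply mirrors the fallback in the question.

```python
from bs4 import BeautifulSoup, NavigableString

# Sample markup copied from the question (assumed to be representative).
HTML = """
<h1 id="aiv-content-title" class="js-hide-on-play">
The Wire
    <span class="num-of-seasons">5 Seasons</span>
    <span class="release-year">2002</span>
</h1>
"""


def own_text(tag):
    """Join only the text nodes that are direct children of `tag`."""
    parts = [
        child.strip()
        for child in tag.children
        if isinstance(child, NavigableString)
    ]
    # Drop the whitespace-only nodes that sit between the child tags.
    return " ".join(part for part in parts if part)


def span_text(parent, css_class, default=""):
    """Return the stripped text of the first matching span, or `default`."""
    span = parent.find("span", {"class": css_class})
    return span.get_text(strip=True) if span is not None else default


soup = BeautifulSoup(HTML, "html.parser")
h1 = soup.find("h1", id="aiv-content-title")

title = own_text(h1)
seasons = span_text(h1, "num-of-seasons", default="1")
release_year = span_text(h1, "release-year")

print(title, seasons, release_year)  # The Wire 5 Seasons 2002
```

Unlike the extract() approach, this leaves the span elements in place, so the same h1 can still be queried afterwards.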
DavidK answered Feb 20 '26 21:02



