
PyQuery: Get only text of element, not text of child elements

I have the following HTML:

<h1 class="price">
 <span class="strike">$325.00</span>$295.00
</h1>

I'd like to get just the $295.00 out. However, if I simply use PyQuery as follows:

price = pq('h1').text()

I get both prices.

Extracting only direct child text for an element in jQuery looks reasonably complicated - is there a way to do it at all in PyQuery?

Currently I'm extracting the first price separately, then using replace to remove it from the text, which is a bit fiddly.

Thanks for your help.

Richard asked Oct 21 '22 20:10


1 Answer

I don't think there is a clean way to do that. The closest I've found is this, which keeps only the span:

>>> print(doc('h1').html(doc('h1')('span').outerHtml()))
<h1 class="price"><span class="strike">$325.00</span></h1>

You can use .text() instead of .outerHtml() if you don't want to keep the span tag.

Removing the first price is much easier (starting again from the original document, since these calls modify it in place):

>>> print(doc('h1').remove('span'))
<h1 class="price">
  $295.00
</h1>

gawel answered Oct 27 '22 00:10