I have a series of <p> elements inside a document I'm scraping with Scrapy.
Some of them are:
<p><span>bla bla bla</span></p>
or
<p><span><span>bla bla bla</span><span>second bla bla</span></span></p>
I want to extract all the text, including the text of the children (assume I already have the selector for the <p>).
For the second example, the result should be the string: bla bla bla second bla bla
Here are two options; each has its benefits depending on the situation.
HTML sample:
<p>Something outside the span<span> and something inside the span</span></p>
Option 01: use //text()
-> returns list
response.xpath('//p//text()').getall()
# returns
>>> ['Something outside the span', ' and something inside the span']
Option 02: use string()
-> returns string
response.xpath('string(//p)').get()
# returns
>>> 'Something outside the span and something inside the span'
You can just use //text() to extract all the text from descendant nodes.
For example:
.//p//text()