For the following code:
<a class="title" href="the link">
Low price
<strong>computer</strong>
you should not miss
</a>
I used this XPath expression in Scrapy:
response.xpath('.//a[@class="title"]//text()[normalize-space()]').extract()
I got the following result:
u'\n \n Low price ', u'computer', u' you should not miss'
Why were the two \n characters and the many spaces before "Low price" not removed by normalize-space() in this example?
Another question: how can I combine the three parts into one scraped item, u'Low price computer you should not miss'?
normalize-space() strips leading and trailing whitespace from a string and collapses each internal run of spaces, tabs, and newlines into a single space. In your expression it appears inside a predicate, text()[normalize-space()], so it only filters: it keeps the text nodes that are non-empty after normalization, but it returns them unchanged. That is why the \n characters and the extra spaces are still in the result.
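To get normalized text, apply the function to the element itself. It then returns the string value of the first matching element (all of its text joined together) with the whitespace normalized, so the three parts come back as one string. Please try this: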
'normalize-space(.//a[@class="title"])'
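If you would rather keep the original query and clean up in Python, the same effect can be sketched as below. This is only an illustration: normalize_space is a hypothetical helper name, not part of Scrapy, and the list of parts is copied from the result shown in the question. Python's str.split() treats a few more characters as whitespace than XPath does, which should not matter for this input.

```python
def normalize_space(text):
    """Roughly mimic XPath normalize-space(): trim ends, collapse whitespace runs."""
    # str.split() with no argument splits on any run of whitespace and drops
    # empty pieces at both ends, so re-joining with one space normalizes.
    return " ".join(text.split())


# The three text nodes returned by the original query in the question.
parts = [u'\n \n Low price ', u'computer', u' you should not miss']

# Concatenate the fragments first, then normalize the whole string once.
combined = normalize_space("".join(parts))
print(combined)
```

Joining before normalizing relies on the spaces already present at the fragment edges, which matches how the text reads in the original HTML.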