
XPath: why does normalize-space not remove the empty spaces and \n?

For the following code:

<a class="title" href="the link">
Low price
<strong>computer</strong>
you should not miss
</a>

I used this XPath expression in Scrapy:

response.xpath('.//a[@class="title"]//text()[normalize-space()]').extract()

I got the following result:

u'\n                  \n                  Low price ', u'computer', u' you should not miss'

Why were the two \n and the many spaces before "Low price" not removed by normalize-space() in this example?

Another question: how can I combine the 3 parts into one scraped item, u'Low price computer you should not miss'?

LearnAWK asked Oct 13 '15 06:10


People also ask

How does XPath handle space?

If an element has extra whitespace in its text or in the value of an attribute, the normalize-space function can be used when writing an XPath for that element. It strips leading and trailing whitespace from the string and collapses every internal run of whitespace (spaces, tabs, newlines) into a single space.
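To make that behaviour concrete, here is a rough pure-Python imitation of what normalize-space() does to a single string. The helper name normalize_space is only illustrative and is not part of Scrapy or lxml; Python's split() also treats a slightly broader set of characters as whitespace than XPath 1.0 does, so take it as an approximation.

```python
def normalize_space(s: str) -> str:
    """Approximate XPath normalize-space(): trim both ends and
    collapse each internal run of whitespace into one space."""
    # split() with no argument splits on runs of whitespace and
    # drops empty leading/trailing pieces, so joining with a single
    # space gives the trimmed, collapsed string.
    return " ".join(s.split())


# First text node from the question's output
raw = "\n                  \n                  Low price "
cleaned = normalize_space(raw)
print(repr(cleaned))
```

A whitespace-only string comes back as an empty string, which is what makes the function usable as a filter in an XPath predicate.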


1 Answer

Please try this:

'normalize-space(.//a[@class="title"])'
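Some background on why this differs from the expression in the question: in text()[normalize-space()] the function sits inside a predicate, so its result is only tested for being non-empty. Whitespace-only text nodes are dropped, but the nodes that pass are returned unchanged. Wrapping the element itself, as above, applies the function to the element's string value (all descendant text concatenated), which yields one cleaned string. The sketch below imitates both behaviours in plain Python on the three strings from the question; it does not call Scrapy, and the variable names are only illustrative.

```python
# Text nodes exactly as reported in the question's output
parts = [
    "\n                  \n                  Low price ",
    "computer",
    " you should not miss",
]

# Predicate form, text()[normalize-space()]: keep a node only if its
# normalized text is non-empty; the node itself is left untouched.
kept = [p for p in parts if " ".join(p.split())]

# Function form, normalize-space(element): concatenate all descendant
# text first, then trim and collapse whitespace in the result.
title = " ".join("".join(parts).split())

print(kept == parts)
print(repr(title))

# In a Scrapy callback the XPath version would presumably be called as:
#   response.xpath('normalize-space(.//a[@class="title"])').extract_first()
```

If the text nodes have already been extracted as a list, the same join-then-split step in Python gives the single combined item asked for in the question.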
Alexander Petrov answered Sep 28 '22 16:09