XPath: why did normalize-space() not remove the whitespace and \n?

Tags:

xpath

scrapy-spider

For the following code:

<a class="title" href="the link">
Low price
<strong>computer</strong>
you should not miss
</a>

I used this XPath expression in Scrapy:

response.xpath('.//a[@class="title"]//text()[normalize-space()]').extract()

I got the following result:

u'\n                  \n                  Low price ', u'computer', u' you should not miss'

Why were the two \n characters and the many spaces before "Low price" not removed by normalize-space() in this example?

Another question: how can I combine the three parts into one scraped item, u'Low price computer you should not miss'?

asked Oct 13 '15 06:10

LearnAWK

1 Answer

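Inside a predicate, normalize-space() only filters out text nodes that are empty after normalization; it does not modify the nodes that are returned. Apply it to the whole element instead, which also yields a single string. Please try this: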

'normalize-space(.//a[@class="title"])'
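The difference between the two forms can be sketched in plain Python. This is only an illustration using the standard library, not Scrapy's own API: the helper normalize_space is a hypothetical stand-in that roughly emulates the XPath function, and the exact indentation of the snippet is assumed.

```python
import xml.etree.ElementTree as ET

# The anchor from the question as a well-formed snippet (indentation assumed).
HTML = """<a class="title" href="the link">
    Low price
    <strong>computer</strong>
    you should not miss
</a>"""


def normalize_space(text):
    """Roughly emulate XPath normalize-space(): trim and collapse whitespace."""
    return " ".join(text.split())


anchor = ET.fromstring(HTML)

# Like //text()[normalize-space()]: the predicate only drops text nodes that
# are empty after normalization; the nodes it keeps are returned unmodified.
filtered = [t for t in anchor.itertext() if normalize_space(t)]

# Like normalize-space(.//a[@class="title"]): normalize the element's whole
# string value, which produces one clean string.
combined = normalize_space("".join(anchor.itertext()))

print(filtered)
print(combined)
```

In the spider itself, the equivalent should be response.xpath('normalize-space(.//a[@class="title"])').extract_first(), which returns the single combined string rather than a list of text nodes.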

answered Sep 28 '22 16:09

Alexander Petrov
