Using normalize-space with Scrapy

Question

Below is a mock-up of a document I'm working on:

<div>
<h4>Area</h4>
  <span class="aclass"> </span>
  <span class="bclass">
        <strong>Address:</strong>
  10 Downing Street

  London

  SW1
  </span>
</div>

I'm getting the address like this:

response.xpath(u".//h4[. = 'Area']/following-sibling::span[contains(.,'Address:')]/text()").extract()

which returns

[u'\n  \t', u'\n  10 Downing Street\n\n  London     \n  \n  SW1\n  ']
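The two strings in that output are the two text nodes that are direct children of the span: the whitespace before <strong> and the text after it. The sketch below reproduces that split on the mock-up using Python's standard library. It uses xml.etree.ElementTree as a stand-in for Scrapy's selectors (an assumption, made so it runs without Scrapy installed). In ElementTree those two nodes are the span's .text and the <strong> element's .tail.

```python
import xml.etree.ElementTree as ET

# The question's mock-up. It is well-formed XML, so ElementTree can parse it.
# (Assumption: ElementTree stands in for Scrapy's lxml-based selectors here.)
DOC = """<div>
<h4>Area</h4>
  <span class="aclass"> </span>
  <span class="bclass">
        <strong>Address:</strong>
  10 Downing Street

  London

  SW1
  </span>
</div>"""

root = ET.fromstring(DOC)

# Rough stand-in for following-sibling::span[contains(., 'Address:')]:
# pick the <span> whose full text content mentions the label.
span = next(s for s in root.findall("span") if "Address:" in "".join(s.itertext()))

# ElementTree stores an element's text-node children in two places: its own
# .text (before the first child element) and each child's .tail (after it).
# Together they correspond to what span/text() selects; None means "no node".
candidates = [span.text] + [child.tail for child in span]
text_nodes = [t for t in candidates if t is not None]

for node in text_nodes:
    print(repr(node))
```

The first node is pure whitespace and the second holds the address with its line breaks, which matches the shape of the list returned by extract().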

I'm trying to clean that up with normalize-space. I've tried putting it in every location I could think of, but it either raises a syntax error or returns an empty string.

Update: I'm trying to get this working without changing the selector too much. For example, I have similar cases that don't have the <strong> tag. The selector is overcomplicated in the example I've prepared here, but in the live version I have to take that rather convoluted route to get to the address.

Regarding the possible duplicate: following its advice, I appended /normalize-space(.), giving this:

response.xpath(u".//h4[. = 'Area']/following-sibling::span[contains(.,'Address:')]/text()/normalize-space(.)").extract()

That produces a ValueError: Invalid XPath: error.

alecxe · Accepted Answer

You can locate the strong element, get the following text sibling and normalize it:

In [1]: response.xpath(u"normalize-space(.//strong[. = 'Address:']/following-sibling::text())").extract()
Out[1]: [u'10 Downing Street London SW1']
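In XPath 1.0, which is what Scrapy's selectors evaluate, normalize-space() given a node-set converts only the first node (in document order) to a string. It then trims leading and trailing whitespace and collapses each internal run of whitespace into a single space. The sketch below imitates that on a trimmed copy of the mock-up with the standard library. The normalize_space helper is a hypothetical name, not part of Scrapy. XPath's whitespace set (space, tab, CR, LF) is spelled out because Python's str.split() would also treat other Unicode spaces such as U+00A0 as whitespace.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed-down version of the question's <span>.
DOC = """<span class="bclass">
        <strong>Address:</strong>
  10 Downing Street

  London

  SW1
  </span>"""

# XPath 1.0 treats exactly four characters as whitespace: space, tab, CR, LF.
_XPATH_WS = re.compile(r"[ \t\r\n]+")


def normalize_space(s):
    """Hypothetical helper imitating XPath 1.0 normalize-space() on one string."""
    return _XPATH_WS.sub(" ", s).strip(" \t\r\n")


root = ET.fromstring(DOC)

# Stand-in for .//strong[. = 'Address:']/following-sibling::text().
# normalize-space() would only use the first such text node; in ElementTree
# the text node directly after <strong> is stored as strong.tail.
strong = next(el for el in root.iter("strong") if el.text == "Address:")
address = normalize_space(strong.tail or "")

print(address)
```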

Alternatively, you can look into Item Loaders and their input and output processors. I often use Join(), TakeFirst() and MapCompose(unicode.strip) (str.strip on Python 3) to clean extra newlines and spaces out of extracted data.
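Processors run in Python on the list of strings the selector returns, so the selector itself can stay unchanged. The sketch below uses plain-Python stand-ins, not Scrapy's own classes, to mimic what MapCompose(str.strip) followed by Join() would do with the list from the question. Stripping each value removes only leading and trailing whitespace, so the blank lines inside the address survive, and the empty first value leaves a stray leading space once joined with Join()'s default " " separator. Mapping a whitespace-collapsing function instead (the hypothetical normalize_space helper) and then keeping the first value that is neither None nor empty, as TakeFirst() does, gives the single-line address.

```python
import re

# The list the question's selector returns (Python 3 str instead of u'' literals).
raw = ["\n  \t", "\n  10 Downing Street\n\n  London     \n  \n  SW1\n  "]


def normalize_space(s):
    """Hypothetical helper: XPath 1.0-style whitespace collapsing."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" \t\r\n")


# Stand-in for MapCompose(str.strip) followed by Join(): strip each value,
# then join the results with a single space.
stripped_then_joined = " ".join(s.strip() for s in raw)

# Stand-in for MapCompose(normalize_space) followed by TakeFirst(): collapse
# whitespace per value, then keep the first value that is not None or "".
normalized = [normalize_space(s) for s in raw]
first = next((s for s in normalized if s is not None and s != ""), None)

print(repr(stripped_then_joined))
print(repr(first))
```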

eLRuLL · Answer

response.xpath("normalize-space(//strong[contains(text(), 'Address:')]/following-sibling::node())").extract()


Tags: python, parsing, web-scraping, xpath, scrapy

user3185563
