xpath splitting string by <br> tags

Question

I am having a problem with Python and the Scrapy library. When this code:

self.item['char_SP4_TIP'] = response.xpath('//p[contains(@class, "spell-tooltip")]/text()').extract()

runs, it extracts the text from the paragraph but splits it at the <br> tags.

So instead of being able to access the whole text as self.item['char_SP4_TIP'][0], I have to access [0], [1], [2], etc., one for each piece of text between the <br> tags. Is there any way to stop it from splitting at the <br> tags? Thanks.

parchment · Accepted Answer

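Your XPath selects all the text nodes inside the <p>, but a <br> is an element, not a text node, so the text on either side of it comes back as separate entries.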

<p class='spell-description'> blah <br><br> blah2 </p>
                Selects these ^^^^          ^^^^^

You can join the split text.

texts = response.xpath('//p[contains(@class, "spell-tooltip")]/text()').extract()
text = '\n'.join(texts)

If there are multiple <p> tags with that class:

# './text()' is relative to each <p>; a leading '/' would search from the document root
text = ['\n'.join(p.xpath('./text()').extract())
           for p in response.xpath('//p[contains(@class, "spell-tooltip")]')]

Tags: python, web-scraping, scrapy

user3558177
