How to get innerHTML of a node using scrapy Selector?

Question

Suppose there are some HTML fragments like:

<a>
   text in a
   <b>text in b</b>
   <c>text in c</c>
</a>
<a>
   <b>text in b</b>
   text in a
   <c>text in c</c>
</a>

I want to extract the text inside each a element, dropping the child tags but keeping their text. For the fragments above, the results should be "text in a text in b text in c" and "text in b text in a text in c". I can already select the nodes with the Scrapy Selector css() method; how should I process these nodes to get what I want? Any ideas would be appreciated, thank you!

Cristian Lupascu · Accepted Answer

Here's what I managed to do:

from scrapy.selector import Selector

sel = Selector(text=html_string)

# Print every text node found inside an <a> element
for node in sel.css('a *::text'):
    print(node.extract())

Assuming that html_string is a variable holding the html in your question, this code produces the following output:

   text in a

text in b


text in c




text in b

   text in a

text in c

The selector a *::text matches all the text nodes that are descendants of a nodes.
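
The loop above prints each text node separately, whitespace included. To get the single-line strings the question asks for, the fragments can be joined and their whitespace collapsed. Here is a minimal sketch in plain Python. The helper name collapse_whitespace is made up for illustration, and the sample list is an assumption that mirrors the text nodes of the first a element in the output above:

```python
def collapse_whitespace(fragments):
    """Join text fragments and squeeze every run of whitespace into one space."""
    # str.split() with no arguments splits on any whitespace and drops empty pieces
    return " ".join(" ".join(fragments).split())


# Assumed sample: the text nodes of the first <a> element from the question
first_a = ["\n   text in a\n   ", "text in b", "\n   ", "text in c", "\n"]

print(collapse_whitespace(first_a))  # text in a text in b text in c
```

With Scrapy, the fragments for one element could come from something like a.xpath('.//text()').getall() inside a loop over sel.css('a'), so that each a element yields its own string.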

Awais Asghar · Answer

Try this, which returns the child nodes of each a element (both text and tags):

response.xpath('//a/node()').extract()

paul trmbrth · Answer

You can use XPath's string() function on the elements you select:

$ python
>>> import scrapy
>>> selector = scrapy.Selector(text="""<a>
...    text in a
...    <b>text in b</b>
...    <c>text in c</c>
... </a>
... <a>
...    <b>text in b</b>
...    text in a
...    <c>text in c</c>
... </a>""", type="html")
>>> for link in selector.css('a'):
...     print(link.xpath('string(.)').extract())
... 
['\n   text in a\n   text in b\n   text in c\n']
['\n   text in b\n   text in a\n   text in c\n']
>>>

Tags: python, html, css-selectors, xpath, scrapy

kuixiong