HXS filtering with scrapy - python

Question

I'm new to this and need more information; I couldn't find anything on the Internet. At the moment I use hxs.select('//div[@id="CategoryBreadcrumb"]//text()').extract(). Inside this div there is a ul whose li elements each contain an anchor, except for one. I need the text from the li that doesn't have an a tag in it. I'd also be thankful for any educational links about hxs filtering. Thanks in advance! Here is an example in case you can't visualize what I need.

<div id='CategoryBreadcrumb'>
<ul>
  <li><a href=#>I dont need</a></li>
  <li><a href=#>I dont need</a></li>
  <li><a href=#>I dont need</a></li>
  <li>Text that i need</li>
</ul>
</div>

unutbu · Accepted Answer

Try:

hxs.select('//div[@id = "CategoryBreadcrumb"]/ul/li/text()')

To learn more about XPaths see w3schools for the basics, and w3.org for the full specification.

PS: scrapy uses lxml. You can test your XPaths using code like this:

import lxml.html as LH

text = '''
<div id='CategoryBreadcrumb'>
<ul>
  <li><a href=#>I dont need</a></li>
  <li><a href=#>I dont need</a></li>
  <li><a href=#>I dont need</a></li>
  <li>Text that i need</li>
</ul>
</div>
'''

doc = LH.fromstring(text)
print(doc.xpath('//div[@id = "CategoryBreadcrumb"]/ul/li/text()'))

# ['Text that i need']
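If you want to say "an li without an anchor" explicitly, a predicate is one option. This is a sketch using the same lxml test setup as above, not the only way to do it; the names html, base, all_text, no_anchor and no_children are just for illustration:

```python
import lxml.html as LH

# Sample markup copied from the question.
html = '''
<div id='CategoryBreadcrumb'>
<ul>
  <li><a href=#>I dont need</a></li>
  <li><a href=#>I dont need</a></li>
  <li><a href=#>I dont need</a></li>
  <li>Text that i need</li>
</ul>
</div>
'''

doc = LH.fromstring(html)
base = '//div[@id="CategoryBreadcrumb"]/ul/li'

# The question's original query: every text node under the div,
# including the anchor text and the whitespace between tags.
all_text = [t.strip() for t in doc.xpath('//div[@id="CategoryBreadcrumb"]//text()')
            if t.strip()]

# Only li elements that have no <a> child.
no_anchor = doc.xpath(base + '[not(a)]/text()')

# Stricter: only li elements that have no child elements at all.
no_children = doc.xpath(base + '[not(*)]/text()')

print(all_text)
print(no_anchor)
print(no_children)
```

The difference from plain li/text() shows up when an li holds both an anchor and some loose text (a separator character, for example): li/text() would return that loose text as well, while li[not(a)]/text() ignores that li entirely.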

Tags: python, select, filter, scrapy

Martin
