Scrapy grab div with multiple classes?

Question

I am trying to grab divs with the class 'product'. The problem is that some of the divs with class 'product' also have the class 'product-small'. When I use xpath("//div[@class='product']"), it only matches divs whose class attribute is exactly 'product', not divs with multiple classes. How can I match both with Scrapy?

Example:

Catches: <div class='product'>
Doesn't catch: <div class='product product-small'>

alecxe · Accepted Answer

This can also be solved with XPath. You just need to use contains(), padding the class attribute with spaces so that only the whole class name 'product' matches:

//div[contains(concat(' ', normalize-space(@class), ' '), ' product ')]

Though, yes, the CSS selector option is more compact and readable.

spirulence · Answer

You should consider using a CSS selector for this part of your query.

http://doc.scrapy.org/en/latest/topics/selectors.html#when-querying-by-class-consider-using-css

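from scrapy import Selector

# Build a selector from an inline HTML string with two classes on the div.
sel = Selector(text='<div class="product product-small">I am a product!</div>')

# .product matches any element whose class list includes 'product'.
print(sel.css('.product').extract())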

If you need to, you can chain CSS and XPath selectors, as in the example on that page.

Tags: python, html, web-scraping, xpath, scrapy

user1835351
