Is it possible to locate elements by CSS properties in Scrapy?

Question

I am wondering whether Scrapy has a way to select elements based on colors defined in CSS. For example, select all elements with background-color: #ff0000.

I have tried this:

response.css('td::attr(background-color)').extract()

I was expecting a list of all background colors set for the table data elements, but it returns an empty list.

Is it generally possible to locate elements by their CSS properties in Scrapy?

alecxe · Accepted Answer

The short answer is no: this is not possible with Scrapy alone.

Why No?

The ::attr() pseudo-element gives you access to element attributes, but background-color is a CSS property, not an attribute.
CSS properties can be defined in several different ways (external stylesheets, <style> blocks, inline style attributes, or scripts), so to get the actual value of a CSS property for an element you need a browser to fully render the page and apply all of its stylesheets.
Scrapy itself is not a browser and has no JavaScript engine, so it cannot render a page.

Exceptions

Sometimes, though, CSS properties are defined in style attributes of the elements. For instance:

<span style="background-color: green"/>

If this is the case, then yes, you can use the style attribute's value to filter elements:

response.xpath("//span[contains(@style, 'background-color: green')]")

This is quite fragile, though, and may produce false positives.
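To illustrate the fragility, here is a minimal sketch that parses the style value into individual declarations instead of matching a substring. The helper names parse_inline_style and has_background are hypothetical, not part of Scrapy, and the naive split on ";" assumes values contain no semicolons (a data URI inside url(...) would break it):

```python
def parse_inline_style(style):
    """Split an inline style string into a {property: value} dict."""
    declarations = {}
    for chunk in style.split(";"):
        if ":" not in chunk:
            continue  # skip empty or malformed declarations
        name, _, value = chunk.partition(":")
        # Lowercasing the value is an assumption that suits color keywords
        # and hex codes; it would not suit case-sensitive values such as URLs.
        declarations[name.strip().lower()] = value.strip().lower()
    return declarations


def has_background(style, color):
    """Return True if the inline style sets background-color to exactly `color`."""
    return parse_inline_style(style).get("background-color") == color.lower()


print(has_background("background-color: green", "green"))        # True
print(has_background("background-color:green;", "green"))        # True, a substring test with a space misses this
print(has_background("background-color: greenyellow", "green"))  # False, a substring test matches this
```

In a spider callback this could be applied to elements selected with response.xpath("//span[@style]"), passing each element's style attribute value to has_background. It still only sees inline styles, not colors coming from stylesheets.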

What can you do?

Look for other things to base your locators on. In general, locating elements by background color is not the best way to reach the desired elements unless, in some unusual circumstances, this property is the only distinguishing factor.
The scrapy-splash project lets you automate Splash, a lightweight browser that can render the page. In that case, you would need to execute Lua scripts to access the CSS properties of elements on the rendered page.
The Selenium browser automation tool is probably the most straightforward option for this problem, as it gives you direct control over the page and access to its elements, their properties, and their attributes. Its .value_of_css_property() method returns the value of a CSS property.
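As a rough sketch of the Selenium option, the following filters td elements by their rendered background color. The function names css_color_to_hex and find_cells_by_background are hypothetical, and the code assumes that Firefox with a matching driver is available and that the browser reports colors in the form rgb(r, g, b) or rgba(r, g, b, a), which is why the value is normalized before comparing:

```python
import re


def css_color_to_hex(value):
    """Convert 'rgb(r, g, b)' or 'rgba(r, g, b, a)' to '#rrggbb'; None otherwise."""
    match = re.match(r"rgba?\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)", value.strip())
    if match is None:
        return None
    r, g, b = (int(part) for part in match.groups())
    return "#{:02x}{:02x}{:02x}".format(r, g, b)


def find_cells_by_background(url, wanted_hex="#ff0000"):
    """Return the text of every <td> whose rendered background matches wanted_hex."""
    # Imported here so the helper above stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()  # assumption: Firefox and its driver are available
    try:
        driver.get(url)
        cells = driver.find_elements(By.TAG_NAME, "td")
        return [
            cell.text
            for cell in cells
            if css_color_to_hex(cell.value_of_css_property("background-color")) == wanted_hex
        ]
    finally:
        driver.quit()  # always close the browser, even if the lookup fails
```

Because the browser resolves every stylesheet before the property is read, this works regardless of where the color was defined, at the cost of running a real browser for each page.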

Tags:

python

html

css

scrapy

user3445792
