I am wondering whether Scrapy has methods to scrape data based on colors defined in CSS. For example, select all elements with background-color: #ff0000.
I have tried this:
response.css('td::attr(background-color)').extract()
I was expecting a list with all background colors set for the table data elements but it returns an empty list.
Is it generally possible to locate elements by their CSS properties in Scrapy?
Short answer: no, this is not possible with Scrapy alone.
The ::attr() selector lets you access element attributes, but background-color is a CSS property, not an HTML attribute, which is why the query returns an empty list.
An important thing to understand is that there are multiple ways to define the CSS properties of elements on a page (external stylesheets, <style> blocks, inline style attributes, or JavaScript). To get the actual value of a CSS property for an element, you need a browser to fully render the page with all of its stylesheets.
Scrapy itself is not a browser or a JavaScript engine, so it cannot render a page.
Sometimes, though, CSS properties are defined in the style attributes of the elements. For instance:
<span style="background-color: green"/>
If this is the case, then, yes, you would be able to use the style attribute's value to filter elements:
response.xpath("//span[contains(@style, 'background-color: green')]")
This would be quite fragile, though, and may generate false positives. For example, it misses the same declaration written without the space (background-color:green).
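One way to make the style-attribute check less fragile is to parse the attribute into individual declarations and compare exact property names and values. The following is a minimal sketch, not part of Scrapy's API: parse_inline_style and select_by_inline_style are hypothetical helper names, and the naive split on ";" assumes that values do not themselves contain semicolons (as data: URLs can).

```python
def parse_inline_style(style):
    """Turn 'background-color: green; color:#F00' into a dict.

    Property names and values are stripped and lowercased.
    Assumption: no value contains a ';' of its own.
    """
    declarations = {}
    for chunk in style.split(";"):
        name, sep, value = chunk.partition(":")
        if sep and name.strip():
            declarations[name.strip().lower()] = value.strip().lower()
    return declarations


def select_by_inline_style(response, prop, value, tag="*"):
    """Return selectors for elements whose inline style sets `prop` to `value`.

    `response` is assumed to be a Scrapy response (or any object with a
    compatible .xpath() method). Only inline style attributes are inspected.
    """
    wanted = value.strip().lower()
    return [
        el
        for el in response.xpath(f"//{tag}[@style]")
        if parse_inline_style(el.xpath("@style").get(default="")).get(prop.lower()) == wanted
    ]
```

For example, select_by_inline_style(response, "background-color", "green", tag="span") would match the span above regardless of spacing or letter case in the attribute. It still only sees inline styles, and it compares values as plain strings, so "green" and "#008000" are treated as different.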
The scrapy-splash project allows you to automate Splash, a lightweight browser that can render the page. In that case, you would need to execute Lua scripts to access the CSS properties of elements on the rendered page.
The selenium browser automation tool is probably the most straightforward option for this problem, as it gives you direct control over the page and access to its elements and their properties and attributes. Its .value_of_css_property() method returns the value of a CSS property.
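As a rough illustration of the Selenium route, the sketch below filters elements by a computed color. Browsers typically report computed colors as rgb(...) or rgba(...) strings rather than the hex notation used in a stylesheet, so the values are normalized before comparison. The helper names to_rgba and filter_by_computed_color are hypothetical, only 6-digit hex and comma-separated rgb()/rgba() strings are handled, and the URL in the usage comment is a placeholder.

```python
import re

_RGB_RE = re.compile(
    r"rgba?\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*(?:,\s*(\d*\.?\d+)\s*)?\)"
)


def to_rgba(color):
    """Convert '#rrggbb', 'rgb(r, g, b)' or 'rgba(r, g, b, a)' to (r, g, b, a).

    Returns None for any other notation (named colors, 3-digit hex, ...).
    """
    color = color.strip().lower()
    if re.fullmatch(r"#[0-9a-f]{6}", color):
        r, g, b = (int(color[i:i + 2], 16) for i in (1, 3, 5))
        return (r, g, b, 1.0)
    match = _RGB_RE.fullmatch(color)
    if match:
        r, g, b = (int(part) for part in match.groups()[:3])
        alpha = float(match.group(4)) if match.group(4) is not None else 1.0
        return (r, g, b, alpha)
    return None


def filter_by_computed_color(elements, css_property, wanted):
    """Keep the elements whose computed `css_property` equals `wanted`.

    `elements` is assumed to be an iterable of Selenium WebElement objects
    (anything exposing .value_of_css_property() works).
    """
    target = to_rgba(wanted)
    if target is None:
        raise ValueError(f"unsupported color notation: {wanted!r}")
    return [
        el for el in elements
        if to_rgba(el.value_of_css_property(css_property)) == target
    ]


# Usage with Selenium (assumes selenium and a Chrome driver are installed):
#
#   from selenium import webdriver
#   from selenium.webdriver.common.by import By
#
#   driver = webdriver.Chrome()
#   driver.get("https://example.com")  # placeholder URL
#   cells = driver.find_elements(By.CSS_SELECTOR, "td")
#   red_cells = filter_by_computed_color(cells, "background-color", "#ff0000")
#   driver.quit()
```

Because the comparison includes the alpha channel, a fully transparent default background such as rgba(0, 0, 0, 0) is not mistaken for opaque black.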