
Get href using css selector with Scrapy

I want to get the href value:

<span class="title">   <a href="https://www.example.com"></a> </span> 

I tried this:

Link = Link1.css('span[class=title] a::text').extract()[0] 

But I just get the text inside the <a>. How can I get the link inside the href?

Marco Dinatsoli asked Jan 17 '14 08:01



People also ask

What is CSS selector in Scrapy?

CSS is a language for applying styles to HTML documents. It defines selectors to associate those styles with specific HTML elements. Scrapy Selectors is a thin wrapper around the parsel library; the purpose of this wrapper is to provide better integration with Scrapy Response objects.


1 Answer

What you're looking for is:

Link = Link1.css('span[class=title] a::attr(href)').extract()[0] 

Since you're matching a span "class" attribute also, you can even write

Link = Link1.css('span.title a::attr(href)').extract()[0] 

Please note that the ::text pseudo-element and the ::attr(attributename) functional pseudo-element are NOT standard CSS3 selectors. They are extensions that Scrapy (0.20 and later) adds to CSS selectors.


Edit (2017-07-20): starting from Scrapy 1.0, you can use .extract_first() instead of .extract()[0]

Link = Link1.css('span[class=title] a::attr(href)').extract_first()

Link = Link1.css('span.title a::attr(href)').extract_first()
paul trmbrth answered Oct 02 '22 17:10
