I'm learning how to scrape with the Scrapy API.
I would like to scrape the text inside the <h2 class> element and the link in the <a href> attribute, but it's not working (see the attached file).

I tried to extract the text in the <a> tag:
import scrapy
class PriceSpider(scrapy.Spider):
    name = "annonce"  #name of spider
    def start_requests(self):
        urls = [
            'https://www.leboncoin.fr/ventes_immobilieres/offres/ile_de_france/?th=1',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)
    def parse(self, response):
        for annonce in response.css('section.tabsContent li').extract():
            yield{
                'title':annonce.css('a ::title').extract_first(),
                }
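Give this a try. Your CSS selector is heavily flawed: calling .extract() on response.css(...) returns a list of plain strings, so you cannot call annonce.css(...) on each item, and ::title is not a valid pseudo-element (Scrapy supports ::text and ::attr(name)).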
import scrapy
class PriceSpider(scrapy.Spider):
    name = "annonce"  #name of spider
    def start_requests(self):
        urls = [
            'https://www.leboncoin.fr/ventes_immobilieres/offres/ile_de_france/?th=1',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)
    def parse(self, response):
        for annonce in response.css('.list_item'):
            yield{
                'link':annonce.css('::attr(href)').extract_first(),
                # default='' avoids an AttributeError when an item has no title
                'title':annonce.css('.item_title::text').extract_first(default='').strip(),
                }
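If you want to sanity-check the extraction logic offline, here is a minimal sketch that mimics what the spider above pulls out, using only the standard library's html.parser instead of Scrapy. The sample markup, the example.com URLs, and the names ListItemParser and extract_items are assumptions made up for illustration; the real page may be structured differently.

```python
from html.parser import HTMLParser

# Hypothetical markup, assumed to resemble the listing page: each ad is an
# <a class="list_item"> that wraps an <h2 class="item_title">.
SAMPLE_HTML = """
<section class="tabsContent">
  <ul>
    <li><a href="//www.example.com/ad/1.htm" class="list_item">
      <h2 class="item_title">
        Maison 120 m2
      </h2></a></li>
    <li><a href="//www.example.com/ad/2.htm" class="list_item">
      <h2 class="item_title">
        Appartement 45 m2
      </h2></a></li>
  </ul>
</section>
"""


class ListItemParser(HTMLParser):
    """Collects the href and title text of every a.list_item element."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._current = None    # the ad currently being read, if any
        self._in_title = False  # True while inside h2.item_title

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "list_item" in classes:
            self._current = {"link": attrs.get("href"), "title": ""}
        elif tag == "h2" and "item_title" in classes and self._current is not None:
            self._in_title = True

    def handle_data(self, data):
        if self._in_title:
            self._current["title"] += data

    def handle_endtag(self, tag):
        if tag == "h2" and self._in_title:
            self._in_title = False
        elif tag == "a" and self._current is not None:
            # Same clean-up as .strip() in the spider
            self._current["title"] = self._current["title"].strip()
            self.items.append(self._current)
            self._current = None


def extract_items(html):
    parser = ListItemParser()
    parser.feed(html)
    return parser.items


items = extract_items(SAMPLE_HTML)
for item in items:
    print(item)
```

Each printed dict has the same two keys the spider yields ('link' and 'title'), so you can compare it against what Scrapy writes out when you run the crawl.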
One more thing: open your settings.py file and set the following, so that Scrapy does not skip pages disallowed by the site's robots.txt:
ROBOTSTXT_OBEY = False