
Scrapy: How to scrape <ul> <li>

I'm learning how to scrape using the Scrapy API.

I would like to scrape the text inside the <h2 class> and the link from the <a href>, but it's not working (see the attached file).

[Attached: screenshot of the HTML page]

I tried to extract the text in the <a> tag:

import scrapy

class PriceSpider(scrapy.Spider):
    name = "annonce"  #name of spider

    def start_requests(self):
        urls = [
            # start URL omitted in the original post
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        for annonce in response.css('section.tabsContent li').extract():
            yield {
                'title': annonce.css('a ::title').extract_first(),
            }
pradhox asked Sep 28 '17 at 15:09


1 Answer

Give this a try. Your CSS selector is heavily flawed.

import scrapy

class PriceSpider(scrapy.Spider):
    name = "annonce"  #name of spider

    def start_requests(self):
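        urls = [
            # start URL omitted in the original post
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        for annonce in response.css('.list_item'):
            ...  # rest of the loop body omitted in the original post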

One more thing. Open your settings.py file and make it:

SIM answered Oct 22 '22 at 09:10