
"AttributeError: 'SelectorList' object has no attribute 'get'" in Scrapy Cloud

Tags: python, scrapy

I'm setting up a scraper with Scrapy that works well on my laptop, but this message appears when I run the same spider on Scrapy Cloud:

  File "/usr/local/lib/python2.7/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/tmp/unpacked-eggs/__main__.egg/ccv_spiders/spiders/gmitem.py", line 31, in parse
    data["marque"] = caritem.css("div.make::text").get().strip().split(" ", 2)[1]
AttributeError: 'SelectorList' object has no attribute 'get'

Here is my code:

    def start_requests(self):
        for item in self.data:
            request = scrapy.Request(item['gm_url'], callback=self.parse)
            request.meta['item'] = item
            yield request

    def parse(self, response):
        item = response.meta['item']
        item['results'] = []

        for caritem in response.css("div.car-item-border"):
            data = AuctionItem()
            urllot = "https://www.website.com/img/auctions/byLot/"
            urlbase = "https://www.website.com/img/auctions/car/thumb/"
            data["marque"] = caritem.css("div.make::text").get().strip().split(" ", 2)[1]
            data["model"] = caritem.css("div.make::text").get().strip().split(" ", 2)[2]
            data["model_year"] = caritem.css("div.make::text").get().strip().split(" ", 1)[0]
            data["price_str"] = caritem.css("div.price::text").get().strip().replace(",", " ")
            if caritem.css("div.price::text").get().find("Estimate"):
                data["sold"] = True
            else:
                data["sold"] = False
            data["auction_house"] = caritem.css("div.auctionHouse::text").get().split("-", 1)[0].strip()
            data["auction_country"] = caritem.css("div.auctionHouse::text").get().rsplit(",", 1)[1].strip()
            data["auction_date"] = caritem.css("div.date::text").get().replace(",", "").strip()
            if caritem.css("div.view-auction a::attr(href)").get().find("/auction-cars/show-backup-image"):
                data["auction_url"] = caritem.css("div.view-auction a::attr(href)").get()
            else:
                data["auction_url"] = None
            data["image_urls"] = caritem.css("img.img-responsive::attr(src)").get()
            if urllot in data["image_urls"]:
                data["image_cloud"] = caritem.css("img.img-responsive::attr(src)").get().replace(urllot,"https://res.cloudinary.com/ccv/image/upload/auctions/")
                data["image_cloud"] = re.sub(r"(?<=[A-Z])/*(?=\d)", "-", data["image_cloud"])
            elif urlbase in data["image_urls"]:
                data["image_cloud"] = caritem.css("img.img-responsive::attr(src)").get().replace(urlbase, "https://res.cloudinary.com/ccv/image/upload/auctions/")
            item['results'].append(data)

        yield item

Is there a problem with my Python version? It works with Anaconda and Python 3 on my laptop, and I don't understand why it seems to be using Python 2.7 via "/usr/local/lib/python2.7..."

Besides, my JSON output doesn't show any of the results arrays.

lf_celine asked Jan 27 '23 12:01


1 Answer

It comes down to library versions: the Scrapy Cloud environment is running an older Scrapy/Parsel than your laptop.

Both the get and getall methods were first introduced by Parsel (Scrapy's parsing library) in version 1.2.0, which is not guaranteed to be installed if you're using Scrapy 1.5.2 or lower.
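To check whether a given environment is new enough, the version threshold above can be sketched as a small comparison. The helper name supports_get is hypothetical, and it assumes a plain dotted version string with at least a major and minor part (in a real project the string might come from something like parsel.__version__, if that attribute is available in your install):

```python
def supports_get(parsel_version):
    """Return True if this Parsel version string is at least 1.2."""
    # Compare only the numeric major/minor parts, e.g. "1.5.2" -> (1, 5).
    # Assumption: the string looks like "X.Y" or "X.Y.Z" with integer parts.
    major, minor = (int(part) for part in parsel_version.split(".")[:2])
    return (major, minor) >= (1, 2)


print(supports_get("1.1.0"))  # False
print(supports_get("1.5.1"))  # True
```

Logging the installed versions at the start of the spider on both machines is usually enough to confirm the mismatch.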

You can use extract_first (in place of get) and extract (in place of getall) as replacements, or upgrade Scrapy to 1.6+.
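If the same spider has to run on both old and new versions, one option is a small wrapper that prefers get and falls back to extract_first. This is only a sketch: first_match is a hypothetical helper name, and it assumes the object passed in behaves like a Scrapy SelectorList, where both methods accept a default keyword:

```python
def first_match(selector_list, default=None):
    """Return the first extracted value, or `default` when nothing matched."""
    # Newer Parsel exposes .get(); older releases only have .extract_first().
    getter = getattr(selector_list, "get", None)
    if getter is None:
        getter = selector_list.extract_first
    return getter(default=default)
```

In the spider above, a call such as first_match(caritem.css("div.make::text"), default="") would then replace caritem.css("div.make::text").get(). Passing an empty string as the default also avoids calling .strip() on None when a selector matches nothing.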

Thiago Curvelo answered Jan 29 '23 02:01