Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is my scrapy spider not following the Request callback in my item parse function?

Tags:

python

scrapy

I'm scraping a site to check in-stock status of various products. Unfortunately this requires actually clicking "Add to Cart" on the product page and checking the next page's message to determine if stock is available (i.e. it requires parsing two responses).

I followed the excellent documentation for this scenario and wrote my parse function to return a Request object with a callback to my secondary parse function. However, this function rarely gets called. Most products result in only seeing "Before return request" show up in the log, but it does get called properly for a small portion of products.

Any clue what is going wrong here? I've ran out of ideas.

foo/spiders/atlantic_firearms_spider.py:

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import HtmlXPathSelector
from scrapy.http import FormRequest
from foo.items import AtlanticFirearmsItem

import datetime
import re

class AtlanticFirearmsSpider(CrawlSpider):
    name = "atlantic_firearms"
    allowed_domains = ["atlanticfirearms.com"]
    start_urls = [
        "http://www.atlanticfirearms.com"
    ]

    rules = (
        Rule(SgmlLinkExtractor(allow=['detail.html']), callback='parse_product'),
        Rule(SgmlLinkExtractor(allow=[], deny=['/bro', '/news', '/howtobuy', '/component/search', 'askquestion'])),
    )

    def parse_product(self, response):
      hxs = HtmlXPathSelector(response)
      product = AtlanticFirearmsItem()
      add_to_cart = any([hxs.select("descendant-or-self::input[@name = 'addtocart']"),
                         hxs.select("descendant-or-self::input[@value = 'Add to Cart']"),
                         hxs.select("//a[text() = 'Add to Cart']")])
      product['url'] = response.url
      product['as_of_time'] = datetime.datetime.now()

      if add_to_cart:
          # attempt to add to cart to verify availability
          request = FormRequest.from_response(response, formname="addtocartForm", callback=self.parse_add_to_cart)
          request.meta['product'] = product
          print "Before return request"
          return request
      else:
          product['in_stock'] = False
          return product

    def parse_add_to_cart(self, response):
        print "Inside parse_add_to_cart"
        product = response.meta['product']
        hxs = HtmlXPathSelector(response)
        product['in_stock'] = not(hxs.select("//text()[contains(.,'We regret to inform you that this product')]"))
        return product

foo/items.py:

from scrapy.item import Item, Field

class AtlanticFirearmsItem(Item):
    in_stock = Field()
    url = Field()
    as_of_time = Field()

Edit: adding log file as requested:

2013-09-21 07:25:14-0500 [scrapy] INFO: Scrapy 0.18.2 started (bot: foo)
2013-09-21 07:25:14-0500 [scrapy] DEBUG: Optional features available: ssl, http11
2013-09-21 07:25:14-0500 [scrapy] DEBUG: Overridden settings: {'SPIDER_MODULES': ['foo.spiders'], 'BOT_NAME': 'foo'}
2013-09-21 07:25:14-0500 [scrapy] DEBUG: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2013-09-21 07:25:14-0500 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRef
reshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2013-09-21 07:25:14-0500 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2013-09-21 07:25:14-0500 [scrapy] DEBUG: Enabled item pipelines: 
2013-09-21 07:25:14-0500 [atlantic_firearms] INFO: Spider opened
2013-09-21 07:25:14-0500 [atlantic_firearms] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2013-09-21 07:25:14-0500 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2013-09-21 07:25:14-0500 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2013-09-21 07:25:16-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com> (referer: None)
2013-09-21 07:25:16-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.cloudflare.com': <GET http://www.cloudflare.com/email-protection>
2013-09-21 07:25:16-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.constantcontact.com': <GET http://www.constantcontact.com/jmml/email-marketing.jsp>
2013-09-21 07:25:16-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.fdicreative.com': <GET http://www.fdicreative.com/>
2013-09-21 07:25:16-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.redjacketfirearms.com': <GET https://www.redjacketfirearms.com/>
2013-09-21 07:25:17-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/wolf-ammunition-45acp-500-round-case-detail.
html?Itemid=0> (referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:18-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/vector-arms-sp89-k-style-pistol-9mm-detail.h
tml?Itemid=0> (referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:18-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/> (referer: http://www.atlanticfirearms.com)
2013-09-21 07:25:18-0500 [atlantic_firearms] DEBUG: Filtered duplicate request: <GET http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/vector-arms-sp89-k-style-pisto
l-9mm-detail.html?Itemid=0> - no more duplicates will be shown (see DUPEFILTER_CLASS)
2013-09-21 07:25:18-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/wolf-223-ar15-rifle-ammo-500-round-case-deta
il.html?Itemid=0> (referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:18-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/us-palm-air-save-plate-carrier-detail.html?I
temid=0> (referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:19-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/545-x-39-russian-ak74-ammo-1080-round-case-d
etail.html?Itemid=0> (referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:19-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/red-army-standard-7-62x39mm-360-round-range-
pack-detail.html?Itemid=0> (referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:19-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/vector-arms-mp5-style-rifle-detail.html?Itemid=0> (
referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:19-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-accessories/wolf-ammunition-for-sale-ak47-detail.html?Item
id=0> (referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:20-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/dsa-zm4-flat-top-ar15-carbine-dszm4cv1r-detail.html
?Itemid=0> (referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:20-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-accessories/m92-ak47-yugoslavian-7-62x39mm-bolt-hold-open-
metal-mags-pack-of-two-detail.html?Itemid=0> (referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:21-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/vector-arms-v94-9mm-mp5-style-pistol-full-size-deta
il.html?Itemid=0> (referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:21-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/zastava-ak-47-m70b1-pap-7-62x39mm-rifles-w-2-hi-cap
-mags-detail.html?Itemid=0> (referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:21-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/ptr-91-gi-rifle-939-atlanticfirearms-com-detail.htm
l?Itemid=0> (referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:21-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/pap-m92-7-62x39-pistol-detail.html?Itemid=0> (refer
er: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/content/article/86-static-pages/159-resources.html> (referer: http://www.atlan
ticfirearms.com)
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.atsconsultingcorp.com': <GET http://www.atsconsultingcorp.com/>                                  [52/1905]
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.bullseyemarket.com': <GET http://www.bullseyemarket.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.corilam.com': <GET http://www.corilam.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'chancebrownrealestate.com': <GET http://chancebrownrealestate.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.delsolservices.com': <GET http://www.delsolservices.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.elkhornoutfitters.com': <GET http://www.elkhornoutfitters.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.frontierlogistics.com': <GET http://www.frontierlogistics.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.gpstrackingkey.com': <GET http://www.gpstrackingkey.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.hanshawkennedy.com': <GET http://www.hanshawkennedy.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'worldenv.com': <GET http://worldenv.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'purgexonline.com': <GET http://purgexonline.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'bumpfirestocks.com': <GET http://bumpfirestocks.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.texrestaurantequipment.com': <GET http://www.texrestaurantequipment.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.houston-refinance.com': <GET http://www.houston-refinance.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'johnson-bryan.com': <GET http://johnson-bryan.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'kanesforms.com': <GET http://kanesforms.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.markfoxrealestate.com': <GET http://www.markfoxrealestate.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.mphoa.org': <GET http://www.mphoa.org/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.outfitterwebsites.com': <GET http://www.outfitterwebsites.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'outdoortrailsnetwork.com': <GET http://outdoortrailsnetwork.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.psychologicalriskservices.com': <GET http://www.psychologicalriskservices.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.rcshouston.com': <GET http://www.rcshouston.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.rollingcreekcarwash.com': <GET http://www.rollingcreekcarwash.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'slammc.com': <GET http://slammc.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.texassaltwaterfishingguide.com': <GET http://www.texassaltwaterfishingguide.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.waynepigment.com': <GET http://www.waynepigment.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'bancroftfeldman.com': <GET http://bancroftfeldman.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'elilanddesign.com': <GET http://elilanddesign.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'dpharms.com': <GET http://dpharms.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'contractlandstaff.com': <GET http://contractlandstaff.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'knightsplumbing.com': <GET http://knightsplumbing.com/>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi
rtuemart/featured-not-published/index.php>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi
rtuemart/featured-not-published/index.php>
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/component/virtuemart/shipping-rifles/ati-omni-5-56-poly-competition-m4-carbine-detail.ht
ml?Itemid=0> (referer: http://www.atlanticfirearms.com)
Before return request
2013-09-21 07:25:22-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi
rtuemart/featured-not-published/index.php>
2013-09-21 07:25:23-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi
rtuemart/featured-not-published/index.php>
2013-09-21 07:25:23-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi
rtuemart/featured-not-published/index.php>
2013-09-21 07:25:23-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi
rtuemart/shipping-rifles/index.php>
2013-09-21 07:25:23-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi
rtuemart/featured-not-published/index.php>
2013-09-21 07:25:23-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi
rtuemart/shipping-rifles/index.php>
2013-09-21 07:25:23-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi
rtuemart/shipping-accessories/index.php>
2013-09-21 07:25:23-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/dallas-gun-shop.html> (referer: http://www.atlanticfirearms.com)
2013-09-21 07:25:24-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi
rtuemart/shipping-accessories/index.php>
2013-09-21 07:25:24-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi
rtuemart/shipping-rifles/index.php>
2013-09-21 07:25:24-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi
rtuemart/shipping-rifles/index.php>
2013-09-21 07:25:24-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi
rtuemart/shipping-rifles/index.php>
2013-09-21 07:25:24-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi
rtuemart/shipping-rifles/index.php>
2013-09-21 07:25:24-0500 [atlantic_firearms] DEBUG: Crawled (404) <GET http://www.atlanticfirearms.com/component/content/?Itemid=803&id=148> (referer: http://www.atlanticfirearms.com)
2013-09-21 07:25:25-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/houston-texas-gun-shop.html> (referer: http://www.atlanticfirearms.com)
2013-09-21 07:25:25-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/california-gun-shop.html> (referer: http://www.atlanticfirearms.com)
2013-09-21 07:25:25-0500 [atlantic_firearms] DEBUG: Redirecting (303) to <GET http://www.atlanticfirearms.com/browse-our-products.html> from <POST http://www.atlanticfirearms.com/component/vi
rtuemart/shipping-rifles/index.php>
2013-09-21 07:25:25-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/browse-our-products.html> (referer: http://www.atlanticfirearms.com/component/virtuemart
/featured-not-published/vector-arms-sp89-k-style-pistol-9mm-detail.html?Itemid=0)
Inside parse_add_to_cart
2013-09-21 07:25:25-0500 [atlantic_firearms] DEBUG: Scraped from <200 http://www.atlanticfirearms.com/browse-our-products.html>
        {'as_of_time': datetime.datetime(2013, 9, 21, 7, 25, 18, 365559),
         'in_stock': True,
         'url': 'http://www.atlanticfirearms.com/component/virtuemart/featured-not-published/vector-arms-sp89-k-style-pistol-9mm-detail.html?Itemid=0'}
2013-09-21 07:25:25-0500 [atlantic_firearms] DEBUG: Crawled (404) <GET http://www.atlanticfirearms.com/www.atlanticfirearms.com> (referer: http://www.atlanticfirearms.com/dallas-gun-shop.html
)
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/login-or-register/editaddress.html> (referer: http://www.atlanticfirearms.com)
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/privacy-policy.html> (referer: http://www.atlanticfirearms.com)
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/subscribe.html> (referer: http://www.atlanticfirearms.com)
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Crawled (200) <GET http://www.atlanticfirearms.com/links.html> (referer: http://www.atlanticfirearms.com)
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.gunbroker.com': <GET http://www.gunbroker.com/user/dealernetwork.asp>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.auctionarms.com': <GET http://www.auctionarms.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.gunsamerica.com': <GET http://www.gunsamerica.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.ar15.com': <GET http://www.ar15.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.olyarms.com': <GET http://www.olyarms.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.cheaperthandirt.com': <GET http://www.cheaperthandirt.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.ammoman.com': <GET http://www.ammoman.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.ak47.net': <GET http://www.ak47.net/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.atf.treas.gov': <GET http://www.atf.treas.gov/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'caag.state.ca.us': <GET http://caag.state.ca.us/firearms/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.nra.org': <GET http://www.nra.org/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.masterpiecearms.com': <GET http://www.masterpiecearms.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'atlantic1.readyhosting.com': <GET http://atlantic1.readyhosting.com/programming/listview.asp?CatId=2>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.vulcanarmament.com': <GET http://www.vulcanarmament.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.bushmaster.com': <GET http://www.bushmaster.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.rockriverarms.com': <GET http://www.rockriverarms.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'dpmsinc.com': <GET http://dpmsinc.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.colt.com': <GET http://www.colt.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.armalite.com': <GET http://www.armalite.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.redstick-firearms.com': <GET http://www.redstick-firearms.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.vectorarms.com': <GET http://www.vectorarms.com/indexframe.html>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.arsenalinc.com': <GET http://www.arsenalinc.com/about.htm>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.ak47.com': <GET http://www.ak47.com/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.jldenter.com': <GET http://www.jldenter.com/store/>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.springfield-armory.com': <GET http://www.springfield-armory.com/index.shtml>
2013-09-21 07:25:26-0500 [atlantic_firearms] DEBUG: Filtered offsite request to 'www.dsarms.com': <GET http://www.dsarms.com/>
^C2013-09-21 07:25:26-0500 [scrapy] INFO: Received SIGINT, shutting down gracefully. Send again to force 
^C2013-09-21 07:25:26-0500 [scrapy] INFO: Received SIGINT twice, forcing unclean shutdown
like image 988
Abe Voelker Avatar asked Sep 21 '13 02:09

Abe Voelker


1 Answers

Posting my earlier comment as an answer.

As all your POST requests (coming from FormRequest.from_response()) get redirected to http://www.atlanticfirearms.com/browse-our-products.html, you should set dont_filter=True:

    if add_to_cart:
        # attempt to add to cart to verify availability
        request = FormRequest.from_response(response, formname="addtocartForm",
                      callback=self.parse_add_to_cart, dont_filter=True)

See Scrapy docs on requests:

dont_filter (boolean) – indicates that this request should not be filtered by the scheduler. This is used when you want to perform an identical request multiple times, to ignore the duplicates filter.

Additionally, you may want to set CONCURRENT_REQUESTS = 1 to add items 1-by-1 in the cart (I wonder how the server handles parallel cart additions.)

like image 123
paul trmbrth Avatar answered Nov 08 '22 23:11

paul trmbrth