I am trying to publish a very simple Scrapy spider as an .exe using PyInstaller. I have searched and read everything I could find, but I still can't figure out what is going wrong. Any help or pointers in the right direction are very much appreciated!
If I change the `yield` to `return`, the error goes away and the spider works, except that it only returns one item (which is expected, since `return` ends the function instead of yielding every item). The code runs without any errors in my IDE, i.e. when I am not using the PyInstaller .exe.
Note: I am using the PyInstaller dev version.
Error when running my .exe:
2020-04-28 11:57:30 [scrapy.core.scraper] ERROR: Spider error processing <GET http://books.toscrape.com/> (referer: None)
Traceback (most recent call last):
File "lib\site-packages\twisted\internet\defer.py", line 1418, in _inlineCallbacks
File "lib\site-packages\scrapy\core\downloader\middleware.py", line 42, in process_request
File "lib\site-packages\twisted\internet\defer.py", line 1362, in returnValue
twisted.internet.defer._DefGen_Return: <200 http://books.toscrape.com/>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "lib\site-packages\scrapy\utils\defer.py", line 55, in mustbe_deferred
File "lib\site-packages\scrapy\core\spidermw.py", line 60, in process_spider_input
File "lib\site-packages\scrapy\core\scraper.py", line 148, in call_spider
File "lib\site-packages\scrapy\utils\misc.py", line 202, in warn_on_generator_with_return_value
File "lib\site-packages\scrapy\utils\misc.py", line 187, in is_generator_with_return_value
File "inspect.py", line 973, in getsource
File "inspect.py", line 955, in getsourcelines
File "inspect.py", line 786, in findsource
OSError: could not get source code
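The last frames of the traceback point at the cause: before calling a callback, Scrapy checks whether it is a generator function and, if so, reads its source with `inspect.getsource` to look for a `return` statement. A PyInstaller build ships compiled bytecode rather than the `.py` files, so there is no source to read. The sketch below reproduces the same `OSError` with the standard library only, assuming that a function compiled from a string (the `<frozen_spider>` filename is made up) behaves like a frozen callback with no source on disk; `scrapy_like_check` is a hypothetical, simplified imitation of Scrapy's check, not its real code. It also shows why switching to `return` hides the error: plain functions are never inspected.

```python
import inspect

# Two callbacks compiled from a string, so that (as in a PyInstaller bundle,
# which ships bytecode) no .py source file exists for them.
# "<frozen_spider>" is a made-up filename used only for this sketch.
SOURCE = '''
def parse_with_yield(response):
    for row in response:
        yield row

def parse_with_return(response):
    return response[0]
'''
namespace = {}
exec(compile(SOURCE, "<frozen_spider>", "exec"), namespace)
parse_with_yield = namespace["parse_with_yield"]
parse_with_return = namespace["parse_with_return"]


def scrapy_like_check(callback):
    """Hypothetical, simplified imitation of Scrapy's check (not its real code).

    Returns True if the callback passes, False if reading its source fails.
    """
    if not inspect.isgeneratorfunction(callback):
        return True  # plain functions (using `return`) are never inspected
    try:
        inspect.getsource(callback)  # needs the .py source to be available
    except OSError:
        return False  # the same kind of OSError as in the frozen .exe
    return True


print(scrapy_like_check(parse_with_return))  # True
print(scrapy_like_check(parse_with_yield))   # False
```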
myBookSpider.py:
import scrapy
from items import scrapyStandaloneTestItem


class bookSpider(scrapy.Spider):
    name = "bookSpider"
    custom_settings = {
        "FEED_URI": "resultFile.csv",
        "FEED_FORMAT": "csv",
        "FEED_EXPORT_FIELDS": ["title", "price"],
    }

    def start_requests(self):
        urls = [
            "http://books.toscrape.com/",
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        # Getting all the articles with the product_pod class
        articles = response.css("article.product_pod")
        # Looping through all the article elements we got earlier
        for article in articles:
            # Getting a fresh instance of our item class for each article,
            # so every yielded item is a separate object
            item = scrapyStandaloneTestItem()
            # Extracting the needed values from the page into variables
            title = article.css("a::attr(title)").extract()
            price = article.css("p.price_color::text").extract()
            # Storing the extracted values in the item's fields
            item["title"] = title
            item["price"] = price
            yield item
items.py:
import scrapy


class scrapyStandaloneTestItem(scrapy.Item):
    # define the fields for your item here
    title = scrapy.Field()
    price = scrapy.Field()
runSpider.py:
# In this file we will run the spider(s)
from scrapy.crawler import CrawlerProcess
from myBookSpider import bookSpider
from scrapy.utils.project import get_project_settings


def runSpider():
    # Running scraper
    process = CrawlerProcess(get_project_settings())
    process.crawl(bookSpider)
    process.start()


if __name__ == "__main__":
    runSpider()
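Late answer, but I'll leave it here for others: all you have to do is add this code to your spider module:
import scrapy.utils.misc
import scrapy.core.scraper


def warn_on_generator_with_return_value_stub(spider, callable):
    # No-op replacement: skips the source inspection that fails in the frozen .exe
    pass


scrapy.utils.misc.warn_on_generator_with_return_value = warn_on_generator_with_return_value_stub
scrapy.core.scraper.warn_on_generator_with_return_value = warn_on_generator_with_return_value_stub
This replaces Scrapy's warn_on_generator_with_return_value check with a no-op. That check reads the callback's source with inspect.getsource, which fails inside the .exe because PyInstaller bundles compiled bytecode rather than the .py files. The check only runs for generator callbacks (which is why switching to return made the error disappear) and only exists to emit a warning, so skipping it does not change what the spider scrapes.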
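Both module attributes have to be replaced because scrapy/core/scraper.py appears to import the function by name (its frame calls warn_on_generator_with_return_value directly in the traceback), so it holds its own reference to the original function. A stdlib-only sketch of that name-binding behaviour, using hypothetical stand-in modules `fake_misc` and `fake_scraper` rather than Scrapy itself:

```python
import types


def original_warn(spider, callback):
    # Stands in for the real check that fails in the frozen .exe.
    raise OSError("could not get source code")


def stub(spider, callback):
    # No-op replacement, like warn_on_generator_with_return_value_stub.
    pass


# Hypothetical stand-ins for scrapy.utils.misc and scrapy.core.scraper.
fake_misc = types.ModuleType("fake_misc")
fake_misc.warn_on_generator_with_return_value = original_warn

# Equivalent of `from fake_misc import warn_on_generator_with_return_value`:
# the second module gets its own reference to the same function object.
fake_scraper = types.ModuleType("fake_scraper")
fake_scraper.warn_on_generator_with_return_value = fake_misc.warn_on_generator_with_return_value

# Patching only the defining module does not affect the copied name.
fake_misc.warn_on_generator_with_return_value = stub
print(fake_scraper.warn_on_generator_with_return_value is original_warn)  # True

# Patching the importing module as well makes both names point at the stub.
fake_scraper.warn_on_generator_with_return_value = stub
print(fake_scraper.warn_on_generator_with_return_value is stub)  # True
```

In the real project the two assignments must run before process.start(); placing them at module level in myBookSpider.py works because runSpider.py imports that module before starting the crawl.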