Cannot import either Scrapy's settings module or its scrapy.cfg

This is quite a lengthy post, but after extensive research I couldn't find a solution. I have a mixed Django 1.4.1 / Scrapy 0.14.4 project on OS X 10.8, and I control Scrapy through the Django project's manage.py command as described here. For instance, calling

python manage.py scrapy crawl example_spider 

works without a problem. Now I'm at the point where I want to set up the scrapyd web service to deploy my spiders. However, when I execute

python manage.py scrapy server

then I get this exception:

scrapy.exceptions.NotConfigured: Unable to find scrapy.cfg file to infer project data dir

So, apparently Scrapy cannot find the scrapy.cfg file because I don't execute it from within the Scrapy project. The other Scrapy commands work, however, because in my Django project's settings.py I did the following:

import os
import sys

sys.path.append('/absolute/path/to/my/Scrapy/project')
os.environ['SCRAPY_SETTINGS_MODULE'] = 'my_scrapy_project_name.settings'

Question 1: Why can't Scrapy detect the scrapy.cfg file in my setup? How can I resolve this?


Since the approach above doesn't work, I tried to get the scrapyd server running using just the scrapy command from within my Scrapy project directory. Executing scrapy server from the top-level directory of my Scrapy project yields the following:

$ scrapy server
UserWarning: Cannot import scrapy settings module my_scrapy_project_name.settings
warnings.warn("Cannot import scrapy settings module %s" % scrapy_module)
2012-08-31 21:58:31+0200 [-] Log opened.
2012-08-31 21:58:32+0200 [-] Scrapyd web console available at http://localhost:6800/
2012-08-31 21:58:32+0200 [Launcher] Scrapyd started: max_proc=8, runner='scrapyd.runner'
2012-08-31 21:58:32+0200 [-] Site starting on 6800
2012-08-31 21:58:32+0200 [-] Starting factory <twisted.web.server.Site instance at 0x101dd3d88> 

The server runs without a problem; however, the settings.py file of my Scrapy project cannot be found because the respective environment variable is no longer set. That's why I run the following in my terminal:

export PYTHONPATH=/absolute/path/to/my/Scrapy/project
export SCRAPY_SETTINGS_MODULE=my_scrapy_project_name.settings

Unfortunately, these two commands have no effect. Whenever I execute scrapy server (or any other Scrapy command), I get the message that Scrapy cannot import its project's settings module.
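To narrow down whether the exports ever reach the Python process, this minimal sketch reproduces roughly what Scrapy does on startup: put the project directory on sys.path, set SCRAPY_SETTINGS_MODULE, and try to import that module. The function name and the paths are made up for illustration; if it returns False for your real path and module name, you would see the same UserWarning:

```python
import importlib
import os
import sys

def can_import_settings(project_path, settings_module):
    """Roughly what Scrapy does on startup: add the project directory to
    sys.path, set SCRAPY_SETTINGS_MODULE, and try to import that module.
    Returning False corresponds to the 'Cannot import scrapy settings
    module' warning."""
    sys.path.insert(0, project_path)
    os.environ["SCRAPY_SETTINGS_MODULE"] = settings_module
    try:
        importlib.import_module(settings_module)
        return True
    except ImportError:
        return False

# A bogus path/module fails, just like an export that never reached Scrapy:
print(can_import_settings("/no/such/dir", "my_scrapy_project_name.settings"))
# prints False
```

Running this with the same values you exported tells you whether the problem is the export itself (e.g. the server was started from a shell that didn't have the variables) or the path/module name being wrong.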

My scrapy.cfg only has the following content at the moment:

[settings]
default = my_scrapy_project_name.settings

[deploy:scrapyd]
url = http://localhost:6800/
project = my_scrapy_project_name

When I try to deploy my Scrapy project to the scrapyd server, it seems to work at first, but then I realized that none of the spiders had been uploaded, probably because the settings file could not be found. Here is the console output:

$ scrapy deploy scrapyd -p my_scrapy_project_name
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapy/utils/project.py:17: UserWarning: Cannot import scrapy settings module my_scrapy_project_name.settings
  warnings.warn("Cannot import scrapy settings module %s" % scrapy_module)
Building egg of event_crawler-1346531706
'build/lib' does not exist -- can't clean it
'build/bdist.macosx-10.6-intel' does not exist -- can't clean it
'build/scripts-2.7' does not exist -- can't clean it
zip_safe flag not set; analyzing archive contents...
Deploying event_crawler-1346531706 to http://localhost:6800/addversion.json
Server response (200):
{"status": "ok", "project": "my_scrapy_project_name", "version": "1346531706", "spiders": 0}

Question 2: How do I correctly export the path and the environment variable above so that this warning disappears?

Question 3: Since the scrapyd server itself seems to work fine, how can I upload my spiders correctly?

Many thanks in advance!

asked Aug 31 '12 by pemistahl

1 Answer

If you look at the code branch that raises this exception and the definition of the closest_scrapy_cfg function that it calls, the only places Scrapy looks for your scrapy.cfg are the directory you run the command from and its parent directories. You could call os.chdir in your manage.py, or move your scrapy.cfg to the directory you're running from.
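The lookup described above can be sketched like this (a paraphrase of what closest_scrapy_cfg does, not the exact Scrapy source):

```python
import os

def closest_scrapy_cfg(path=".", prevpath=None):
    """Walk upwards from `path`, returning the first scrapy.cfg found,
    or an empty string once the filesystem root is reached."""
    if path == prevpath:
        return ""  # reached the root without finding a scrapy.cfg
    path = os.path.abspath(path)
    cfgfile = os.path.join(path, "scrapy.cfg")
    if os.path.exists(cfgfile):
        return cfgfile
    # os.path.dirname(path) is the parent directory; recurse upwards
    return closest_scrapy_cfg(os.path.dirname(path), path)
```

Since only the working directory and its ancestors are searched, launching manage.py from outside the Scrapy project can never find the file. An os.chdir('/absolute/path/to/my/Scrapy/project') before dispatching the scrapy command, or a copy of scrapy.cfg next to manage.py, would sidestep this.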

answered Oct 13 '22 by Mu Mind