This is a fairly lengthy post, but despite extensive research I couldn't find a solution. I have a mixed Django 1.4.1 / Scrapy 0.14.4 project on OS X 10.8, and I control Scrapy through the Django project's manage.py
command as described here. For instance, calling
python manage.py scrapy crawl example_spider
works without a problem. Now I'm at the point where I want to set up the scrapyd
web service to deploy my spiders. However, when I execute
python manage.py scrapy server
then I get this exception:
scrapy.exceptions.NotConfigured: Unable to find scrapy.cfg file to infer project data dir
So apparently Scrapy cannot find the scrapy.cfg
file because I don't execute the command from within the Scrapy project. The other Scrapy commands do work, however, because in my Django project's settings.py
I did the following:
sys.path.append('/absolute/path/to/my/Scrapy/project')
os.environ['SCRAPY_SETTINGS_MODULE'] = 'my_scrapy_project_name.settings'
Question 1: Why can't Scrapy detect the scrapy.cfg
file in my setup? How can I resolve this?
Since the approach above doesn't work, I tried to get the scrapyd
server running using just the scrapy
command from within my Scrapy project directory. Executing scrapy server
from the top-level directory of my Scrapy project yields the following:
$ scrapy server
UserWarning: Cannot import scrapy settings module my_scrapy_project_name.settings
warnings.warn("Cannot import scrapy settings module %s" % scrapy_module)
2012-08-31 21:58:31+0200 [-] Log opened.
2012-08-31 21:58:32+0200 [-] Scrapyd web console available at http://localhost:6800/
2012-08-31 21:58:32+0200 [Launcher] Scrapyd started: max_proc=8, runner='scrapyd.runner'
2012-08-31 21:58:32+0200 [-] Site starting on 6800
2012-08-31 21:58:32+0200 [-] Starting factory <twisted.web.server.Site instance at 0x101dd3d88>
The server runs without a problem; however, the settings.py
file of my Scrapy project cannot be found because the respective environment variable is no longer set. That's why I run the following in my terminal:
export PYTHONPATH=/absolute/path/to/my/Scrapy/project
export SCRAPY_SETTINGS_MODULE=my_scrapy_project_name.settings
Unfortunately, these two commands have no effect: whenever I execute scrapy server
(or any other Scrapy command), I still get the message that Scrapy cannot import the project's settings module.
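As a sanity check (a minimal sketch, reusing the placeholder path and module name from my setup above), the exported variables can be verified from the same shell session to rule out the shell itself as the culprit:

```shell
export PYTHONPATH=/absolute/path/to/my/Scrapy/project
export SCRAPY_SETTINGS_MODULE=my_scrapy_project_name.settings

# Verify that child processes actually inherit the variables
python -c "import os; print(os.environ.get('SCRAPY_SETTINGS_MODULE'))"
python -c "import os; print(os.environ.get('PYTHONPATH'))"
```

Note that `export` only affects the current shell and its child processes; a `scrapy` command run from a different terminal window won't see these variables.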
My scrapy.cfg
only has the following content at the moment:
[settings]
default = my_scrapy_project_name.settings
[deploy:scrapyd]
url = http://localhost:6800/
project = my_scrapy_project_name
When I try to deploy my Scrapy project to the scrapyd
server, it seems to work at first, but then I realized that none of the spiders had been uploaded, probably because the settings file could not be found. Here is the console output:
$ scrapy deploy scrapyd -p my_scrapy_project_name
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapy/utils/project.py:17: UserWarning: Cannot import scrapy settings module my_scrapy_project_name.settings
  warnings.warn("Cannot import scrapy settings module %s" % scrapy_module)
Building egg of event_crawler-1346531706
'build/lib' does not exist -- can't clean it
'build/bdist.macosx-10.6-intel' does not exist -- can't clean it
'build/scripts-2.7' does not exist -- can't clean it
zip_safe flag not set; analyzing archive contents...
Deploying event_crawler-1346531706 to http://localhost:6800/addversion.json
Server response (200):
{"status": "ok", "project": "my_scrapy_project_name", "version": "1346531706", "spiders": 0}
Question 2: How do I correctly export the path and environment variable above so that this warning disappears?
Question 3: Since the scrapyd
server itself seems to work fine, how can I upload my spiders correctly?
Many thanks in advance!
If you look at the code branch that raises this exception and the definition of the closest_scrapy_cfg
function it calls, the only places Scrapy looks for your scrapy.cfg are the directory you run the command from and its parent directories. You could call os.chdir
in your manage.py, or move your scrapy.cfg into the directory you're running from.
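A minimal sketch of the os.chdir approach, using the placeholder path and module name from the question (closest_scrapy_cfg walks upward from the current working directory, so switching into the Scrapy project before any Scrapy command runs lets it find scrapy.cfg):

```python
import os
import sys

# Placeholder path from the question; adjust to your layout
SCRAPY_PROJECT_DIR = '/absolute/path/to/my/Scrapy/project'

sys.path.append(SCRAPY_PROJECT_DIR)
os.environ['SCRAPY_SETTINGS_MODULE'] = 'my_scrapy_project_name.settings'

# closest_scrapy_cfg() searches os.getcwd() and its parents for
# scrapy.cfg, so change into the Scrapy project directory first
if os.path.isdir(SCRAPY_PROJECT_DIR):
    os.chdir(SCRAPY_PROJECT_DIR)
```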