I use the Scrapy shell without problems on several websites, but I run into trouble when a site's robots.txt disallows access. How can I make the Scrapy shell ignore robots.txt entirely?
Thanks in advance.
I'm not talking about a project created with Scrapy, but about the Scrapy shell command: scrapy shell 'www.example.com'
In the settings.py file of your Scrapy project, look for ROBOTSTXT_OBEY and set it to False.
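As a minimal sketch, the relevant line in a project's settings.py would look like this (the rest of the settings file is omitted):

```python
# settings.py (fragment)
# When True (the default for new projects), Scrapy downloads robots.txt
# before crawling and refuses requests the file disallows.
# Setting it to False makes Scrapy ignore robots.txt entirely.
ROBOTSTXT_OBEY = False
```

This is a project-wide change; it affects every spider in the project, not just the shell.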
If you run scrapy shell from the project directory, it will use the project's settings.py. If you run it outside a project, Scrapy will use its default settings. Either way, you can override and add settings via the --set flag. So to turn off the ROBOTSTXT_OBEY setting you can simply run:
scrapy shell http://stackoverflow.com --set="ROBOTSTXT_OBEY=False"