I use the Scrapy shell without problems on several websites, but I run into trouble when robots.txt disallows access to a site.
How can I make Scrapy ignore robots.txt (act as if it does not exist)?
Thank you in advance.
To be clear, I'm not talking about a project created with Scrapy, but about the Scrapy shell command: scrapy shell 'www.example.com'
In the settings.py file of your Scrapy project, look for ROBOTSTXT_OBEY and set it to False.
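For reference, the relevant line in settings.py would look like this (a minimal fragment; in projects generated by scrapy startproject, ROBOTSTXT_OBEY defaults to True):

```python
# settings.py of your Scrapy project
# Disable robots.txt compliance so the shell and spiders ignore disallow rules
ROBOTSTXT_OBEY = False
```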
If you run scrapy from a project directory, scrapy shell will use the project's settings.py. If you run it outside a project, scrapy will use the default settings. In either case, you can override or add settings via the --set flag.
So to turn off the ROBOTSTXT_OBEY setting you can simply run:
scrapy shell http://stackoverflow.com --set="ROBOTSTXT_OBEY=False"
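Scrapy also accepts -s as a shorthand for --set, so an equivalent invocation (assuming Scrapy is installed) is:

```shell
# -s NAME=VALUE overrides a single setting for this run only
scrapy shell -s ROBOTSTXT_OBEY=False http://stackoverflow.com
```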