
How to disable robots.txt when you launch scrapy shell?

I use the Scrapy shell without problems on several websites, but I run into trouble when a site's robots.txt does not allow access. How can I make Scrapy ignore robots.txt (behave as if it did not exist)? Thank you in advance. I'm not asking about a project created with Scrapy, but about the Scrapy shell command: scrapy shell 'www.example.com'

DARDAR SAAD asked Nov 26 '16 21:11


2 Answers

In your Scrapy project's settings.py file, find ROBOTSTXT_OBEY and set it to False.
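As a minimal sketch, the relevant line in settings.py would look something like this (the rest of the file is left as your project already has it):

```python
# settings.py (fragment) -- only the robots.txt setting is shown here.
# With this set to False, Scrapy does not fetch or honor robots.txt.
ROBOTSTXT_OBEY = False
```

This only takes effect when scrapy shell is started from inside that project's directory.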

daniboy000 answered Sep 28 '22 08:09


If you run scrapy shell from inside a project directory, it uses that project's settings.py. If you run it outside a project, Scrapy uses its default settings. Either way, you can override or add settings with the --set flag.
So to turn off the ROBOTSTXT_OBEY setting, you can simply run:

scrapy shell http://stackoverflow.com --set="ROBOTSTXT_OBEY=False"
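If you launch the shell from a script, a small helper can assemble the same command. This is only a sketch: the function name build_scrapy_shell_command and its obey_robots parameter are made up for illustration and are not part of Scrapy. It uses only the --set flag shown above.

```python
import shlex


def build_scrapy_shell_command(url, obey_robots=False):
    """Return the argument list for `scrapy shell` with ROBOTSTXT_OBEY overridden.

    Hypothetical helper for illustration; it is not part of Scrapy.
    """
    return ["scrapy", "shell", url, "--set", f"ROBOTSTXT_OBEY={obey_robots}"]


command = build_scrapy_shell_command("http://stackoverflow.com")
# Print the command as a single shell-safe string for inspection.
print(shlex.join(command))
```

Passing the list to subprocess.run(command) from a terminal should start the interactive shell, assuming scrapy is installed and on your PATH.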
Granitosaurus answered Sep 28 '22 10:09