I tried to override the user agent of my CrawlSpider by adding an extra line to the project configuration file (scrapy.cfg). Here is the code:
[settings]
default = myproject.settings
USER_AGENT = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36"

[deploy]
#url = http://localhost:6800/
project = myproject
But when I run the crawler against my own website, the spider does not pick up my customized user agent; it uses the default one, "Scrapy/0.18.2 (+http://scrapy.org)". Can anyone explain what I have done wrong?
Note:
(1) It works when I override the user agent globally on the command line:
scrapy crawl myproject.com -o output.csv -t csv -s USER_AGENT="Mozilla...."
(2) When I remove the line "default = myproject.settings" from the configuration file and run scrapy crawl myproject.com, it says "cannot find spider..", so I assume the default line should not be removed in this case.
Thanks a lot in advance for the help.
Move your USER_AGENT line to the settings.py file, not the scrapy.cfg file. settings.py should be at the same level as items.py if you used the scrapy startproject command; in your case it should be something like myproject/settings.py.
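For illustration, here is a minimal sketch of what the relevant part of myproject/settings.py could look like. The project name myproject and the user-agent string are taken from the question; BOT_NAME, SPIDER_MODULES, and NEWSPIDER_MODULE are shown as the kind of entries scrapy startproject normally generates, so adjust them to whatever your generated file already contains.

# myproject/settings.py -- project-wide Scrapy settings
BOT_NAME = 'myproject'

SPIDER_MODULES = ['myproject.spiders']
NEWSPIDER_MODULE = 'myproject.spiders'

# Scrapy reads this setting when it builds requests, so every request the
# spider sends carries this user-agent instead of the default
# "Scrapy/x.y (+http://scrapy.org)" string.
USER_AGENT = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36"

As a side note, if you are on a newer Scrapy release (1.0 or later, if I recall correctly), you can also override the setting for a single spider via the custom_settings class attribute; the spider name below is just a placeholder for your own spider:

# Per-spider override, only available in newer Scrapy versions.
import scrapy

class MySpider(scrapy.Spider):
    name = 'myproject.com'  # hypothetical spider name
    custom_settings = {
        'USER_AGENT': "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36",
    }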