
Getting "Forbidden by robots.txt" in Scrapy

While crawling a website such as https://www.netflix.com, I get: Forbidden by robots.txt: <GET https://www.netflix.com/>

ERROR: No response downloaded for: https://www.netflix.com/

deepak kumar asked May 17 '16 11:05



2 Answers

In the new version (Scrapy 1.1), released 2016-05-11, the crawler downloads robots.txt first, before it starts crawling. To change this behavior, set ROBOTSTXT_OBEY in your settings.py:

ROBOTSTXT_OBEY = False 

Here are the release notes
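
To see what kind of check produces a "Forbidden by robots.txt" message, here is a minimal sketch using Python's standard `urllib.robotparser`. This is not Scrapy's own implementation, and the rules, the `FriendlyBot` name, and the `example.com` URL are hypothetical (they are not Netflix's actual robots.txt). It only illustrates how a crawler that obeys robots.txt decides whether a URL may be fetched:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
EXAMPLE_RULES = """\
User-agent: *
Disallow: /

User-agent: FriendlyBot
Allow: /
"""


def is_allowed(rules_text, user_agent, url):
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, url)


# An agent matched only by the wildcard entry is disallowed everywhere.
default_ok = is_allowed(EXAMPLE_RULES, "Scrapy", "https://example.com/")
# An agent with its own entry is allowed.
friendly_ok = is_allowed(EXAMPLE_RULES, "FriendlyBot", "https://example.com/")
print(default_ok, friendly_ok)
```

With ROBOTSTXT_OBEY = False, Scrapy skips this kind of check entirely, so make sure that crawling the site is actually permitted before turning it off.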

Rafael Almeida answered Sep 21 '22 18:09



First, make sure you change the user agent sent with your requests; otherwise the default user agent will very likely be blocked.
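
One way to do this is through Scrapy's `USER_AGENT` setting in settings.py. The snippet below is only a sketch: the user agent string is a hypothetical placeholder, so substitute one that identifies your own crawler.

```python
# settings.py (fragment)

# Hypothetical user agent string, for illustration only; replace it with
# a value that describes your crawler and how to contact you.
USER_AGENT = "my-crawler/0.1 (+https://example.com/contact)"
```

The same setting can also be overridden for a single spider through its `custom_settings` attribute.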

Ketan Patel answered Sep 21 '22 18:09
