getting Forbidden by robots.txt: scrapy

Question

While crawling a website such as https://www.netflix.com, I get Forbidden by robots.txt: <GET https://www.netflix.com/>

ERROR: No response downloaded for: https://www.netflix.com/

Rafael Almeida · Accepted Answer

Since Scrapy 1.1 (released 2016-05-11), new projects created with scrapy startproject obey robots.txt by default, so the crawler downloads robots.txt before it starts crawling. To change this behavior, set ROBOTSTXT_OBEY in your settings.py:

ROBOTSTXT_OBEY = False

Here are the release notes
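
To see why a request is reported as forbidden, it can help to evaluate a robots.txt rule by hand. The sketch below uses Python's standard-library urllib.robotparser, not Scrapy's own middleware, and the robots.txt content, the helper name is_allowed and the example.com URL are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A blanket "Disallow: /" rule forbids every path for every user agent.
    print(is_allowed(ROBOTS_TXT, "Scrapy", "https://www.example.com/"))
```

With ROBOTSTXT_OBEY = True, Scrapy performs a comparable check before each request and skips URLs the site disallows; setting it to False turns that check off.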

Ketan Patel · Answer

The first thing to ensure is that you change the user agent in your requests; otherwise, the default Scrapy user agent is likely to be blocked.
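
One way to do this, as a minimal sketch, is to set Scrapy's USER_AGENT setting in settings.py. The crawler name and contact URL below are placeholders, not values from the original answer:

```python
# settings.py (fragment)

# Hypothetical identifier: replace it with a string that describes
# your own crawler and how to contact you.
USER_AGENT = "mycrawler/0.1 (+https://example.com/contact)"
```

Scrapy sends this value as the User-Agent header on every request unless a request sets its own header.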

Tags:

python

scrapy

web-crawler

deepak kumar
