While crawling a website like https://www.netflix.com, I get the following errors:
Forbidden by robots.txt: <GET https://www.netflix.com/>
ERROR: No response downloaded for: https://www.netflix.com/
In the new version (Scrapy 1.1), released 2016-05-11, the crawler downloads robots.txt first, before crawling. To change this behavior, set ROBOTSTXT_OBEY in your settings.py:
ROBOTSTXT_OBEY = False
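To see why the request gets rejected when robots.txt is obeyed, here is a minimal sketch that uses Python's standard-library `urllib.robotparser` (not Scrapy's own middleware) to evaluate a rule set. The sample robots.txt content and the `is_allowed` helper are hypothetical and only illustrate the check; they are not what any real site serves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "Scrapy" is used here as an assumed user-agent token.
    print(is_allowed(SAMPLE_ROBOTS_TXT, "Scrapy", "https://www.example.com/"))
```

A crawler that honors robots.txt skips any URL for which a check like this fails, which is what the "Forbidden by robots.txt" message reports. Setting ROBOTSTXT_OBEY = False turns that check off.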
Here are the release notes
The first thing to ensure is that you change the user agent in your requests; otherwise the default Scrapy user agent is likely to be blocked.
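One way to do this is Scrapy's `USER_AGENT` setting in settings.py. The snippet below is only a sketch: the user-agent string is a placeholder assumption, so replace it with a value that fits your crawler.

```python
# settings.py
# Placeholder user-agent string (an assumption for illustration);
# replace it with one that identifies your crawler appropriately.
USER_AGENT = "Mozilla/5.0 (compatible; mycrawler/1.0; +https://example.com/bot)"
```

Scrapy then sends this value in the User-Agent header of each request, unless a request sets its own header.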