 

Getting blocked when scraping Amazon (even with headers, proxies, delay) [closed]

I have a Python script to scrape Amazon product listings. I have set proxies and headers, and I also call sleep() before each request. However, I still cannot get the data. The message I get back is:

To discuss automated access to Amazon data please contact [email protected]

Portions of my code are:

import random
import time
import requests

url = "https://www.amazon.com/Baby-Girls-Shoes/b/ref=sv_sl_fl_7239798011?ie=UTF8&node=7239798011"
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0'}
proxies_list = ["128.199.109.241:8080","113.53.230.195:3128","125.141.200.53:80","125.141.200.14:80","128.199.200.112:138","149.56.123.99:3128","128.199.200.112:80","125.141.200.39:80","134.213.29.202:4444"]
proxies = {'https': random.choice(proxies_list)}
time.sleep(0.5 * random.random())
r = requests.get(url, headers, proxies=proxies)
page_html = r.content
print(page_html)

This question is not a duplicate of others on Stack Overflow, because those suggest using proxies, headers, and a delay (sleep), and I have already done all of that. I am unable to scrape even after following their suggestions.

The code was working initially, but stopped working after scraping a few pages.

Tapa Dipti Sitaula asked Dec 28 '16

3 Answers

Instead of:

r = requests.get(url, headers, proxies=proxies)

Do:

r = requests.get(url, headers=headers, proxies=proxies)

This resolved the issue for me, at least for now. Hopefully the fix keeps working.
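To see why the keyword matters: the second positional parameter of requests.get is params, not headers, so the original call sent the headers dict as a URL query string and no User-Agent header at all. A minimal sketch using a stub that mirrors that signature shape (the stub is illustrative, not the real library):

```python
# Stub with the same parameter order as requests.get(url, params=None, **kwargs)
def get(url, params=None, **kwargs):
    # Return what the "request" would actually carry
    return {"url": url, "params": params, "headers": kwargs.get("headers")}

headers = {"user-agent": "Mozilla/5.0"}

wrong = get("https://example.com", headers)          # dict lands in `params`
right = get("https://example.com", headers=headers)  # dict lands in `headers`

print(wrong["headers"])   # None - no User-Agent was sent at all
print(right["headers"])   # the headers dict, as intended
```

With the positional call, Amazon sees a request with no browser User-Agent, which makes it trivial to block.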

Tapa Dipti Sitaula answered Sep 19 '22


From what you describe, Amazon is likely doing something extra (for example, with your cookies) to check whether you are using a browser. It's not insurmountable, though. To see the difference between a request from your browser and a request from your script, open the browser's developer tools and use "Copy as cURL" on a request to Amazon. Then transform the curl command into Python requests code with a curl-to-requests conversion tool. That gives you a request that looks exactly like the one from your browser. Do this a couple of times to understand if and how Amazon modifies your cookies on each request, and then try to mimic that behavior in your script.

If you are sure the requests look exactly the same, you probably need to increase the waiting time between consecutive requests. I hope this helps.
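As a sketch of the longer, randomized wait: the original 0.5 * random.random() pauses at most half a second, which is far too fast. Something in the 2-5 second range (an assumption; tune it for your target) is more plausible:

```python
import random

def polite_delay(base=2.0, jitter=3.0):
    """Return a randomized pause in seconds; uniform over [base, base + jitter).
    Randomizing the gap avoids a fixed, bot-like request cadence."""
    return base + jitter * random.random()

# Usage sketch (assumes the `requests` library and a list of URLs):
# for url in urls:
#     time.sleep(polite_delay())
#     r = requests.get(url, headers=headers, timeout=10)
```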

thelastone answered Sep 18 '22


Try using sessions in Requests; a Session remembers cookies and default headers across requests. If that fails, I would try Selenium 2 with either the Chrome driver or, if you prefer headless, the PhantomJS driver.
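A minimal sketch of the Session approach (the commented Amazon requests are placeholders and not executed here):

```python
import requests

# A Session persists cookies and default headers across requests, so
# anti-bot cookies set by one response are sent back on the next request.
session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:50.0) "
                  "Gecko/20100101 Firefox/50.0",
})

# Usage sketch (not executed here):
# r1 = session.get("https://www.amazon.com/")  # response cookies are stored
# r2 = session.get(product_url)                # cookies sent back automatically
```

The key difference from bare requests.get calls is that each call no longer starts from a blank cookie jar.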

fat fantasma answered Sep 19 '22