I'm trying to automatically download articles from ScienceDirect, for example:
url = 'http://www.sciencedirect.com/science/article/pii/S1053811913010240'
I can access the articles with my browser without any problem, but I have tried Python's requests, urllib2 and mechanize modules without success. Since I need to download many articles, doing it manually is not an option.
Wget does not work either.
For example:
wget http://www.sciencedirect.com/science/article/pii/S1053811913010240
returns:
HTTP request sent, awaiting response... 404 Not Found
Any ideas what the problem may be?
These tools are probably failing because the web server rejects their default User-Agent header, perhaps in an attempt to block batch downloading.
If you specify a browser-like User-Agent with wget, it works. Using your example:
wget -U "Mozilla/5.0" "https://www.sciencedirect.com/science/article/pii/S1053811913010240"
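The same idea should carry over to Python. Here is a minimal sketch using only the standard library's urllib.request (the Python 3 successor to urllib2). The helper names build_request and fetch_article are made up for illustration, and it assumes the server accepts the same "Mozilla/5.0" string that worked with wget; the site may apply other checks as well.

```python
import urllib.request

ARTICLE_URL = "https://www.sciencedirect.com/science/article/pii/S1053811913010240"


def build_request(url, user_agent="Mozilla/5.0"):
    """Return a Request that sends a browser-like User-Agent header.

    Assumption: the server only objects to the default Python User-Agent.
    """
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_article(url, timeout=30):
    """Download the page and return its raw bytes.

    urlopen raises urllib.error.HTTPError for responses such as 404.
    """
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    page = fetch_article(ARTICLE_URL)
    print(len(page), "bytes downloaded")
```

With requests, the equivalent would be passing headers={"User-Agent": "Mozilla/5.0"} to requests.get.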