Automatic download from ScienceDirect

Question

I'm trying to automatically download articles from ScienceDirect, for example:

url = 'http://www.sciencedirect.com/science/article/pii/S1053811913010240'

I can access the articles in my browser without any problem, but I have tried Python's requests, urllib2 and mechanize modules without success. Since I need to download many articles, doing it manually is not an option.

wget does not work either. For example,

wget http://www.sciencedirect.com/science/article/pii/S1053811913010240

returns:

HTTP request sent, awaiting response... 404 Not Found

Any ideas what the problem may be?

nofinator · Accepted Answer

These requests are probably failing because the web server rejects the default User-Agent that wget and the Python libraries send. Perhaps it is trying to block batch downloading.

If you specify a browser-like User-Agent with wget's -U option, it works. Using your example:

wget -U "Mozilla/5.0" "https://www.sciencedirect.com/science/article/pii/S1053811913010240"
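Since the question asks about Python, the same idea can be sketched there as well. This is a minimal sketch, not a tested recipe for this site: it uses only the standard library's urllib.request (the Python 3 successor to urllib2), the helper names build_request and download are made up for illustration, and it assumes the server still accepts the "Mozilla/5.0" User-Agent used in the wget command above.

```python
import urllib.request

# Same User-Agent string as in the wget example; whether the server
# keeps accepting it is an assumption.
USER_AGENT = "Mozilla/5.0"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download(url, path, user_agent=USER_AGENT, timeout=30):
    """Fetch url with the given User-Agent and write the body to path."""
    req = build_request(url, user_agent)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = resp.read()
    with open(path, "wb") as out:
        out.write(body)

# Example call (performs a real network request):
# download("https://www.sciencedirect.com/science/article/pii/S1053811913010240",
#          "article.html")
```

With the requests module, the equivalent is to pass headers={"User-Agent": "Mozilla/5.0"} to requests.get. When downloading many articles, it is worth pausing between requests and checking the site's terms of use first.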

Tags:

python

python-requests

wget

web-scraping

user2894079
