I'm trying to get text from multiple Pubmed papers using wget, but seems NCBI website don't allow this. Any alternatives?
Bernardos-MacBook-Pro:pangenome_papers_pubmed_result bernardo$ wget -i ./url.txt
--2016-05-04 10:49:34-- http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4560400/
Resolving www.ncbi.nlm.nih.gov... 130.14.29.110, 2607:f220:41e:4290::110
Connecting to www.ncbi.nlm.nih.gov|130.14.29.110|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2016-05-04 10:49:34 ERROR 403: Forbidden.
--2016-05-04 10:49:34-- http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4547177/
Reusing existing connection to www.ncbi.nlm.nih.gov:80.
HTTP request sent, awaiting response... 403 Forbidden
2016-05-04 10:49:34 ERROR 403: Forbidden.
This status is similar to 401 , but for the 403 Forbidden status code, re-authenticating makes no difference. The access is permanently forbidden and tied to the application logic, such as insufficient rights to a resource.
An HTTP 403 response code means that a client is forbidden from accessing a valid URL. The server understands the request, but it can't fulfill the request because of client-side issues. The caller isn't authorized to access an API that's using an API Gateway Lambda authorizer.
Refreshing the page is always worth a shot. Many times the 403 error is temporary, and a simple refresh might do the trick. Most browsers use Ctrl+R on Windows or Cmd+R on Mac to refresh, and also provide a Refresh button somewhere on the address bar.
Set custom User Agent like this:
wget --user-agent="Mozilla" http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4560400/
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With