Alternatives for wget giving 'ERROR 403: Forbidden'

I'm trying to get the text of multiple PubMed papers using wget, but it seems the NCBI website doesn't allow this. Are there any alternatives?

Bernardos-MacBook-Pro:pangenome_papers_pubmed_result bernardo$ wget -i ./url.txt
--2016-05-04 10:49:34--  http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4560400/
Resolving www.ncbi.nlm.nih.gov... 130.14.29.110, 2607:f220:41e:4290::110
Connecting to www.ncbi.nlm.nih.gov|130.14.29.110|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2016-05-04 10:49:34 ERROR 403: Forbidden.

--2016-05-04 10:49:34--  http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4547177/
Reusing existing connection to www.ncbi.nlm.nih.gov:80.
HTTP request sent, awaiting response... 403 Forbidden
2016-05-04 10:49:34 ERROR 403: Forbidden.

asked May 04 '16 07:05

biotech

1 Answer

The server appears to reject wget's default User-Agent string. Set a custom User-Agent like this:

wget --user-agent="Mozilla" http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4560400/
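
The same option can be combined with the -i flag from the question to fetch every URL in the list. This is a minimal sketch, not part of the original answer: fetch_urls is a hypothetical helper name, the file is assumed to hold one URL per line, and the one-second --wait between requests is an added courtesy to the server rather than something the fix requires.

```shell
# Hypothetical helper: download every URL listed in a file,
# sending a browser-like User-Agent instead of wget's default.
fetch_urls() {
    # $1: path to a file with one URL per line (url.txt in the question)
    # --wait=1 pauses one second between requests (an assumption, not required)
    wget --user-agent="Mozilla" --wait=1 -i "$1"
}
```

Calling fetch_urls ./url.txt from the directory that contains url.txt should then retrieve each article in turn.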

answered Sep 21 '22 19:09

Fiil

Tags:

wget

web-scraping

text-mining

biotech
