I am trying to access a Wikipedia page to get a list of pages, and I get the following error:
library(RCurl)
u <- "http://en.wikipedia.org/w/index.php?title=Special%3APrefixIndex&prefix=tal&namespace=4"
getURL(u)
[1] "Scripts should use an informative User-Agent string with contact information, or they may be IP-blocked without notice.\n"
I hope to get to that page through the Wikipedia API, but I am not sure that would work.
The odd thing is that other pages are read without a problem, for example:
u <- "http://en.wikipedia.org/wiki/Wikipedia:Talk"
getURL(u)
Any suggestions?
Side note: In general I would rather not scrape wiki pages and would go through the API instead, but I fear that these specific pages are not yet available through the API...
According to the RCurl documentation, you can specify additional headers by adding an httpheader parameter:
getURL(u, httpheader = c('User-Agent' = "Informative string with your contact info"))
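Since the question mentions the API: the same prefix listing may also be reachable through the MediaWiki API's list=allpages module. The following is only a sketch, assuming that apprefix and apnamespace correspond to the prefix and namespace parameters of Special:PrefixIndex. The contact string and the helper names build_api_url and fetch_prefix_pages are made up for illustration and are not part of RCurl.

```r
# Hypothetical contact string; replace with your own tool name and address.
ua <- "MyPrefixLister/0.1 (you@example.org)"

# Build a list=allpages query URL (helper name is an assumption, not an RCurl function).
build_api_url <- function(prefix, namespace, limit = 50) {
  params <- c(action = "query", list = "allpages",
              apprefix = utils::URLencode(prefix, reserved = TRUE),
              apnamespace = namespace, aplimit = limit, format = "json")
  paste0("https://en.wikipedia.org/w/api.php?",
         paste(names(params), params, sep = "=", collapse = "&"))
}

# Send the request with the same httpheader approach as above (needs RCurl and network access).
fetch_prefix_pages <- function(prefix, namespace, user_agent = ua) {
  RCurl::getURL(build_api_url(prefix, namespace),
                httpheader = c("User-Agent" = user_agent))
}

# Namespace 4 is the "Wikipedia:" namespace used in the question.
api_url <- build_api_url("Tal", 4)
```

Calling fetch_prefix_pages("Tal", 4) should return the response as a JSON string, which you would then parse with a JSON package of your choice. Building the URL in a separate step makes it easy to print and check the request before sending it.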