Is there any package for R that allows querying Wikipedia (most probably using Mediawiki API) to get list of available articles relevant to such query, as well as import selected articles for text mining?
There is WikipediR, 'A MediaWiki API wrapper in R':
library(devtools)
install_github("Ironholds/WikipediR")
library(WikipediR)
It includes these functions:
ls("package:WikipediR")
[1] "wiki_catpages" "wiki_con" "wiki_diff" "wiki_page"
[5] "wiki_pagecats" "wiki_recentchanges" "wiki_revision" "wiki_timestamp"
[9] "wiki_usercontribs" "wiki_userinfo"
Here it is in use, getting the contribution details and user details for a set of users:
library(RCurl)
library(XML)
# scrape page to get usernames of users with highest numbers of edits
top_editors_page <- "http://en.wikipedia.org/wiki/Wikipedia:List_of_Wikipedians_by_number_of_edits"
top_editors_table <- readHTMLTable(top_editors_page)
very_top_editors <- as.character(top_editors_table[[3]][1:5,]$User)
# setup connection to wikimedia project
con <- wiki_con("en", project = c("wikipedia"))
# connect to API and get last 50 edits per user
user_data <- lapply(very_top_editors, function(i) wiki_usercontribs(con, i) )
# and get information about the users (registration date, gender, editcount, etc)
user_info <- lapply(very_top_editors, function(i) wiki_userinfo(con, i) )
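Since the original question also asks about importing selected articles for text mining, the same connection can in principle be used with wiki_page. A minimal sketch, assuming wiki_page accepts the connection and a page title and returns the parsed API response (the exact return structure may differ between WikipediR versions, so inspect it before extracting):

```r
library(WikipediR)
# reuse the connection from above
con <- wiki_con("en", project = c("wikipedia"))
# fetch a single article by title (sketch; title is illustrative)
article <- wiki_page(con, "Text mining")
# the wikitext is typically nested inside the parsed JSON response;
# inspect the structure to find it before feeding it to a text-mining pipeline
str(article, max.level = 2)
```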
Use the RCurl package for retrieving info, and the XML or RJSONIO packages for parsing the response.
If you are behind a proxy, set your options first:
opts <- list(
proxy = "136.233.91.120",
proxyusername = "mydomain\\myusername",
proxypassword = 'whatever',
proxyport = 8080
)
Use the getForm function to access the API:
search_example <- getForm(
"http://en.wikipedia.org/w/api.php",
action = "opensearch",
search = "Te",
format = "json",
.opts = opts
)
Parse the results:
fromJSON(rawToChar(search_example))
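The opensearch action returns a JSON array of the form [query, titles, descriptions, URLs], which answers the "list of available articles" part of the question. A short sketch of pulling the matching titles out of the parsed response:

```r
library(RJSONIO)
result <- fromJSON(rawToChar(search_example))
# result[[1]] is the original search string ("Te");
# result[[2]] is a character vector of matching article titles
matching_titles <- result[[2]]
head(matching_titles)
```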