
How to scrape the web for the list of R release dates?

Tags:

r

To celebrate the 20,000th question with the r-tag on Stack Overflow, please help me to extract the R release dates from the Wikipedia page.

My attempts:

    library(XML)
    x <- readHTMLTable("http://en.wikipedia.org/wiki/R_(programming_language)")

This doesn't work because the table is in fact a list, not an HTML table.

    library(httr)
    x <- GET("http://en.wikipedia.org/wiki/R_(programming_language)")
    text <- content(x, "parsed")

This extracts the text, but my xpath is rusty, so I couldn't extract the relevant release dates.
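To illustrate, a minimal XPath sketch of the approach (assuming the release history is rendered as plain list items, which may not match the page's actual markup):

```r
library(XML)

# Parse the page; readHTMLTable() is no help here because the
# release history is a list, not a <table>
doc <- htmlParse("http://en.wikipedia.org/wiki/R_(programming_language)")

# Pull the text of every list item and keep the ones that look like
# version strings -- the pattern is a guess at the page's wording
items <- xpathSApply(doc, "//li", xmlValue)
grep("^R\\s+[0-9]+\\.[0-9]+", items, value = TRUE)
```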

How can I do this?


PS. The Wikipedia page is the only source I could find, but please feel free to post a solution using a canonical source, if there is one.

Asked Nov 26 '12 by Andrie



2 Answers

Why don't you use the file dates on the canonical ftp archive in Vienna?

Edit: Eg

 lynx -dump http://cran.r-project.org/src/base/R-0/ | grep tgz | grep -v http 

gives you a table you can parse from R, with file sizes as a bonus. Rinse and repeat for the R-1 and R-2 directories.
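The same idea can be sketched without shelling out to lynx, assuming the CRAN directory index is served as a plain HTML listing:

```r
# Fetch the directory index for the R-0.x sources; the same works
# for .../R-1/ and .../R-2/
listing <- readLines("http://cran.r-project.org/src/base/R-0/")

# Keep lines mentioning a tarball, then drop link-only markup lines,
# mirroring the `grep tgz | grep -v http` pipeline above
tarballs <- grep("tgz", listing, value = TRUE)
tarballs <- grep("http", tarballs, invert = TRUE, value = TRUE)
```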

Answered Sep 23 '22 by Dirk Eddelbuettel


Edited to include R version 3.0.0 and above

Dirk Eddelbuettel provided the canonical link to the .0 releases of R.

Here is some code that collates the tables from the four separate URLs, one for each major release, and then plots the result:

    library(XML)
    library(lattice)

    getRdates <- function(){
      url <- paste0("http://cran.r-project.org/src/base/R-", 0:3)
      x <- lapply(url, function(x) readHTMLTable(x, stringsAsFactors=FALSE)[[1]])
      x <- do.call(rbind, x)
      x <- x[grep("R-(.*)(\\.tar\\.gz|\\.tgz)", x$Name), c(-1, -5)]
      x$Release <- gsub("(R-.*)\\.(tar\\.gz|tgz)", "\\1", x$Name)
      x$Date <- as.POSIXct(x[["Last modified"]], format="%d-%b-%Y %H:%M")
      x$Release <- reorder(x$Release, x$Date)
      x
    }

    x <- getRdates()
    dotplot(Release~Date, data=x)

[Plot: dotplot of R releases by date, one row per release]

Answered Sep 23 '22 by Andrie