I am struggling to write a script that scrapes https://www.rstudio.com/products/rstudio/download/
for the latest RStudio version number, then downloads and installs it.
Since I am an R programmer, I started writing an R script using the rvest package.
I managed to scrape the download link for RStudio Server, but I still cannot get the one for RStudio Desktop itself.
Here is the R code that gets the download link for the 64-bit RStudio Server build for Ubuntu:
if(!require('stringr')) install.packages('stringr', Ncpus=8, repos='http://cran.us.r-project.org')
if(!require('rvest')) install.packages('rvest', Ncpus=8, repos='http://cran.us.r-project.org')
xpath<-'//code[(((count(preceding-sibling::*) + 1) = 3) and parent::*)]'
url<-'https://www.rstudio.com/products/rstudio/download-server/'
thepage<-xml2::read_html(url)
the_links_html <- rvest::html_nodes(thepage,xpath=xpath)
the_links <- rvest::html_text(the_links_html)
the_link <- the_links[stringr::str_detect(the_links, '-amd64\\.deb')]
the_r_uri<-stringr::str_match(the_link, 'https://.*$')
cat(the_r_uri)
Unfortunately, the RStudio Desktop download page has a completely different layout, and the same approach doesn't work there.
Can someone help me with this? I can't believe that all the data scientists in the world upgrade their RStudio manually!
There is an even simpler version of the script that reads the RStudio Server version directly. Bash version:
RSTUDIO_LATEST=$(wget --no-check-certificate -qO- https://s3.amazonaws.com/rstudio-server/current.ver)
or R version:
scan('https://s3.amazonaws.com/rstudio-server/current.ver', what = character(0))
But the version of the RStudio-desktop still eludes me.
It seems that you can get the latest stable version number from http://download1.rstudio.org/current.ver. At the time of writing this answer, it was more up to date than the check_for_update endpoint (for some unknown reason), as the two outputs below show:
$ curl -s http://download1.rstudio.org/current.ver
1.1.447
$ curl -s 'https://www.rstudio.org/links/check_for_update?version=1.0.0' | grep -oEi 'update-version=([0-9]+\.[0-9]+\.[0-9]+)' | awk -F= '{print $2}'
1.1.423
Found that here: https://github.com/yutannihilation/ansible-playbook-r/blob/master/tasks/install-rstudio-server.yml
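Since the two endpoints disagreed above (1.1.447 vs 1.1.423), a script will probably want to compare version strings numerically before deciding to upgrade. Here is a minimal sketch of such a comparison; the function name version_lt and the "installed" value are made up for illustration, and it assumes plain three-part versions like the ones shown here.

```shell
#!/bin/sh
# Succeeds (exit status 0) when version $1 is strictly older than version $2.
# Assumes three numeric, dot-separated components, e.g. 1.1.447.
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -t. -k1,1n -k2,2n -k3,3n | head -n 1)" = "$1" ]
}

installed="1.1.423"   # hypothetical: however you determine the installed version
latest="1.1.447"      # e.g. the contents of current.ver

if version_lt "$installed" "$latest"; then
  echo "upgrade available: $installed -> $latest"
else
  echo "up to date"
fi
# prints: upgrade available: 1.1.423 -> 1.1.447
```

Sorting each field numerically matters because a plain string comparison would put 1.10.0 before 1.9.0.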
If you query RStudio's check_for_update endpoint with a version string, you'll get back the update version and the URL of where to get it from:
https://www.rstudio.org/links/check_for_update?version=1.0.0
update-version=1.0.153&update-url=https%3A%2F%2Fwww.rstudio.com%2Fproducts%2Frstudio%2Fdownload%2F&update-message=RStudio%201.0.153%20is%20now%20available%20%28you%27re%20using%201.0.0%29&update-urgent=0
See here:
https://github.com/rstudio/rstudio/blob/54cd3abcfc58837b433464c793fe9b03a87f0bb4/src/cpp/session/modules/SessionUpdates.R
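The response is just a set of key=value pairs joined by "&", so it can be pulled apart with standard tools. Below is a rough sketch that runs against the sample response quoted above rather than the live endpoint; get_field is a hypothetical helper name, and the sed-based decoding only handles the two escapes (%3A and %2F) that occur in this sample URL, not URL encoding in general.

```shell
#!/bin/sh
# Print the value of key $1 from a "k1=v1&k2=v2" body read on stdin.
get_field() {
  tr '&' '\n' | sed -n "s/^$1=//p"
}

# Sample response copied from the answer above (not fetched live).
response='update-version=1.0.153&update-url=https%3A%2F%2Fwww.rstudio.com%2Fproducts%2Frstudio%2Fdownload%2F&update-message=RStudio%201.0.153%20is%20now%20available%20%28you%27re%20using%201.0.0%29&update-urgent=0'

version=$(printf '%s\n' "$response" | get_field update-version)
# Partial decoding: only ':' and '/' are unescaped here.
url=$(printf '%s\n' "$response" | get_field update-url | sed -e 's/%3A/:/g' -e 's|%2F|/|g')

echo "$version"   # prints 1.0.153
echo "$url"       # prints https://www.rstudio.com/products/rstudio/download/
```

To use it against the real service you would replace the sample string with the output of curl -s on the check_for_update URL, assuming the response format is still the one shown above.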
If you really want to scrape it from the download page, then I'd get the href of the <a> in the first <td> of the first <table> of class "downloads", and then parse out the three dot-separated numbers between "RStudio-" and ".exe". RStudio releases the same version on all platforms, so getting it from the Windows download should be sufficient.
> library(rvest)  # provides html_node(), html_attr() and %>%
> url = "https://www.rstudio.com/products/rstudio/download/"
> thepage<-xml2::read_html(url)
> html_node(thepage, ".downloads td a") %>% html_attr("href")
[1] "https://download1.rstudio.org/RStudio-1.0.153.exe"
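The "parse out the three dot-separated numbers" step is not shown above, so here is one possible sketch of it. The function name version_from_href is hypothetical, and it assumes the link keeps the RStudio-X.Y.Z.exe shape seen in the output above.

```shell
#!/bin/sh
# Print the X.Y.Z version embedded in a link ending in RStudio-X.Y.Z.exe.
# Prints nothing if the link does not have that shape.
version_from_href() {
  printf '%s\n' "$1" |
    sed -n 's/.*RStudio-\([0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\)\.exe$/\1/p'
}

version_from_href "https://download1.rstudio.org/RStudio-1.0.153.exe"
# prints 1.0.153
```

Printing nothing on a mismatch lets a calling script detect that the page layout or file naming has changed, instead of silently installing the wrong thing.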