 

How do I tell how many people are still using older versions of R?

Tags:

version

r

I'm evaluating whether it's worth retaining support for old versions of R in the packages I maintain, since that support adds maintenance overhead. To decide, I'd like to estimate how many people are still using R 3.5.

This sort of data is easy to find for web browsers, and I know that RStudio compiles download statistics for packages. But is there a comparable source of data on who is using (and thus presumably updating packages in) older versions of R?

Martin Smith asked Oct 14 '25 09:10


1 Answer

Raw data is available from http://cran-logs.rstudio.com/, but it is not grouped by R version. The code below downloads the log for a particular day and counts how many package requests came from each R version.

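# The daily logs are large, so allow at least 300 seconds for the download
options(timeout = max(300, getOption("timeout")))

# Build the URL for one day's log; files are stored under the year
day <- "2023-04-03"
year <- as.POSIXlt(day)$year + 1900
gzfile <- paste0(day, '.csv.gz')
fileurl <- paste0('http://cran-logs.rstudio.com/', year, '/', gzfile)
download.file(fileurl, gzfile)

# read_csv() reads the gzipped CSV directly
dd <- readr::read_csv(gzfile)

library(dplyr)
library(ggplot2)
dd %>% 
  # drop missing versions and a stray non-version value in this day's log
  filter(!is.na(r_version) & r_version != "vosonSML") %>% 
  # one row per R version, with the number of requests in n
  count(r_version) %>% 
  ggplot() +
  aes(r_version, n) + 
  geom_col() +
  coord_flip()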

[Figure: R versions accessing CRAN on 2023-04-03]

These logs only cover requests to the RStudio CRAN mirror. It is the default, so it probably accounts for most requests, but traffic to other CRAN mirrors is not included. This summary also treats each request as independent, although the same computer was likely installing more than one package on a given day because of package dependencies and the like.
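One way to soften the double-counting caveat is to count distinct clients rather than requests. The sketch below assumes the log has an `ip_id` column that identifies a client within a single day; `clients_per_version` and the toy data are hypothetical names used only for illustration.

```r
# Count distinct clients (ip_id) per R version instead of raw requests.
# Assumes `logs` has the columns `r_version` and `ip_id`.
clients_per_version <- function(logs) {
  logs <- logs[!is.na(logs$r_version), c("r_version", "ip_id")]
  logs <- unique(logs)  # one row per (version, client) pair
  counts <- table(logs$r_version)
  data.frame(
    r_version = names(counts),
    clients = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

# Toy data: client 1 requests three packages, client 2 requests one,
# and client 3 reports no R version.
toy <- data.frame(
  r_version = c("4.2.3", "4.2.3", "4.2.3", "3.5.1", NA),
  ip_id = c(1, 1, 1, 2, 3)
)
result <- clients_per_version(toy)
print(result)
```

With the real data, `clients_per_version(dd)` would stand in for the `count(r_version)` step. If the identifier is only stable within a day, as assumed here, this de-duplicates within one day's log but not across days, and several machines behind one IP address running the same R version are counted once.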

It's clear that most people are running at least 4.0, but there is a long tail of versions. For a more representative estimate, you will probably want to sample across several dates.
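A rough sketch of sampling across dates and turning the counts into a single figure for the original question might look like the following. The helper names (`log_url`, `read_day`, `share_older_than`), the choice of four weekly dates, and the 3.6.0 cutoff (so that everything up to R 3.5.x counts as "old") are all assumptions for illustration; the URL scheme is the one used in the code above.

```r
# Build the log URL for one or more dates (same scheme as the single-day code).
log_url <- function(day) {
  day <- as.Date(day)
  paste0("http://cran-logs.rstudio.com/", format(day, "%Y"), "/",
         format(day, "%Y-%m-%d"), ".csv.gz")
}

# Download and read one day's log (not called below; needs network access).
read_day <- function(day) {
  dest <- file.path(tempdir(), paste0(day, ".csv.gz"))
  download.file(log_url(day), dest, mode = "wb")
  read.csv(dest, stringsAsFactors = FALSE)
}

# Share of requests that come from R versions older than `cutoff`.
share_older_than <- function(r_version, cutoff = "3.6.0") {
  # Keep only values shaped like x.y.z; the field also holds NA and stray strings.
  ok <- !is.na(r_version) & grepl("^[0-9]+\\.[0-9]+\\.[0-9]+$", r_version)
  v <- numeric_version(r_version[ok])
  mean(v < numeric_version(cutoff))
}

# Four arbitrary sample dates, one week apart.
days <- format(seq(as.Date("2023-04-03"), by = "week", length.out = 4))
urls <- log_url(days)

# Toy check of the share calculation: one of four valid versions is below 3.6.0.
toy_share <- share_older_than(c("3.5.1", "4.2.3", "4.0.0", "3.6.0", NA, "vosonSML"))

# With network access, the real estimate could be computed as:
# logs <- do.call(rbind, lapply(days, read_day))
# share_older_than(logs$r_version)
```

Each daily file is large, so a handful of dates spread over a few months is probably enough to see whether the share of old versions is stable.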

MrFlick answered Oct 17 '25 02:10