
Downloading new data from internet when package is loaded every time

Tags: r, r-package

I have a package that scrapes data from the internet and displays it according to the function called. Recently, CRAN notified me that the data goes stale when a binary build is installed: the download call sits in utils.R, so it runs once while the package is being built and the result is baked into the binary.

For the past few days, I've tried the following without success:

  • A global variable assigned with <<-, but that generates a CRAN note ("no visible binding for global variable"), and several answers I read advise against the approach.
  • A new environment holding the downloaded object, but that never worked because I couldn't access the object from other functions. Ref: Where to create package environment variables?

These are the current package files: https://github.com/amrrs/tiobeindexr/tree/master/R

Tried solution:

zzz.r file:

.onLoad <- function (libname, pkgname)
{

  assign("newEnv", new.env(hash = TRUE, parent = parent.frame()))

  newEnv$.all_tablesx789  <- rvest::html_table(xml2::read_html('https://www.tiobe.com/tiobe-index/'))


}

One of the functions in the core code:

hall_of_fame <- function() {

  #check_data()

  #.GlobalEnv$.all_tablesx789 <- check_data()

  newEnv$.all_tablesx789[[4]]

}

The package builds fine, but the object is not found. Error below:

Error in hall_of_fame() : object 'newEnv' not found

I have only a couple of days to save my package on CRAN, and I hope I've provided enough detail to save this question from being downvoted.

Thanks!

amrrs asked Sep 02 '18 13:09


2 Answers

Consider adding memoise as a dependency: it gives you in-session caching for free with a minimal dependency chain. Combine it with a package environment and (just for fun) an active binding.

Create new 📦 env (you can stick this in, say, aaa.R):

.pkgenv <- new.env(parent=emptyenv())

Now, (say, in zzz.R) setup one function that does the table grabbing:

.get_tiboe_tables <- function(url) {
  message("Delete this since it's just to show caching works") # delete this
  content <- xml2::read_html(url)
  rvest::html_table(content)
}

And "memoise" it (again, in zzz.R):

get_tiboe_tables <- memoise::memoise(.get_tiboe_tables)

Now, create an active binding which will let us access the tables like a variable (i.e. w/o the ()). It's more "fun" than necessary (again, in zzz.R):

makeActiveBinding(
  sym = "all_tables",
  fun = function() get_tiboe_tables('https://www.tiobe.com/tiobe-index/'),
  env = .pkgenv
)

Now, get the value like this (notice we get the "loading" message as it "primes" the cache):

str(.pkgenv$all_tables, 1)
## Delete this since it's just to show caching works ** the loading msg
## List of 4
##  $ :'data.frame':    20 obs. of  6 variables:
##  $ :'data.frame':    30 obs. of  3 variables:
##  $ :'data.frame':    15 obs. of  8 variables:
##  $ :'data.frame':    15 obs. of  2 variables:

On subsequent calls there is no loading message since it's retrieving the cached value:

str(.pkgenv$all_tables, 1)
## List of 4
##  $ :'data.frame':    20 obs. of  6 variables:
##  $ :'data.frame':    30 obs. of  3 variables:
##  $ :'data.frame':    15 obs. of  8 variables:
##  $ :'data.frame':    15 obs. of  2 variables:

The tables are refreshed in the next R session, so the data stays fresh without abusing the site. You can also set file collation (the Collate field in DESCRIPTION) instead of relying on sorted-name hacks such as aaa.R and zzz.R.

Note that you can export the active binding as well and your 📦 users can then use it like a variable instead of calling it like a function.
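The caching-plus-active-binding pattern can be tried without network access or the memoise package. The following is only a sketch under stated assumptions: fake_fetch(), cached_fetch(), .cache and .stats are hypothetical stand-ins (not part of memoise or of the package in question), and the hand-rolled cache merely imitates what memoise::memoise() would do within one session.

```r
# Package-level environments; in a real package these would live in R/aaa.R.
.pkgenv <- new.env(parent = emptyenv())
.cache  <- new.env(parent = emptyenv())  # hypothetical hand-rolled cache
.stats  <- new.env(parent = emptyenv())  # counts how often we "download"
.stats$fetches <- 0L

# Hypothetical stand-in for the real scraper, so the sketch runs offline.
fake_fetch <- function(url) {
  .stats$fetches <- .stats$fetches + 1L
  list(data.frame(source = url, rank = 1:3))
}

# Fetch once per session and per URL, then serve the stored copy.
cached_fetch <- function(url) {
  if (!exists(url, envir = .cache, inherits = FALSE)) {
    assign(url, fake_fetch(url), envir = .cache)
  }
  get(url, envir = .cache, inherits = FALSE)
}

# The active binding makes the cached result readable like a plain variable.
makeActiveBinding(
  sym = "all_tables",
  fun = function() cached_fetch("https://www.tiobe.com/tiobe-index/"),
  env = .pkgenv
)

first  <- .pkgenv$all_tables  # triggers the single fetch
second <- .pkgenv$all_tables  # served from the cache
```

Reading .pkgenv$all_tables twice should leave .stats$fetches at 1, which is the same behaviour the "loading" message demonstrates above.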

hrbrmstr answered Nov 12 '22 22:11


Actually, I took a slightly different approach from the answer above, prompted by Thomas' comment: I didn't want to add memoise as a dependency, so I tried an alternative.

Creating a new package environment in aaa.R:

.pkgenv <- new.env(parent=emptyenv())

Loading the tables into that environment using .onAttach() in zzz.R:

.onAttach <- function(libname, pkgname) {

  packageStartupMessage("Downloading TIOBE Index Data using your Internet...")

  tryCatch({
    .pkgenv$.get_tiboe_tables <- rvest::html_table(xml2::read_html("https://www.tiobe.com/tiobe-index/"))
  },
  error = function(e){
    packageStartupMessage("Downloading TIOBE Index data failed!")
    packageStartupMessage("Error Message:")
    packageStartupMessage(conditionMessage(e))  # print the error text, not the condition object
    return(NA)
  })

}
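For completeness, here is a rough sketch of how a package function might read the stored tables back out of .pkgenv. The placeholder data frames and the guard against a failed download are assumptions added for illustration; the real tables come from the .onAttach() call above.

```r
# Same package environment as in aaa.R.
.pkgenv <- new.env(parent = emptyenv())

# Placeholder data standing in for what .onAttach() would have downloaded.
.pkgenv$.get_tiboe_tables <- list(
  data.frame(a = 1), data.frame(b = 2),
  data.frame(c = 3), data.frame(year = 2017, winner = "placeholder")
)

hall_of_fame <- function() {
  tables <- .pkgenv$.get_tiboe_tables
  # If the download failed at attach time, nothing was stored.
  if (is.null(tables)) {
    stop("TIOBE data is not available; the download may have failed.",
         call. = FALSE)
  }
  tables[[4]]
}

result <- hall_of_fame()
```

Because .pkgenv is defined at the top level of the package, every function in the package can see it, which is what the original newEnv attempt was missing.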

My earlier mistake, it seems, was creating the new environment inside .onLoad() itself, where it was local to that function call and invisible to the rest of the package.
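The scoping issue behind the "object 'newEnv' not found" error can be reproduced outside a package. This is a minimal sketch: broken_on_load() and working_on_load() are hypothetical names standing in for the two versions of the load hook, and the stored list is placeholder data.

```r
# Defined at top level, as it would be in R/aaa.R.
.pkgenv <- new.env(parent = emptyenv())

broken_on_load <- function() {
  # assign() without `envir` writes into this function's own frame,
  # so newEnv disappears as soon as the function returns.
  assign("newEnv", new.env(hash = TRUE, parent = parent.frame()))
  newEnv$tables <- list("placeholder")
  invisible(NULL)
}

working_on_load <- function() {
  # .pkgenv already exists outside the function; only its contents change.
  .pkgenv$tables <- list("placeholder")
  invisible(NULL)
}

broken_on_load()
working_on_load()

found_broken  <- exists("newEnv")                                  # not visible here
found_working <- exists("tables", envir = .pkgenv, inherits = FALSE)
```

After both calls, newEnv cannot be found from outside the function, while the object stored in .pkgenv can.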

amrrs answered Nov 12 '22 21:11