
"Back engineering" an R package from compiled binary version

Tags:

r

I work for an org that has a number of internal packages that were created many years ago. They exist only as binary package zip archives that were built on Windows under R 3.x. They therefore can't be installed on R 4.x, and can't be used on Mac or Linux either without being rebuilt from source. So everyone in the entire org is stuck on R 3.6 until this is resolved. I don't have access to the original package source files. They are lost to time.

I want to take these packages, extract the code and data, and update them for modern best practices (roxygen, GitHub repos, testthat etc.). What is the best way of doing this? I have a fair amount of experience with package development, and I have already tackled one package. I started a new RStudio package project and went function by function, copying the function code to a new script file and reformatting the help from the help browser as roxygen docs. I've done the same for any internal hidden functions that I could find (mostly via pkg_name:::), and also for the internal datasets. That is all fairly straightforward, but very time-consuming. It builds OK, but I haven't yet tested the actual functionality of the code.

I'm currently stuck because there are a couple of standardGeneric method functions for custom S4 class objects. I am completely unfamiliar with these and haven't been able to figure out how to copy them over. When I view their source code, they are wrapped in new() with "standardGeneric" as the first argument (plus a lot more, obviously), whereas all the other functions are simple function definitions. Any help with how to recreate or copy these over would be very welcome.

But maybe I am going about this the wrong way in the first place. I haven't been able to find any helpful suggestions about how to "back engineer" R package source files from a compiled version.

Anyone any ideas?

hokeybot asked Nov 11 '21 15:11



1 Answer

Check whether this works in R 3.6.

The script below can automate at least part of your problem by writing each function's source into a separate, appropriately named .R file. It also takes care of hidden functions.

Extracting code

# Use your package name
package_name <- "dplyr"

# Look in the namespace rather than "package:<name>": it also holds the
# non-exported functions and does not require the package to be attached
ns <- asNamespace(package_name)

# Extract all function names, including hidden ones
nms <- as.character(lsf.str(envir = ns, all.names = TRUE))

# Loop through the function names,
# deparse each function, and write it to its own R file
for (nm in nms) {

    # Deparse the whole function (head and body)
    src <- deparse(get(nm, envir = ns))

    # Prepend the assignment; backticks keep names such as %>% valid
    src[1] <- paste0("`", nm, "` <- ", src[1])

    # Replace characters that are not safe in file names
    # (names that differ only in such characters, or only in case on
    # Windows, would end up in the same file, so check for clashes)
    file_nm <- gsub("[^A-Za-z0-9._-]", "_", nm)

    # Write all to file
    writeLines(src, paste0(file_nm, ".R"))
}
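
Recovering S4 generics and methods

Printing an S4 generic shows the new("standardGeneric", ...) wrapper described in the question rather than a plain function definition. The sketch below is one possible way to get back to setClass(), setGeneric() and setMethod() calls using functions from the methods package. The class Circle, its slot r and the generic area are made-up stand-ins so that the example runs on its own; with a real package you would skip the three set*() calls, attach the package with library(), and substitute its class, generic and signature names.

```r
library(methods)

# Toy stand-ins for the package's S4 objects (hypothetical names)
setClass("Circle", representation(r = "numeric"))
setGeneric("area", function(shape, ...) standardGeneric("area"))
setMethod("area", "Circle", function(shape, ...) pi * shape@r^2)

# List the signatures that have a method for this generic
showMethods("area")

# Class: slot names and types, to be passed to setClass()
slots <- getSlots("Circle")
class_src <- sprintf(
    'setClass("Circle", representation(%s))',
    paste0(names(slots), ' = "', slots, '"', collapse = ", ")
)

# Generic: the plain function stored inside the standardGeneric object
gen_fun <- getGeneric("area")@.Data
gen_src <- sprintf(
    'setGeneric("area", %s)',
    paste(deparse(gen_fun), collapse = "\n")
)

# Method: the plain function stored inside the method definition
meth_fun <- getMethod("area", "Circle")@.Data
meth_src <- sprintf(
    'setMethod("area", "Circle", %s)',
    paste(deparse(meth_fun), collapse = "\n")
)

# Source text that can be pasted into the new package's R files
cat(class_src, gen_src, meth_src, sep = "\n\n")
```

The printed calls are only a starting point: prototypes, validity functions, and classes that extend other classes are not covered by this sketch and would need to be checked against getClass() output by hand.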

Extracting help files

To extract a function's help text in a similar way, you can use code from the following SO answers:

  • for plain text: Get the documentation of an R function from the help as a string
  • for .Rd file contents: How to access the help/documentation .rd source files in R?

A starting point could be something like:

library(tools)
package_name <- "dplyr"

# Parsed Rd objects for every help page of the installed package,
# named by their original file names (e.g. "arrange.Rd")
db <- Rd_db(package_name)

# Loop through the help pages rather than the function names:
# one Rd file can document several functions, and its file name
# does not have to match any of them
for (rd_name in names(db)) {

    # Convert the parsed Rd object back to Rd source text
    rd <- paste(as.character(db[[rd_name]], deparse = TRUE), collapse = "")

    # Write to a new Rd file in the working directory
    writeLines(rd, basename(rd_name))
}
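
Extracting datasets

The question also asks about getting the data out. A minimal sketch, assuming the datasets are the ones listed by data(package = ...): load them into a scratch environment and save each object to its own .rda file. The built-in datasets package is used here only as a stand-in so the example runs anywhere, and the output directory under tempdir() is likewise an assumption; in practice you would point it at the data/ folder of the new package.

```r
# Stand-in package name; replace with the internal package
package_name <- "datasets"

# Temporary output directory (assumed location for this sketch)
out_dir <- file.path(tempdir(), "data")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# Items look like "iris" or "beaver1 (beavers)"; the name in parentheses,
# when present, is the data file name that data() expects
items <- data(package = package_name)$results[, "Item"]
files <- unique(sub("^.*\\((.*)\\)$", "\\1", items))

# Load every dataset into a scratch environment
e <- new.env()
data(list = files, package = package_name, envir = e)

# Save each object to its own .rda file
for (nm in ls(e, all.names = TRUE)) {
    save(list = nm, envir = e, file = file.path(out_dir, paste0(nm, ".rda")))
}
```

Internal data that was stored in R/sysdata.rda is not listed by data(). Those objects live in the package namespace alongside the functions, so they can be found by listing the namespace and keeping the entries that are not functions.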
Roman answered Nov 04 '22 02:11