
Finding objects from other packages' namespaces in package code

I'm refactoring a package that imports the full namespaces of many other packages. I believe many of these dependencies are used for only a single function call, which would be better handled with importFrom, or are orphaned dependencies that are no longer used at all.

There's enough code in the package that it would be tedious to manually examine every line looking for unfamiliar function calls.

How can I determine where and how many times objects from imported namespaces are being used in the package? Please note that this package does not include unit tests.

Here is a reproducible example:

DESCRIPTION file:

Package: my_package
Title: title
Version: 0.0.1
Authors@R: person(
  given = "A",
  family = "Person",
  role = c("aut", "cre"),
  email = "[email protected]"
)
Description: Something
License: Some license
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.1.1
Imports: 
  dplyr,
  purrr,
  stringr

NAMESPACE file:

import(dplyr)
import(purrr)
import(stringr)

my_package.R file:

#' my_package
#' @docType package
#' @name my_package
NULL
#' @import dplyr
#' @import purrr
#' @import stringr
NULL

functions.R file:

#' add 1 to "banana" column and call it "apple"
#' @description demonstrate a variety of dplyr functions
#' @param x a data.frame object
#' @return a data.frame object with columns "apple" and "banana"
#' @examples
#' my_fruit <- data.frame(banana = c(1,2,3), pear = c(4,5,6))
#' my_function(my_fruit)
#' @export
my_function <- function(x) {
  x %>%
    mutate(apple = banana + 1) %>%
    select(apple, banana)
}

I am looking for a solution that would identify that %>%, mutate and select are exports from dplyr, that %>% is also an export from purrr, and that no exports from the imported namespace stringr are used. For functions like %>% that are exported from multiple namespaces, it is not important to me to distinguish which namespace the export comes from (in the example, both %>% are re-exports from the magrittr dependency), since a warning is generated when the package is loaded wherever actual masking occurs.

bcarlsen asked May 11 '21 17:05



2 Answers

Here's a base R solution:

pkgs <- readLines("NAMESPACE")
pattern <- "^import\\((.*?)\\)$"
pkgs <- pkgs[grepl(pattern, pkgs)]
pkgs <- sub(pattern, "\\1", pkgs)
pkgs
#> [1] "dplyr"   "purrr"   "stringr"

exports <- sapply(pkgs, getNamespaceExports)
exports <- do.call(rbind, Map(data.frame, package = pkgs, fun = exports))
rownames(exports) <- NULL
head(exports)
#>   package         fun
#> 1   dplyr rows_upsert
#> 2   dplyr   src_local
#> 3   dplyr  db_analyze
#> 4   dplyr    n_groups
#> 5   dplyr    distinct
#> 6   dplyr  summarise_

code <- sapply(list.files("R", full.names = TRUE), parse)
funs <- sapply(code, function(x) setdiff(all.names(x), all.vars(x)))
funs <- funs[lengths(funs) > 0]
funs <- do.call(rbind, Map(data.frame, fun = funs, file = names(funs)))
rownames(funs) <- NULL
funs
#>        fun          file
#> 1       <- R/functions.R
#> 2 function R/functions.R
#> 3        { R/functions.R
#> 4      %>% R/functions.R
#> 5   mutate R/functions.R
#> 6        + R/functions.R
#> 7   select R/functions.R

Final output:

merge(exports, funs)
#>      fun package          file
#> 1    %>% stringr R/functions.R
#> 2    %>%   purrr R/functions.R
#> 3    %>%   dplyr R/functions.R
#> 4 mutate   dplyr R/functions.R
#> 5 select   dplyr R/functions.R

It is not 100% robust, because names are matched without regard to scope. For instance, in function(x, select) select(x) the argument select is a local object, yet it will be reported as being taken from {dplyr}.

It will also miss functions that are not used in call form, fun(), for example when they are passed as an argument, as in lapply(my_list, fun).
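Both failure modes can be reproduced on one-line snippets. A minimal sketch, where called_names() is a hypothetical helper that wraps the setdiff() step used above:

```r
# Hypothetical helper: apply the setdiff(all.names, all.vars) step to a string of code
called_names <- function(code) {
  expr <- parse(text = code, keep.source = FALSE)
  setdiff(all.names(expr), all.vars(expr))
}

# False positive: `select` is a function argument here, not dplyr::select,
# but it appears in call position and is therefore reported
"select" %in% called_names("f <- function(x, select) select(x)")
#> [1] TRUE

# False negative: `mutate` is passed as an argument, so all.vars() treats it
# as a variable and setdiff() drops it
"mutate" %in% called_names("lapply(my_list, mutate)")
#> [1] FALSE
```

Because setdiff() is applied once per file, a name that appears anywhere in a file as a plain variable is dropped for that whole file, even if it is also called elsewhere in it.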

We can't really detect those cases robustly by static analysis. A workaround that might get us there, or at least closer, if we have 100% test coverage, is to wrap the imported functions so that they report when they are called, and then run the tests.

You probably don't need this though.
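For illustration only, the instrumentation idea could be sketched in base R as below. The names call_log and with_call_counter() are hypothetical, toupper merely stands in for an imported function, and the step of substituting the wrappers for the real imports before running tests is not shown.

```r
# Environment that accumulates call counts, keyed by function name
call_log <- new.env()

# Hypothetical helper: return a wrapper around `fun` that bumps a counter
# in `call_log` each time it is called, then forwards its arguments
with_call_counter <- function(fun, name) {
  force(fun)
  force(name)
  function(...) {
    n <- get0(name, envir = call_log, inherits = FALSE, ifnotfound = 0L)
    assign(name, n + 1L, envir = call_log)
    fun(...)
  }
}

# Stand-in for an imported function (assumption: any function works the same way)
toupper_counted <- with_call_counter(toupper, "toupper")

toupper_counted("a")                                # direct call
vapply(c("b", "c"), toupper_counted, character(1))  # indirect use is counted too

# Inspect the recorded counts
mget(ls(call_log), envir = call_log)
```

Unlike the static approach, this counts a function that is passed to vapply() or lapply(), but only for code paths that are actually executed.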

Moody_Mudskipper answered Oct 23 '22 20:10


You could use getParseData to get all function calls used in the package, and join them with the exports of the packages imported in NAMESPACE to find out their origin.

Tested on the reproducible example my_package:

library(dplyr)
library(purrr)
library(stringr)

# List functions used in Package
path <- "./my_package"
files <- file.path(path,list.files(path= path, recursive = TRUE, pattern ='\\.R$'))

functions <- files %>% map_dfr(~{
  getParseData(parse(.x, keep.source=TRUE)) %>% 
          filter(token %in% c("SYMBOL_FUNCTION_CALL","SPECIAL")) %>%
          mutate(file = .x) %>%
          rename(fctname = text) %>%
          select(file, fctname) %>% unique })

# List of all possible functions imports
imports <- readLines(file.path(path,"NAMESPACE"))
imports <- str_match(imports, "import\\(\\s*(.*?)\\s*\\)")[,2]
imports <- imports[!is.na(imports)]

possible.imported.functions <- imports %>% map_dfr(~{
  data.frame(package.import = .x,fctname = getNamespaceExports(.x)) })

# Imported functions in use
inner_join(functions,possible.imported.functions, by = c('fctname'='fctname')) %>%
  arrange(package.import,fctname) %>%
  select(file,package.import,fctname)
#>                             file package.import fctname
#> 1 my_package/R/functions.R          dplyr     %>%
#> 2 my_package/R/functions.R          dplyr  mutate
#> 3 my_package/R/functions.R          dplyr  select
#> 4 my_package/R/functions.R          purrr     %>%
#> 5 my_package/R/functions.R        stringr     %>%
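The result above is deduplicated, so it shows where a function is used but not how often. A minimal base R sketch of counting occurrences with the same parse-data tokens follows; count_calls() is a hypothetical helper, and the code string is the body of my_function from the question:

```r
# Hypothetical helper: tabulate call-position tokens in a string of R code
count_calls <- function(code) {
  parsed <- utils::getParseData(parse(text = code, keep.source = TRUE))
  calls <- parsed$text[parsed$token %in% c("SYMBOL_FUNCTION_CALL", "SPECIAL")]
  table(calls)
}

counts <- count_calls("
  x %>%
    mutate(apple = banana + 1) %>%
    select(apple, banana)
")

counts[["%>%"]]
#> [1] 2
counts[["mutate"]]
#> [1] 1
```

To apply it to a package, the same tabulation could be run on each file's parse data instead of a string and then joined with the exports table as above.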

Waldi answered Oct 23 '22 22:10