 

Determine which packages are used in an R script

Tags:

r

Is there a quick way to scan an R script and determine which packages are actually used? By this I mean looking at all of the functions called in the script and returning a list of the packages that contain those function names. (I know that function names are not exclusive to any one package.)
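Because the same name can be defined in more than one place, any lookup has to return every candidate rather than a single package. For packages that are already attached, base R's utils::find() does this. A minimal sketch; the masking function is hypothetical and only there to show a name with two homes. Note that find() only searches the search path, so installed but unattached packages are not reported.

```r
# utils::find() searches every environment on the search path (attached
# packages plus the global environment) and returns all that define a name.
find("mean")                     # "package:base"

# A hypothetical user function with the same name masks base::mean(),
# and find() now reports both locations.
mean <- function(x) "my own mean"
find("mean")                     # ".GlobalEnv" "package:base"
rm(mean)
```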

Why not just look at the packages loaded with library() or require()? Fair question. The problem is that I have a bad habit of loading packages I use often, whether or not the script actually needs them.

I'd like to clean up some scripts that I intend to share with others by removing unused packages.

I resolve to change my ways in 2016. Please help me get started.

Update

Some good ideas in the comments...

# create an R file that uses a few functions

fileConn<-file("test.R")
writeLines(c("df <- data.frame(v1=c(1, 1, 1), v2=c(1, 2, 3))",
             "\n",
             "m <- mean(df$v2)",
             "\n",
             "describe(df)  #psych package"),
           fileConn)
close(fileConn)

# getParseData approach
# keep.source = TRUE makes this work in Rscript, where it defaults to FALSE
pkg <- getParseData(parse("test.R", keep.source = TRUE))
pkg <- pkg[pkg$token=="SYMBOL_FUNCTION_CALL",]
pkg <- pkg[!duplicated(pkg$text),]
pkgname <- pkg$text
pkgname
# [1] "data.frame" "c"          "mean"       "describe" 
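The SYMBOL_FUNCTION_CALL filter picks up the function part of each call. Calls written as pkg::fun() also name their package directly: in the parse data that part carries the SYMBOL_PACKAGE token, so explicitly namespaced calls can be listed without any lookup. A minimal sketch on an inline snippet (the snippet is only an illustration):

```r
# Calls written as pkg::fun() name their package explicitly; the parser tags
# the package part with the SYMBOL_PACKAGE token.
code <- "x <- stats::median(c(1, 5, 9))\ny <- utils::head(letters, 2)"
pd <- getParseData(parse(text = code, keep.source = TRUE))

# Unique package names, in the order they first appear in the code.
explicit_pkgs <- unique(pd$text[pd$token == "SYMBOL_PACKAGE"])
explicit_pkgs   # "stats" "utils"
```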

Update 2

An ugly attempt to implement @nicola's idea:

# load all probable packages first
pkgList <- list(pkgname)
for (i in 1:length(pkgname)) {
  try(print(packageName(environment(get(pkgList[[1]][i])))))
}

It does not like the c() function, but the results seem otherwise correct.

#[1] "base"
#Error in packageName(environment(get(pkgList[[1]][i]))) : 
#  'env' must be an environment
#[1] "base"
#[1] "psych"
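The error comes from c() being a primitive function. Primitives are implemented internally rather than as R closures, so environment() returns NULL for them, and packageName() only accepts an environment. A quick check in base R:

```r
# c() is a primitive, not a closure, so it has no enclosing environment
# for packageName() to inspect.
is.primitive(c)                  # TRUE
environment(c)                   # NULL

# mean() is an ordinary closure whose environment is the base namespace.
packageName(environment(mean))   # "base"
```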
Asked Nov 08 '22 23:11 by Eric Green



1 Answer

An answer based on ideas in the question comments. The key functions are getParseData() and packageName().

# create an R file that uses a few functions

fileConn<-file("test.R")
writeLines(c("df <- data.frame(v1=c(1, 1, 1), v2=c(1, 2, 3))",
             "\n",
             "m <- mean(df$v2)",
             "\n",
             "describe(df)  #psych package"),
           fileConn)
close(fileConn)

# getParseData approach
# keep.source = TRUE makes this work in Rscript, where it defaults to FALSE
pkg <- getParseData(parse("test.R", keep.source = TRUE))
pkg <- pkg[pkg$token=="SYMBOL_FUNCTION_CALL",]
pkg <- pkg[!duplicated(pkg$text),]
pkgname <- pkg$text
pkgname
# [1] "data.frame" "c"          "mean"       "describe" 

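# load all probable packages first: get() only finds functions that are attached
library(psych)
for (fn in pkgname) {
  # skip names that are not defined anywhere on the search path
  if (!exists(fn, mode = "function")) next
  env <- environment(get(fn, mode = "function"))
  # primitives such as c() have a NULL environment; they all belong to base
  print(if (is.null(env)) "base" else packageName(env))
}

#[1] "base"
#[1] "base"
#[1] "base"
#[1] "psych"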

I'll mark this as correct for now, but happy to consider other solutions.
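To go the last step toward the original goal (removing unused library() calls), the called function names can be compared against the packages the script loads. This is a rough sketch under several assumptions: only top-level library()/require() calls with a bare name or string argument are inspected, each of those packages must be installed (getNamespaceExports() loads its namespace), and a package counts as used if any called function name is among its exports. The helper name unused_packages() is hypothetical.

```r
# Hypothetical helper: list packages attached by top-level library()/require()
# calls in a script whose exports are never called by name.
# Assumes every listed package is installed: getNamespaceExports() loads it.
unused_packages <- function(path) {
  exprs <- parse(file = path, keep.source = TRUE)

  # Package names from top-level library(x) / require("x") calls only;
  # calls nested inside functions or if-blocks are not inspected.
  loaded <- character(0)
  for (e in exprs) {
    if (is.call(e) && length(e) >= 2 &&
        as.character(e[[1]])[1] %in% c("library", "require")) {
      loaded <- c(loaded, as.character(e[[2]]))
    }
  }
  loaded <- unique(loaded)

  # Every name the parser marks as a function call.
  pd <- getParseData(exprs)
  called <- unique(pd$text[pd$token == "SYMBOL_FUNCTION_CALL"])

  # A package counts as used if any called name is among its exports.
  used <- vapply(loaded, function(p) any(called %in% getNamespaceExports(p)),
                 logical(1))
  loaded[!used]
}

# Usage: a script that attaches stats and tools but only calls median().
script <- tempfile(fileext = ".R")
writeLines(c("library(stats)",
             "library(tools)",
             "m <- median(c(1, 5, 9))"),
           script)
unused_packages(script)   # "tools"
```

Because the check is by name only, it errs on the side of keeping a package: if two loaded packages export the same name, both count as used.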

Answered Nov 15 '22 07:11 by Eric Green
