 

Function naming for R packages


I am writing an R package and would really like to avoid using function names found in other packages. For example, I planned to call a function 'annotate', but this name is already used in the NLP package. Evidently it is best to avoid obvious name choices, but is there a systematic way to search an exhaustive list of CRAN-published function names to avoid duplication? I appreciate this matters mainly for packages shared on CRAN, but it can also be relevant when sharing locally, in case there is a conflict with another loaded package.

asked Aug 22 '17 15:08 by Heather Robinson


1 Answer

Name clashes occur when two loaded packages contain functions with the same name. So, name clashes can be avoided at two points:

  • when defining function names in a package
  • when calling functions from a package

Creating functions with unique names

At the time of writing (23 Aug 2017), a remarkable 11,272 packages were available on CRAN (the latest figure is shown on CRAN's Contributed Packages page), and new packages are added every day.

So, function names that are unique today may still clash with packages added in the future.
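As far as I know, base R offers no CRAN-wide lookup of function names, but you can at least check candidate names against the packages installed on your own machine. The helper below is a minimal sketch; `find_name_clashes()` is a hypothetical name, not part of any package, and it only sees locally installed packages, so it is no substitute for a search of all of CRAN.

```r
# Hypothetical helper (not part of any package): for each candidate name,
# list the locally installed packages whose namespace already exports it.
# Loading every installed namespace can be slow, so pass `pkgs` explicitly
# when you only care about a few packages.
find_name_clashes <- function(candidates,
                              pkgs = rownames(utils::installed.packages())) {
  # Exported names per package; namespaces that fail to load count as empty
  exports <- lapply(pkgs, function(pkg) {
    tryCatch(getNamespaceExports(pkg), error = function(e) character(0))
  })
  # For each candidate, keep the packages whose exports contain it
  result <- lapply(candidates, function(name) {
    pkgs[vapply(exports, function(ex) name %in% ex, logical(1))]
  })
  names(result) <- candidates
  result
}

find_name_clashes(c("median", "annotate"), pkgs = c("base", "stats", "utils"))
```

Restricted to base, stats and utils, `median` is reported as exported by stats and `annotate` comes back empty; `annotate` would only be flagged if NLP (or another package exporting it) is installed and included in `pkgs`.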

Alistaire has already mentioned the option of prefixing all your functions. Besides stringi (stri_) and stringr (str_), the forcats package is another example; it uses the prefixes fct_ and lvls_.

This approach can greatly reduce the probability of name clashes.

(Although there is no guarantee that another package maintainer won't choose the same prefix.)
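If you go the prefix route, a similar check can tell you whether a chosen prefix is already in use among the packages you have installed. This is only a sketch: `prefix_in_use()` is a hypothetical helper, and an empty result says nothing about packages you have not installed.

```r
# Hypothetical helper: which exported names in `pkgs` already start with `prefix`?
prefix_in_use <- function(prefix, pkgs) {
  # Pool the exported names of all requested packages
  exports <- unlist(lapply(pkgs, getNamespaceExports), use.names = FALSE)
  # Keep only names beginning with the prefix, without duplicates
  sort(unique(exports[startsWith(exports, prefix)]))
}

prefix_in_use("fct_", pkgs = c("base", "stats", "utils"))  # character(0)
prefix_in_use("str", pkgs = "base")  # includes "strsplit" and "strtoi"
```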

Calling functions unambiguously using the double colon operator

IMHO, the ultimate responsibility for avoiding name clashes is the user's.

I've seen questions here on SO with more than half a dozen packages being loaded. Or library(tidyverse) is called for convenience, loading 19 other packages where dplyr and tidyr would have sufficed.

Cluttering the search path with many loaded packages increases the risk of name clashes. And even with only two packages loaded, name clashes can occur. For instance, the lubridate and data.table packages both define

hour, isoweek, mday, minute, month, quarter, second, wday, week, yday, year

Which function is called depends on the order in which the packages were loaded: the package attached last masks the other. (You can use conflicts() to find objects that exist with the same name in two or more places on the search path.)
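The masking rule can be seen without installing anything. The sketch below stands in a user-defined `median()` in the global environment for a second package; this works as an illustration because `.GlobalEnv` is searched first, just ahead of the most recently attached package.

```r
# A user-defined median() in the global environment masks stats::median()
median <- function(x) "my median"

# find() lists every place on the search path where the name exists, in search order
find("median")               # ".GlobalEnv" "package:stats"

# conflicts() returns names defined in more than one place on the search path
"median" %in% conflicts()    # TRUE

median(1:5)                  # "my median": the first match on the search path wins
stats::median(1:5)           # 3: the :: operator bypasses the search path

rm(median)                   # remove the mask again
median(1:5)                  # 3
```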

To avoid ambiguities and unexpected results, I suggest loading as few packages as possible and using the double colon operator (see ?"::") to call functions from packages without attaching them beforehand, e.g.,

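library(data.table)
# the timestamps, and therefore the printed seconds, depend on when this is run
DT <- data.table(t = lubridate::now() + 0:3)
# second() from the attached data.table package returns whole seconds
DT[, second(t)]
#> [1] 18 19 20 21
# second() from lubridate, called via :: without attaching it, keeps the fractions
DT[, lubridate::second(t)]
#> [1] 18.88337 19.88337 20.88337 21.88337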

There is another benefit of using the double colon operator: it documents, within the code, which package a function is called from.

This comes at the expense of a few additional keystrokes but may save a lot of time when the code is inspected, amended, or debugged weeks or years later. I've seen many questions on SO where the OP hasn't mentioned the package.
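A small base-R illustration of both points, assuming a default R session in which the tools package (shipped with R) is not attached: the `::` call names its package explicitly, and it loads the tools namespace without putting the package on the search path.

```r
# tools ships with R but is not attached in a default session
"package:tools" %in% search()      # FALSE

# Calling through :: loads the tools namespace but does not attach it,
# and the call itself records which package file_ext() comes from
tools::file_ext("report.pdf")      # "pdf"

isNamespaceLoaded("tools")         # TRUE
"package:tools" %in% search()      # still FALSE
```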

answered Oct 11 '22 14:10 by Uwe