
What 1-2 letter object names conflict with existing R objects?

To keep my code readable, I try not to reuse the names of objects that already exist when I create new ones. Because R is package-based and functions are first-class objects, it is easy to mask common functions that are not in base R: a widely used package may export a short function name, and unless you know which package to load there is no way to check for it. The built-in logical shorthands T and F cause trouble too, because unlike TRUE and FALSE they are ordinary variables that can be reassigned.
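As a small illustration of the T/F problem, here is a sketch that reassigns T in the global environment and then removes it again. The variable names masked and restored are made up for this example.

```r
# T is an ordinary binding in base, so a global assignment masks it.
T <- 0                            # e.g. an accidental reuse as a "time" variable
masked <- identical(T, TRUE)      # FALSE: T now refers to the numeric 0

# Removing the global copy makes the base definition visible again.
rm(T)
restored <- identical(T, TRUE)    # TRUE

# By contrast, `TRUE <- 0` is an error, because TRUE is a reserved word.
print(c(masked = masked, restored = restored))
```

Writing TRUE and FALSE in full sidesteps the issue entirely.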

Some examples that come to mind are:

One letter

  • c
  • t
  • T/F
  • J

Two letters

  • df

A better solution might be to avoid short names altogether in favor of more descriptive ones, and I generally try to do that as a matter of habit. Yet "df" in a function which manipulates a generic data.frame is plenty descriptive, and a longer name adds little, so short names have their uses. In addition, for SO questions where the larger context isn't necessarily known, coming up with descriptive names is well-nigh impossible.

What other one- and two-letter variable names conflict with existing R objects? Which among those are sufficiently common that they should be avoided? If they are not in base, please list the package as well. The best answers will involve at least some code; please provide it if used.

Note that I am not asking whether overwriting functions that already exist is advisable. That question is addressed on SO already:

In R, what exactly is the problem with having variables with the same name as base R functions?

For visualizations of some answers here, see this question on CV:

https://stats.stackexchange.com/questions/13999/visualizing-2-letter-combinations

Ari B. Friedman asked Aug 08 '11 08:08



2 Answers

apropos is ideal for this:

apropos("^[[:alpha:]]{1,2}$")

With no packages loaded, this returns:

 [1] "ar" "as" "by" "c"  "C"  "cm" "D"  "de" "df" "dt" "el" "F"  "gc" "gl"
[15] "I"  "if" "Im" "is" "lh" "lm" "ls" "pf" "pi" "pt" "q"  "qf" "qr" "qt"
[29] "Re" "rf" "rm" "rt" "sd" "t"  "T"  "ts" "vi"

The exact contents will depend upon the search list. Try loading a few packages and re-running it if you care about conflicts with packages that you commonly use.
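One way to follow that suggestion is to compare the result before and after attaching a package. This sketch uses splines only because it ships with R; the names pattern, before, after and added are illustrative.

```r
pattern <- "^[[:alpha:]]{1,2}$"

# Short names visible before attaching the package.
before <- apropos(pattern)

# splines is distributed with R, so no installation is needed.
library(splines)

# Short names visible afterwards, and the ones the package contributed.
after <- apropos(pattern)
added <- setdiff(after, before)
print(added)
```

Swapping in a package you use regularly shows which short names it would add to your own search path.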


I loaded all the (>200) packages installed on my machine with this:

lapply(rownames(installed.packages()), require, character.only = TRUE)

And reran the call to apropos, wrapping it in unique, since there were a few duplicates.

one_or_two <- unique(apropos("^[[:alpha:]]{1,2}$"))

This returned:

  [1] "Ad" "am" "ar" "as" "bc" "bd" "bp" "br" "BR" "bs" "by" "c"  "C" 
 [14] "cc" "cd" "ch" "ci" "CJ" "ck" "Cl" "cm" "cn" "cq" "cs" "Cs" "cv"
 [27] "d"  "D"  "dc" "dd" "de" "df" "dg" "dn" "do" "ds" "dt" "e"  "E" 
 [40] "el" "ES" "F"  "FF" "fn" "gc" "gl" "go" "H"  "Hi" "hm" "I"  "ic"
 [53] "id" "ID" "if" "IJ" "Im" "In" "ip" "is" "J"  "lh" "ll" "lm" "lo"
 [66] "Lo" "ls" "lu" "m"  "MH" "mn" "ms" "N"  "nc" "nd" "nn" "ns" "on"
 [79] "Op" "P"  "pa" "pf" "pi" "Pi" "pm" "pp" "ps" "pt" "q"  "qf" "qq"
 [92] "qr" "qt" "r"  "Re" "rf" "rk" "rl" "rm" "rt" "s"  "sc" "sd" "SJ"
[105] "sn" "sp" "ss" "t"  "T"  "te" "tr" "ts" "tt" "tz" "ug" "UG" "UN"
[118] "V"  "VA" "Vd" "vi" "Vo" "w"  "W"  "y"

You can see where they came from with

lapply(one_or_two, find)
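
If a flat table is easier to scan than a list, the same lookup can be collapsed into a data frame. This is only a sketch: short_names, origins and origin_table are names chosen for the example, and the result depends on what is attached in your session.

```r
# Recompute the short names so the example is self-contained.
short_names <- unique(apropos("^[[:alpha:]]{1,2}$"))

# find() can return several locations for one name; join them into one string.
origins <- vapply(
  short_names,
  function(nm) paste(find(nm), collapse = ", "),
  character(1)
)

origin_table <- data.frame(name = short_names, where = origins,
                           row.names = NULL, stringsAsFactors = FALSE)
print(head(origin_table))
```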
Richie Cotton answered Nov 08 '22 04:11



Been thinking about this more. Here's a list of one-letter object names that exist on my search path (exists looks through the workspace and every attached package, not just base):

> var.names <- c(letters,LETTERS)
> var.names[sapply(var.names,exists)]
[1] "c" "q" "t" "C" "D" "F" "I" "T" "X"

And one- and two-letter object names, again checked with exists:

one.letter.names <- c(letters,LETTERS)

N <- length(one.letter.names)


first <- rep(one.letter.names,N)
second <- rep(one.letter.names,each=N)

two.letter.names <- paste(first,second,sep="")

var.names <- c(one.letter.names,two.letter.names)

> var.names[sapply(var.names,exists)]
[1] "c"  "d"  "q"  "t"  "C"  "D"  "F"  "I"  "J"  "N"  "T"  "X"  "bc" "gc"
[15] "id" "sd" "de" "Re" "df" "if" "pf" "qf" "rf" "lh" "pi" "vi" "el" "gl"
[29] "ll" "cm" "lm" "rm" "Im" "sp" "qq" "ar" "qr" "tr" "as" "bs" "is" "ls"
[43] "ns" "ps" "ts" "dt" "pt" "qt" "rt" "tt" "by" "VA" "UN"

That's a much bigger list than I initially suspected, although I would never think of naming a variable "if", so to a certain degree it makes sense.
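
Since exists searches the whole search path by default, a stricter check is needed to see only what base itself defines. A minimal sketch, assuming it is enough to look in baseenv() without inheritance; candidates, in_base and base_only are names made up here, and outer is used as a shorter way to build the two-letter combinations.

```r
one_letter <- c(letters, LETTERS)

# All 52 x 52 two-letter combinations.
two_letter <- as.vector(outer(one_letter, one_letter, paste0))
candidates <- c(one_letter, two_letter)

# Look only in the base environment; inherits = FALSE stops the search there.
in_base <- vapply(candidates, exists, logical(1),
                  envir = baseenv(), inherits = FALSE)
base_only <- candidates[in_base]
print(base_only)
```

Names such as df or sd drop out of this version because they live in stats rather than base.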

This still doesn't capture object names in packages that are not loaded, nor does it give any sense of which functions are best avoided. I think a better answer would either use expert opinion to figure out which functions are important (e.g. using c is probably worse than using qf) or use a data mining approach on a bunch of R code to see which short-named functions get used the most.
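
The data mining idea could start from something like the following rough sketch, which counts calls to short-named functions in a piece of R source using the parser's token table. The code string is a toy stand-in for a real corpus, and the variable names are invented for the example.

```r
# Toy input; a real analysis would read many source files instead.
code <- "x <- c(1, 2, 3)
m <- t(matrix(x))
y <- c(x, sd(x))"

# keep.source = TRUE is required for getParseData() to return tokens.
parsed <- parse(text = code, keep.source = TRUE)
tokens <- utils::getParseData(parsed)

# Keep only function-call symbols with one or two characters.
calls <- tokens$text[tokens$token == "SYMBOL_FUNCTION_CALL"]
short_calls <- calls[nchar(calls) <= 2]

counts <- sort(table(short_calls), decreasing = TRUE)
print(counts)
```

Run over a large body of code, the counts would give a crude ranking of which short names are most worth avoiding.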

Ari B. Friedman answered Nov 08 '22 03:11
