To keep my code readable, I avoid giving new objects names that already exist. Because R is package-based and functions are first-class objects, it is easy to overwrite common functions that are not in base R: a popular package might use a short function name, and unless you know which package to load there is no way to check for it. Objects such as the built-in logicals T and F also cause trouble.
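To make the mechanics concrete, here is a minimal sketch of what happens when T and c are reused as variable names. The names masked, restored and still_callable are mine, purely for illustration.

```r
T <- 0                      # legal: T is an ordinary variable, not a reserved word
masked <- isTRUE(T)         # FALSE while the new T shadows base's T
rm(T)                       # drop the copy; base's T is visible again
restored <- isTRUE(T)       # TRUE

c <- 1:3                    # a data object with the same name as a function
still_callable <- c(4, 5)   # calls skip non-function objects, so base::c is still found
rm(c)
```

So a data object named c does not break calls to c(), but a reassigned T silently changes any code that relies on it.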
Some examples that come to mind are:
One letter
Two letters
A better solution might be to avoid using short names altogether in favor of more descriptive ones, and I generally try to do that as a matter of habit. Yet "df" for a function which manipulates a generic data.frame is plenty descriptive and a longer name adds little, so short names have their uses. In addition, for SO questions where the larger context isn't necessarily known, coming up with descriptive names is well-nigh impossible.
What other one- and two-letter variable names conflict with existing R objects? Which among those are sufficiently common that they should be avoided? If they are not in base, please list the package as well. The best answers will involve at least some code; please provide it if used.
Note that I am not asking whether overwriting functions that already exist is advisable. That question is addressed on SO already:
In R, what exactly is the problem with having variables with the same name as base R functions?
For visualizations of some answers here, see this question on CV:
https://stats.stackexchange.com/questions/13999/visualizing-2-letter-combinations
apropos is ideal for this:
apropos("^[[:alpha:]]{1,2}$")
With no packages loaded, this returns:
[1] "ar" "as" "by" "c" "C" "cm" "D" "de" "df" "dt" "el" "F" "gc" "gl"
[15] "I" "if" "Im" "is" "lh" "lm" "ls" "pf" "pi" "pt" "q" "qf" "qr" "qt"
[29] "Re" "rf" "rm" "rt" "sd" "t" "T" "ts" "vi"
The exact contents will depend upon the search list. Try loading a few packages and re-running it if you care about conflicts with packages that you commonly use.
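As a small illustration of that dependence, the sketch below attaches splines (a package that ships with R) and compares the results. It assumes splines is not already attached; the variable names are just for illustration.

```r
pattern <- "^[[:alpha:]]{1,2}$"

before_attach <- apropos(pattern)   # short names visible on the current search list
library(splines)                    # ships with R; exports bs() and ns()
after_attach <- apropos(pattern)

# Short names that only became visible once splines was attached
new_names <- setdiff(after_attach, before_attach)
new_names
```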
I loaded all the (>200) packages installed on my machine with this:
lapply(rownames(installed.packages()), require, character.only = TRUE)
And reran the call to apropos, wrapping it in unique, since there were a few duplicates.
one_or_two <- unique(apropos("^[[:alpha:]]{1,2}$"))
This returned:
[1] "Ad" "am" "ar" "as" "bc" "bd" "bp" "br" "BR" "bs" "by" "c" "C"
[14] "cc" "cd" "ch" "ci" "CJ" "ck" "Cl" "cm" "cn" "cq" "cs" "Cs" "cv"
[27] "d" "D" "dc" "dd" "de" "df" "dg" "dn" "do" "ds" "dt" "e" "E"
[40] "el" "ES" "F" "FF" "fn" "gc" "gl" "go" "H" "Hi" "hm" "I" "ic"
[53] "id" "ID" "if" "IJ" "Im" "In" "ip" "is" "J" "lh" "ll" "lm" "lo"
[66] "Lo" "ls" "lu" "m" "MH" "mn" "ms" "N" "nc" "nd" "nn" "ns" "on"
[79] "Op" "P" "pa" "pf" "pi" "Pi" "pm" "pp" "ps" "pt" "q" "qf" "qq"
[92] "qr" "qt" "r" "Re" "rf" "rk" "rl" "rm" "rt" "s" "sc" "sd" "SJ"
[105] "sn" "sp" "ss" "t" "T" "te" "tr" "ts" "tt" "tz" "ug" "UG" "UN"
[118] "V" "VA" "Vd" "vi" "Vo" "w" "W" "y"
You can see where they came from with
lapply(one_or_two, find)
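If you would rather see the names grouped by origin, something like the following sketch may be easier to read. It relies on the where argument of apropos, which returns search-list positions as the names of the result; hits, origin and by_origin are illustrative names.

```r
# where = TRUE attaches the search-list position of each hit as its name
hits <- apropos("^[[:alpha:]]{1,2}$", where = TRUE)

# Translate positions into search-list entries such as "package:stats"
origin <- search()[as.integer(names(hits))]

# One character vector of short names per search-list entry
by_origin <- split(unname(hits), origin)
by_origin
```

Unlike the unique call above, this keeps duplicates, so a name defined in two packages shows up under both.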
Been thinking about this more. Here's a list of one-letter object names in base R:
> var.names <- c(letters,LETTERS)
> var.names[sapply(var.names,exists)]
[1] "c" "q" "t" "C" "D" "F" "I" "T" "X"
And one- and two-letter object names in base R:
one.letter.names <- c(letters,LETTERS)
N <- length(one.letter.names)
first <- rep(one.letter.names,N)
second <- rep(one.letter.names,each=N)
two.letter.names <- paste(first,second,sep="")
var.names <- c(one.letter.names,two.letter.names)
> var.names[sapply(var.names,exists)]
[1] "c" "d" "q" "t" "C" "D" "F" "I" "J" "N" "T" "X" "bc" "gc"
[15] "id" "sd" "de" "Re" "df" "if" "pf" "qf" "rf" "lh" "pi" "vi" "el" "gl"
[29] "ll" "cm" "lm" "rm" "Im" "sp" "qq" "ar" "qr" "tr" "as" "bs" "is" "ls"
[43] "ns" "ps" "ts" "dt" "pt" "qt" "rt" "tt" "by" "VA" "UN"
That's a much bigger list than I initially suspected, although I would never think of naming a variable "if", so to a certain degree it makes sense.
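One caveat: exists with its defaults searches the calling frame and the whole search path, not just packages. So N is in the second list because the code itself assigned it, and X is most likely a hit on the argument X inside sapply/lapply. A sketch that looks only in attached packages, and records which one owns each name, could look like this (all variable names here are illustrative):

```r
letters_all <- c(letters, LETTERS)
short_names <- c(letters_all,
                 as.vector(outer(letters_all, letters_all, paste0)))

# Attached packages only (skips .GlobalEnv, Autoloads and similar entries)
pkg_entries <- grep("^package:", search(), value = TRUE)

# For each name, the first attached package that defines it (NA if none)
first_owner <- vapply(short_names, function(nm) {
  owns <- vapply(pkg_entries, function(entry) {
    exists(nm, envir = as.environment(entry), inherits = FALSE)
  }, logical(1))
  if (any(owns)) pkg_entries[which(owns)[1]] else NA_character_
}, character(1))

in_packages <- first_owner[!is.na(first_owner)]
in_packages
```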
Still doesn't capture object names not in base, or give any sense of which functions are best avoided. I think a better answer would either use expert opinion to figure out which functions are important (e.g. using c is probably worse than using qf) or use a data mining approach on a bunch of R code to see what short-named functions get used the most.
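A rough sketch of the data mining idea, assuming the code to analyse is available as a character vector of source lines. count_short_calls and sample_code are hypothetical names, and the three sample lines are made up purely to show the mechanics.

```r
# Count calls to functions with one- or two-letter names in a chunk of R code
count_short_calls <- function(code) {
  parsed <- parse(text = code, keep.source = TRUE)   # keep source refs for getParseData
  tokens <- utils::getParseData(parsed)
  calls  <- tokens$text[tokens$token == "SYMBOL_FUNCTION_CALL"]
  short  <- calls[grepl("^[[:alpha:]]{1,2}$", calls)]
  sort(table(short), decreasing = TRUE)
}

# Made-up input; a real analysis would read many script files instead
sample_code <- c("x <- c(1, 2, 3)",
                 "y <- t(matrix(c(x, x), 2))",
                 "s <- sd(x)")
short_counts <- count_short_calls(sample_code)
short_counts
```

This only counts names used in call position, so keywords such as if and short names used as plain variables are not included.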