Several months ago I asked something similar, but back then I was using JavaScript to check whether a provided string is a "valid" R object name. Now I'd like to achieve the same using nothing but R. I suspect there's a very nice way to do this with some neat (not so) esoteric R function, so regular expressions seem to me like the last line of defence. Any ideas?
Oh, yeah, using back-ticks and stuff is considered cheating. =)
is.character() function in R Language is used to check whether an object is of character (string) type. It returns TRUE if the object is a character vector, and FALSE otherwise.
names() function in R Language is used to get or set the names of an object. To set names, it takes an object (a vector, matrix or data frame) together with the value that is to be assigned as its names.
exists() function in R Programming Language is used to check whether an object with the name specified in the argument of the function is defined. It returns TRUE if the object is found, and FALSE otherwise.
In R, there's no fundamental distinction between a string and a character. A "string" is just a character variable that contains one or more characters. One thing you should be aware of, however, is the distinction between a scalar character variable, and a vector.
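The functions and the scalar/vector distinction described above can be tried directly at the console. This is only a small sketch; the variable name fruit and the deliberately undefined name are made up for illustration.

```r
fruit <- c(a = "apple", b = "banana")

is.character(fruit)            # TRUE: a character vector
is.character(list("a"))        # FALSE: a list, even though it holds a string

names(fruit)                   # "a" "b"
names(fruit) <- c("first", "second")   # replace the names

exists("fruit")                # TRUE: defined just above
exists("no_such_object_here")  # FALSE, assuming nothing by that name is defined

length("hello")                # 1: a single string is a length-one vector
nchar("hello")                 # 5: number of characters in that string
length(c("a", "b", "c"))       # 3: a vector of three strings
```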
Edited 2013-1-9 to fix regular expression. Previous regular expression, lifted from page 456 of John Chambers' "Software for Data Analysis", was (subtly) incomplete. (h.t. Hadley Wickham)
There are a couple of issues here. A simple regular expression can be used to identify all syntactically valid names, but some of those names (like if and while) are 'reserved', and cannot be assigned to.
Identifying syntactically valid names:
?make.names explains that a syntactically valid name:
[...] consists of letters, numbers and the dot or underline characters and starts with a letter or the dot not followed by a number. Names such as '".2way"' are not valid [...]
Here is the corresponding regular expression:
"^((([[:alpha:]]|[.][._[:alpha:]])[._[:alnum:]]*)|[.])$"
Identifying unreserved syntactically valid names
To identify unreserved names, you can take advantage of the base function make.names(), which constructs syntactically valid names from arbitrary character strings.
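isValidAndUnreserved <- function(string) {
    # make.names() returns its input unchanged only if it is already
    # a syntactically valid, unreserved name
    make.names(string) == string
}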
isValidAndUnreserved(".jjj")
# [1] TRUE
isValidAndUnreserved(" jjj")
# [1] FALSE
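One thing the comparison above does not spell out is how it behaves on vectors and missing values. Since make.names() is vectorised, the same idea extends to a whole character vector. The sketch below uses a helper name of my own (is_valid_unreserved is not a base R function) and additionally maps NA inputs to FALSE rather than NA, which is an assumption about what a caller would want.

```r
# Hypothetical helper: vectorised, NA-safe variant of the check above.
is_valid_unreserved <- function(x) {
  stopifnot(is.character(x))
  # A name passes if make.names() leaves it untouched; NA never passes.
  !is.na(x) & make.names(x) == x
}

is_valid_unreserved(c("mean", "if", "_x", NA, ".2way"))
# [1]  TRUE FALSE FALSE FALSE FALSE
```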
Putting it all together
isValidName <- function(string) {
    grepl("^((([[:alpha:]]|[.][._[:alpha:]])[._[:alnum:]]*)|[.])$", string)
}
isValidAndUnreservedName <- function(string) {
    make.names(string) == string
}
testValidity <- function(string) {
    valid <- isValidName(string)
    unreserved <- isValidAndUnreservedName(string)
    reserved <- (valid & ! unreserved)
    list("Valid"=valid,
         "Unreserved"=unreserved,
         "Reserved"=reserved)
}
testNames <- c("mean", ".j_j", ".", "...", "if", "while", "TRUE", "NULL",
               "_jj", " j", ".2way")
t(sapply(testNames, testValidity))
      Valid Unreserved Reserved
mean  TRUE  TRUE       FALSE
.j_j  TRUE  TRUE       FALSE
.     TRUE  TRUE       FALSE
...   TRUE  TRUE       FALSE
if    TRUE  FALSE      TRUE
while TRUE  FALSE      TRUE
TRUE  TRUE  FALSE      TRUE
NULL  TRUE  FALSE      TRUE
_jj   FALSE FALSE      FALSE
 j    FALSE FALSE      FALSE    # Note: these tests are for " j", not "j"
.2way FALSE FALSE      FALSE
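The pattern used in isValidName() is dense, so it may help to read it piece by piece. The sketch below rebuilds the identical string from commented fragments (the variable name valid_name_pattern is just for illustration) and applies it to a few extra inputs.

```r
# The same pattern as in isValidName(), assembled from commented pieces.
valid_name_pattern <- paste0(
  "^(",
  "(([[:alpha:]]|[.][._[:alpha:]])",  # a letter, or a dot followed by a dot, underscore or letter
  "[._[:alnum:]]*)",                  # then any run of dots, underscores, letters or digits
  "|[.]",                             # or else a single dot on its own
  ")$"
)

grepl(valid_name_pattern, c("x1", ".x", "..", ".", ".2way", "_x", "2x"))
# [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE
```

Like the original, this only checks syntax; reserved words such as if still match and need the separate make.names() comparison.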
For more discussion of these issues, see the r-devel thread linked to by @Hadley in the comments below.
As Josh suggests, make.names is probably the best solution to this. Not only will it handle weird punctuation, it'll also flag reserved words:
make.names(".x") # ".x"
make.names("_x") # "X_x"
make.names("if") # "if."
make.names("function") # "function."