
Does R ignore variable name extensions starting with a dot in a data frame?

Tags: dataframe, r, names

I have a data frame in which some variable names include a "." suffix. R seems to ignore the suffix and return the variable anyway when I access it without the complete name. Why does this happen? Below is a minimal example of my problem.

y <- rnorm(100)
x <- rlnorm(100)

data <- cbind.data.frame(y,x)

colnames(data) <- c("y.rnorm","x.rlnorm")

# these both return the same thing
data$y
data$y.rnorm
user29609 asked Dec 23 '22 12:12


2 Answers

R is set up to return partial matches by design.

Read sections 3.4 and 4.3 of the R Language Definition.

3.4.1 Character. The strings in i are matched against the names attribute of x and the resulting integers are used. For [[ and $ partial matching is used if exact matching fails, so x$aa will match x$aabb if x does not contain a component named "aa" and "aabb" is the only name which has prefix "aa". For [[, partial matching can be controlled via the exact argument which defaults to NA indicating that partial matching is allowed, but should result in a warning when it occurs. Setting exact to TRUE prevents partial matching from occurring, a FALSE value allows it and does not issue any warnings. Note that [ always requires an exact match. The string "" is treated specially: it indicates ‘no name’ and matches no element (not even those without a name). Note that partial matching is only used when extracting and not when replacing.
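The extraction rules quoted above can be sketched with a small data frame that reuses the column names from the question. This is only an illustration; the values are made up. One caveat: in current versions of R, [[ defaults to exact = TRUE rather than the NA described in the quote, so [[ does not partial-match unless asked to.

```r
# Data frame with the same column names as in the question (values are made up)
df <- data.frame(y.rnorm = c(1, 2, 3), x.rlnorm = c(4, 5, 6))

partial_dollar <- df$y                      # partial match: "y" is a unique prefix
strict_bracket <- df[["y"]]                 # NULL: [[ requires an exact match by default
loose_bracket  <- df[["y", exact = FALSE]]  # partial matching requested explicitly

# An ambiguous prefix matches nothing
amb <- list(aab = 1, aac = 2)
ambiguous <- amb$aa                         # NULL: two names share the prefix "aa"

# Replacement never partial-matches: this creates a new column named "y"
df$y <- 0
names(df)
```

After the last assignment the data frame has three columns, and df$y now refers to the new column by exact match rather than to y.rnorm.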

and

4.3.2 Partial matching on tags. Each remaining named supplied argument is compared to the remaining formal arguments using partial matching. If the name of the supplied argument matches exactly with the first part of a formal argument then the two arguments are considered to be matched. It is an error to have multiple partial matches. Notice that if f <- function(fumble, fooey) fbody, then f(f = 1, fo = 2) is illegal, even though the 2nd actual argument only matches fooey. f(f = 1, fooey = 2) is legal though since the second argument matches exactly and is removed from consideration for partial matching. If the formal arguments contain ‘...’ then partial matching is only applied to arguments that precede it.
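The argument-matching rule from 4.3.2 can be tried directly. The sketch below reuses the f(fumble, fooey) signature from the quote with a made-up body, and adds a hypothetical function g to show the rule about arguments that follow '...'.

```r
# Same formals as the quoted example; the body is invented for illustration
f <- function(fumble, fooey) c(fumble = fumble, fooey = fooey)

# Legal: "fooey" matches exactly first, then "f" partially matches fumble
ok <- f(f = 1, fooey = 2)

# Illegal: "f" is a prefix of both formals; capture the error message
err <- tryCatch(f(f = 1, fo = 2), error = function(e) conditionMessage(e))

# Hypothetical g: formals after ... are only matched exactly
g <- function(..., verbose = FALSE) verbose
after_dots <- g(verb = TRUE)  # "verb" is absorbed by ..., so verbose keeps its default
```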

Update

As noted by Uwe, the R Language Definition may be due for an update, since the partial-matching behaviour of [[ has changed. R News lists the following under "Deprecated and defunct" for the 3.1.0 release:

Partial matching when using the $ operator on data frames now throws a warning and may become defunct in the future. If partial matching is intended, replace foo$bar by foo[["bar", exact = FALSE]]

Kevin Arseneau answered Jan 30 '23 23:01


The $ operator is designed to do partial matching. See the Subsetting chapter of Advanced R by Hadley Wickham, Ctrl + F "partial matching":

There’s one important difference between $ and [[. $ does partial matching:

x <- list(abc = 1)

x$a

## [1] 1

x[["a"]]

## NULL

If you want to avoid this behaviour you can set the global option warnPartialMatchDollar to TRUE. Use with caution: it may affect behaviour in other code you have loaded (e.g., from a package).
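A minimal sketch of that option in use, assuming the same list as above. Note that warnPartialMatchDollar makes $ issue a warning on a partial match; it does not stop the match from returning a value. The option is restored afterwards so the change stays local.

```r
# Turn the warning on and remember the previous setting
old <- options(warnPartialMatchDollar = TRUE)

x <- list(abc = 1)

# Capture the warning raised by the partial match as a string
msg <- tryCatch(x$a, warning = function(w) conditionMessage(w))

# The value is still returned; only a warning is added
val <- suppressWarnings(x$a)

# Restore the previous setting
options(old)
```

If a partial match should never succeed, x[["a"]] (exact by default) is the stricter choice.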

ardaar answered Jan 30 '23 23:01