
Why does R find a data.frame variable that isn't in the data.frame?

Tags:

r

Why does this not cause an error?

> str(u)
'data.frame':   8879 obs. of  2 variables:
 $ bundle_qty: int  1 1 1 1 1 1 1 1 1 1 ...
 $ mail_a    : num  1 1 1 1 0 0 0 1 1 0 ...

> head(u$mail)
[1] 1 1 1 1 0 0

The variable mail isn't in data.frame u! Shouldn't u$mail return NULL?

Even when I start from scratch with dummy data:

> rm(list=ls())
> u <- data.frame( bundle_qty = c(1,1,1,1), mail_a = c(1,1,1,1))
> str(u)
'data.frame':   4 obs. of  2 variables:
 $ bundle_qty: num  1 1 1 1
 $ mail_a    : num  1 1 1 1
> u <- data.frame( bundle_qty = c(1L,1L,1L,1L), mail_a = c(1,1,1,1))
> str(u)
'data.frame':   4 obs. of  2 variables:
 $ bundle_qty: int  1 1 1 1
 $ mail_a    : num  1 1 1 1
> u$mail
[1] 1 1 1 1
user2105469 asked Mar 05 '13 05:03


2 Answers

The $ operator uses partial matching: it returns a value whenever the stem you provide (here, mail) uniquely identifies one of the variable names.

In your data.frame, mail_a is the only name starting with mail, so u$mail returns mail_a.

u["mail"] will throw an error though, because single-bracket indexing matches names exactly.
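The three access styles can be compared side by side. This is a minimal sketch that reuses the dummy data from the question; the variable names partial, exact, loose and single are only illustrative.

```r
# Dummy data mirroring the question; `u` has no column called "mail".
u <- data.frame(bundle_qty = c(1L, 1L, 1L, 1L), mail_a = c(1, 1, 1, 1))

partial <- u$mail                      # $ partially matches the unique stem "mail"
exact   <- u[["mail"]]                 # [[ matches exactly by default: NULL
loose   <- u[["mail", exact = FALSE]]  # [[ with partial matching switched on

# Single-bracket indexing matches names exactly and signals an error,
# which is captured here as a message instead of stopping the script.
single <- tryCatch(u["mail"], error = function(e) conditionMessage(e))

print(partial)
print(is.null(exact))
print(identical(partial, loose))
print(single)
```

Only $ and [[ with exact = FALSE find mail_a; the other two forms refuse the incomplete name.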

Here is a further example where it behaves as you expected, because the stem a matches both aa and aaa and is therefore ambiguous:

test <- data.frame(aa=1:10,aaa=letters[1:10])

> test$aa
 [1]  1  2  3  4  5  6  7  8  9 10
> test$aaa
 [1] a b c d e f g h i j
Levels: a b c d e f g h i j
> test$a
NULL
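The matching rule can be illustrated with base R's pmatch(), which follows the same "exact match first, then unique partial match" logic. This is only an illustration of the rule, not a claim about how $ is implemented internally; the vector cols and the result names are hypothetical.

```r
cols <- c("aa", "aaa")

m_exact     <- pmatch("aa", cols)   # exact match is preferred over the partial match to "aaa"
m_full      <- pmatch("aaa", cols)  # exact match on the second name
m_ambiguous <- pmatch("a", cols)    # two candidates, so no match (NA)
m_unique    <- pmatch("mail", c("bundle_qty", "mail_a"))  # one candidate only

print(c(m_exact, m_full, m_ambiguous, m_unique))
```

An ambiguous stem yields no match, which is why test$a returns NULL while u$mail succeeds.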

And fortune(312) that @mnel refers to is:

"The problem here is that the $ notation is a magical shortcut and like any other magic if used incorrectly is likely to do the programmatic equivalent of turning yourself into a toad."

  • Greg Snow (in response to a user who wanted to access a column whose name is stored in y via x$y rather than x[[y]]), R-help (February 2012)
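The situation behind that quote can be sketched as follows. The data frame x, its columns and the variable y are made up for illustration.

```r
# Hypothetical data; the column name to look up is stored in a variable.
x <- data.frame(height = c(1.6, 1.8), weight = c(60, 80))
y <- "weight"

by_dollar  <- x$y     # $ does not evaluate y; it looks for a column named "y"
by_bracket <- x[[y]]  # [[ evaluates y first, then looks up "weight"

print(is.null(by_dollar))
print(by_bracket)
```

Because $ takes its right-hand side literally, x[[y]] is the form to use whenever the column name is held in a variable.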
thelatemail answered Oct 04 '22 20:10


u$mail

is equivalent to calling

u[['mail', exact = FALSE]]

This uses partial matching to find a named element (a column).

u[['mail']]

on the other hand, does not use partial matching, so it finds no column and returns NULL.

It is safer to use [[, as noted in fortune(312).
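If existing code already relies on $, one way to catch accidental partial matches is the base R option warnPartialMatchDollar. The sketch below assumes a reasonably recent R version in which $ on a data.frame honours this option; msg and old are illustrative names.

```r
u <- data.frame(bundle_qty = c(1L, 1L, 1L, 1L), mail_a = c(1, 1, 1, 1))

# Ask R to warn whenever $ resolves a name by partial matching.
old <- options(warnPartialMatchDollar = TRUE)

# Capture the warning text instead of printing it to the console.
msg <- tryCatch(
  {
    u$mail
    "no warning"
  },
  warning = function(w) conditionMessage(w)
)
print(msg)

options(old)  # restore the previous setting
```

Setting the option in a project's startup file makes every partial $ match visible without rewriting the code first.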

mnel answered Oct 04 '22 20:10