
Meaning of dot in lm(y~.) in R [duplicate]

Tags:

r

I am trying to understand the meaning of this statement in R, in code written by somebody else.

mymodel = lm(gene ~ ., data = mydata) 

mydata is as follows:

> mydata
                 gene    cna rs11433683      PC1    PC2
TCGA.BH.A0C0 270.7446 0.1291          0 270.7446 0.1291
TCGA.A2.A3XY  87.9092 0.0128          1  87.9092 0.0128
TCGA.XX.A89A 255.1346 0.1530          1 255.1346 0.1530

I have gone through the R help pages to find out how . is interpreted. I understand that . is not typically used, but this is what I found:

help(formula)

There are two special interpretations of . in a formula. The usual one is in the context of a data argument of model fitting functions and means ‘all columns not otherwise in the formula’: see terms.formula. In the context of update.formula, only, it means ‘what was previously in this part of the formula’

help(terms.formula)

allowDotAsName: normally . in a formula refers to the remaining variables contained in data. Exceptionally, . can be treated as a name for non-standard uses of formulae.

data: a data frame from which the meaning of the special symbol . can be inferred. It is unused if there is no . in the formula.

However, I am not really sure what these statements mean. Can somebody give me a simple example of what . means in the context of the statement and data I mentioned above?

alpha_989 asked Aug 12 '17 23:08


1 Answer

in the context of a data argument of model fitting functions and means ‘all columns not otherwise in the formula’

Exactly what it says there on the box!

So with

 mymodel = lm(gene ~ ., data = mydata) 

you get every column of mydata other than gene on the RHS of the formula:

   cna + rs11433683 + PC1 + PC2

As far as I can see, the quoted phrase is clear and unambiguous (but you could also see it just from trying a few small examples).
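
One such small example, as a minimal sketch: the data frame `toy` and its values are made up for illustration (only the column names follow the question), and `fit_dot` / `fit_explicit` are just names chosen here. It fits the model once with the dot and once with the terms written out, then compares them.

```r
# Hypothetical toy data: the values are random, only the column names
# mirror the question's mydata.
set.seed(1)
toy <- data.frame(
  gene       = rnorm(20),
  cna        = rnorm(20),
  rs11433683 = rbinom(20, 1, 0.5),
  PC1        = rnorm(20),
  PC2        = rnorm(20)
)

# Same model written two ways
fit_dot      <- lm(gene ~ ., data = toy)
fit_explicit <- lm(gene ~ cna + rs11433683 + PC1 + PC2, data = toy)

# The dot is expanded against the columns of `data`
print(formula(terms(gene ~ ., data = toy)))
print(attr(terms(fit_dot), "term.labels"))

# Both fits should give the same coefficients
print(all.equal(coef(fit_dot), coef(fit_explicit)))
```

If a column is added to or dropped from the data frame, the dot version picks that up automatically, which is convenient but also means the model changes silently with the data.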

The only thing that might not be obvious is what it does if you didn't supply a data argument (but that's answered in the help of terms.formula that is referred to in your quote).
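
A quick way to check the no-data case yourself, as a sketch: `y` is a made-up vector, and the call is wrapped in `tryCatch` because I would expect it to fail, since there are no columns for the dot to expand to.

```r
# Hypothetical response vector; no data frame is supplied to lm()
y <- c(1.2, 0.7, 3.1, 2.4)

# Capture the error message instead of stopping the script
res <- tryCatch(
  lm(y ~ .),
  error = function(e) conditionMessage(e)
)
print(res)
```

If the call errors as expected, `res` holds the error message as a character string rather than a fitted model.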

Glen_b answered Oct 17 '22 07:10