I am trying to understand the meaning of this statement in R, in code written by somebody else:
mymodel = lm(gene ~ ., data = mydata)
mydata is as follows:
> mydata
gene cna rs11433683 PC1 PC2
TCGA.BH.A0C0 270.7446 0.1291 0 270.7446 0.1291
TCGA.A2.A3XY 87.9092 0.0128 1 87.9092 0.0128
TCGA.XX.A89A 255.1346 0.1530 1 255.1346 0.1530
I have gone through the R help to find out how . is interpreted. I understand that . is not typically used, but this is what I found:
help(formula)
There are two special interpretations of . in a formula. The usual one is in the context of a data argument of model fitting functions and means ‘all columns not otherwise in the formula’: see terms.formula. In the context of update.formula, only, it means ‘what was previously in this part of the formula’.
help(terms.formula)
allowDotAsName: normally . in a formula refers to the remaining variables contained in data. Exceptionally, . can be treated as a name for non-standard uses of formulae.
data: a data frame from which the meaning of the special symbol . can be inferred. It is unused if there is no . in the formula.
However, I am not really sure what these statements mean. Can somebody give me a simple example of what . means in the context of the statement and data I mentioned above?
in the context of a data argument of model fitting functions and means ‘all columns not otherwise in the formula’
Exactly what it says there on the box!
So with
mymodel = lm(gene ~ ., data = mydata)
you get every variable in mydata other than gene on the RHS of the formula:
cna + rs11433683 + PC1 + PC2
As far as I can see, the quoted phrase is clear and unambiguous (... but you could also see it just from trying a few small examples)
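One such small example, as a rough sketch: the data frame below reuses the column names from the question, but the values are invented purely for illustration. It asks terms() which terms the dot expands to, and checks that the dot formula and the spelled-out formula give the same fit.

```r
# Toy data: same column names as in the question, made-up values.
set.seed(1)
mydata <- data.frame(
  gene       = rnorm(20, mean = 200, sd = 50),
  cna        = runif(20),
  rs11433683 = rep(c(0, 1), times = 10),
  PC1        = rnorm(20),
  PC2        = rnorm(20)
)

# Which terms does the dot stand for, given this data frame?
dot_terms <- attr(terms(gene ~ ., data = mydata), "term.labels")
print(dot_terms)

# Fit once with the dot and once with the right-hand side written out.
fit_dot      <- lm(gene ~ ., data = mydata)
fit_explicit <- lm(gene ~ cna + rs11433683 + PC1 + PC2, data = mydata)

# The two fits should have identical coefficients.
same_fit <- isTRUE(all.equal(coef(fit_dot), coef(fit_explicit)))
print(same_fit)
```

If a column is added to or dropped from mydata, the dot formula picks up the change automatically, whereas the explicit formula has to be edited by hand.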
The only thing that might not be obvious is what it does if you don't supply a data argument (but that's answered in the help for terms.formula that is referred to in your quote).
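For completeness, here is a small sketch of the two remaining cases from the quoted help text. It assumes nothing beyond base R; the variable names (gene, cna, PC1) are just placeholders borrowed from the question. Without a data argument there is nothing for the dot to expand against, so terms() is expected to signal an error, and in update() the dot instead means "whatever was in this part of the old formula".

```r
# Case 1: a dot with no data argument. terms() has nothing to expand
# the dot against, so we catch the resulting condition rather than
# letting it stop the script.
no_data <- tryCatch(
  terms(gene ~ .),
  error = function(e) e
)
print(inherits(no_data, "error"))

# Case 2: in update(), a dot means "keep what was here before".
old_formula <- gene ~ cna
new_formula <- update(old_formula, . ~ . + PC1)
print(new_formula)
```

The second case is why the help page says the update.formula meaning applies "only" there: the same symbol refers to the old formula rather than to the columns of a data frame.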