I saw in a tutorial about regression modeling the following command:
myFormula <- Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width
What exactly does this command do, and what is the role of ~ (tilde) in the command?
The thing on the right of <- is a formula object. It is often used to denote a statistical model, where the thing on the left of the ~ is the response and the things on the right of the ~ are the explanatory variables. So in English you'd say something like "Species depends on Sepal Length, Sepal Width, Petal Length and Petal Width".
The myFormula <- part of that line stores the formula in an object called myFormula so you can use it in other parts of your R code.
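To make the description above concrete, here is a minimal sketch that stores the formula and then inspects it. The helper names response and predictors are just illustrative variables, not anything from the tutorial.

```r
# Store the formula from the question in an object.
myFormula <- Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width

# The stored object is a formula, not a computed value.
class(myFormula)

# all.vars() lists the variable names in order of appearance,
# so the response (left of ~) comes first.
response   <- all.vars(myFormula)[1]   # illustrative name
predictors <- all.vars(myFormula)[-1]  # illustrative name
```

Nothing is evaluated when the formula is created, so this runs even if no variables called Species or Sepal.Length exist in your workspace.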
Other common uses of formula objects in R
The lattice package uses them to specify the variables to plot.
The ggplot2 package uses them to specify panels for plotting.
The dplyr package uses them for non-standard evaluation.
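The packages above need to be installed separately, so as a rough illustration of the same idea here is a base R function that also takes a formula. Note that aggregate() is my own choice of example and is not one of the packages listed; the name meansBySpecies is illustrative.

```r
library(datasets)

# Read the formula as "Sepal.Length grouped by Species":
# aggregate() computes FUN for the left-hand side within each
# level of the right-hand side.
meansBySpecies <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
meansBySpecies
```

Here the formula does not describe a statistical model at all; it is just a compact way to say which column is summarised and which column defines the groups.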
R defines a ~ (tilde) operator for use in formulas. Formulas have all sorts of uses, but perhaps the most common is for regression:
library(datasets)
lm(myFormula, data = iris)
help("~") or help("formula") will teach you more.
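As a hedged sketch of the regression use, the example below swaps in a numeric response, because lm() is designed for a numeric response and Species is a factor. The names numFormula and fit are illustrative and not from the tutorial.

```r
library(datasets)

# A formula with a numeric response, so a linear model makes sense.
numFormula <- Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width

# lm() looks up the variable names from the formula in `data`.
fit <- lm(numFormula, data = iris)

# One coefficient per explanatory variable, plus the intercept.
names(coef(fit))
```

For a factor response such as Species, a classification method that accepts a formula would be the more natural fit, which is probably what the tutorial goes on to use.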
@Spacedman has covered the basics. Let's discuss how it works.
First, because ~ is an operator, it is essentially a shortcut for a function call with two arguments:
> `~`(lhs,rhs)
lhs ~ rhs
> lhs ~ rhs
lhs ~ rhs
That can be helpful to know for use in e.g. apply family commands.
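A small sketch of the operator-versus-function point, with illustrative names f_infix and f_prefix:

```r
# The usual infix form.
f_infix <- y ~ x

# The same operator called as an ordinary function (note the backticks).
f_prefix <- `~`(y, x)

# Both results are formula objects that print the same way.
class(f_prefix)
deparse(f_infix) == deparse(f_prefix)
```

Neither y nor x needs to exist here, because ~ does not evaluate its arguments.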
Second, you can manipulate the formula as text:
oldform <- as.character(myFormula) # Get components
myFormula <- as.formula( paste( oldform[2], "Sepal.Length", sep="~" ) )
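If you'd rather avoid pasting strings together, update() from the stats package is another way to edit a formula. This is an alternative to the text approach, not what the code above does; the names parts and smaller are illustrative.

```r
myFormula <- Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width

# as.character() splits a two-sided formula into three strings:
# the operator, the left-hand side and the right-hand side.
parts <- as.character(myFormula)

# update() edits a formula directly; "." means "keep what was there".
# Here the response is kept and Petal.Width is dropped.
smaller <- update(myFormula, . ~ . - Petal.Width)
smaller
```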
Third, you can manipulate it as a list:
myFormula[[2]]
myFormula[[3]]
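To spell out what those list elements are, here is a short sketch using the original formula (lhs and rhs are illustrative names):

```r
myFormula <- Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width

# A two-sided formula has three elements:
# [[1]] the ~ operator, [[2]] the left-hand side, [[3]] the right-hand side.
length(myFormula)

lhs <- myFormula[[2]]  # a name (symbol): Species
rhs <- myFormula[[3]]  # a call: the chain of + operations

is.name(lhs)
is.call(rhs)
all.vars(rhs)          # the variable names used on the right-hand side
```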
Finally, there are some helpful tricks with formulae (see help("formula") for more):
myFormula <- Species ~ .
For example, the version above is the same as the original version, since the dot means "all variables not yet used." This looks at the data.frame you use in your eventual model call, sees which variables exist in the data.frame but aren't explicitly mentioned in your formula, and replaces the dot with those missing variables.
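One way to see the dot expansion for yourself, without fitting a model, is to pass the formula and the data frame to terms(). This is a sketch with illustrative names dotFormula and expanded:

```r
library(datasets)

dotFormula <- Species ~ .

# With `data` supplied, terms() replaces the dot with every column
# of the data frame that is not already used in the formula.
expanded <- attr(terms(dotFormula, data = iris), "term.labels")
expanded
```

With iris this gives the four measurement columns, which is why Species ~ . matches the original formula for this particular data frame.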