
Use of ~ (tilde) in R programming Language

Tags: r, r-faq, r-formula

I saw in a tutorial about regression modeling the following command:

myFormula <- Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width 

What exactly does this command do, and what is the role of ~ (tilde) in the command?

Ankita asked Feb 20 '13 09:02



2 Answers

The thing on the right of <- is a formula object. It is often used to denote a statistical model, where the thing on the left of the ~ is the response and the things on the right of the ~ are the explanatory variables. So in English you'd say something like "Species depends on Sepal Length, Sepal Width, Petal Length and Petal Width".

The myFormula <- part of that line stores the formula in an object called myFormula so you can use it in other parts of your R code.
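To make that concrete, here is a minimal sketch (assuming only base R) showing that the stored object is an unevaluated formula, so creating it needs no data at all:

```r
# Creating a formula does not evaluate Species, Sepal.Length, etc.;
# no data set needs to exist at this point.
myFormula <- Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width

class(myFormula)      # "formula"
all.vars(myFormula)   # names of every variable mentioned, response first
```

The variables are only looked up later, when a modelling or plotting function receives the formula together with a data argument.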


Other common uses of formula objects in R

The lattice package uses them to specify the variables to plot.
The ggplot2 package uses them to specify panels for plotting.
The dplyr package uses them for non-standard evaluation.
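As a rough illustration of the first item, a lattice call might look like the sketch below. It assumes the lattice package is installed (it normally ships with R as a recommended package) and uses the built-in iris data:

```r
# Sketch only: guarded so it does nothing if lattice is unavailable.
if (requireNamespace("lattice", quietly = TRUE)) {
  # y ~ x | g reads: plot Sepal.Length against Sepal.Width,
  # with one panel per level of Species.
  p <- lattice::xyplot(Sepal.Length ~ Sepal.Width | Species, data = iris)
  class(p)  # a "trellis" object; print(p) would draw it
}
```

Here the formula describes what goes on each axis and how to split the panels, not a statistical model.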

Spacedman answered Oct 23 '22 14:10


R defines a ~ (tilde) operator for use in formulas. Formulas have all sorts of uses, but perhaps the most common is for regression:

library(datasets)
lm(myFormula, data = iris)

help("~") or help("formula") will teach you more.
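One caveat: Species is a factor, so lm() will generally emit warnings when given myFormula from the question as-is. As a sketch of the same kind of call with a numeric response (numFormula is a name made up for this example, not something from the question):

```r
library(datasets)

# Hypothetical formula for illustration: a numeric response, so lm() applies.
numFormula <- Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width

fit <- lm(numFormula, data = iris)
names(coef(fit))  # intercept plus one coefficient per predictor
```

For a categorical response such as Species, a classification method would be the more natural consumer of the formula.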

@Spacedman has covered the basics. Let's discuss how it works.

First, because ~ is an operator, it is essentially shorthand for a function call with two arguments:

> `~`(lhs, rhs)
lhs ~ rhs
> lhs ~ rhs
lhs ~ rhs

That is useful to know when you need to pass ~ around as a function, e.g. in the apply family of functions.
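A minimal sketch of that idea, using lapply() and do.call() to build one formula per predictor (the predictors vector is just an assumption for the example):

```r
predictors <- c("Sepal.Length", "Petal.Length")

# Build one formula per predictor by calling `~` as an ordinary function.
# as.name() turns each string into a symbol, so the formula contains
# variable names rather than character strings.
forms <- lapply(predictors, function(v) {
  do.call("~", list(quote(Species), as.name(v)))
})

forms[[1]]
```

Calling `~`(Species, v) directly inside the function would not work here, because ~ does not evaluate its arguments and would keep the literal symbol v.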

Second, you can manipulate the formula as text:

oldform <- as.character(myFormula)  # Get components
myFormula <- as.formula(paste(oldform[2], "Sepal.Length", sep = "~"))
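To see what the text route actually produces, and two base R alternatives that avoid pasting strings by hand, here is a short sketch (f1 and f2 are names chosen for the example):

```r
myFormula <- Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width

# as.character() on a two-sided formula gives three strings:
# the operator, the left-hand side, and the right-hand side.
parts <- as.character(myFormula)
parts[1]  # "~"
parts[2]  # "Species"

# Alternatives from the stats package:
f1 <- reformulate(c("Sepal.Length", "Sepal.Width"), response = "Species")
f2 <- update(myFormula, . ~ . - Petal.Width)  # drop one term from the RHS
```

reformulate() builds a formula from a character vector of terms, while update() edits an existing one, with each dot standing for the old left or right side.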

Third, you can manipulate it as a list:

myFormula[[2]]  # left-hand side (the response)
myFormula[[3]]  # right-hand side (the explanatory variables)
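A small sketch of what those pieces are, and how one of them can be replaced without going through text (newFormula is a name made up for the example):

```r
myFormula <- Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width

length(myFormula)        # 3 elements: operator, LHS, RHS
myFormula[[1]]           # the `~` operator itself
class(myFormula[[2]])    # "name": the response is a single symbol
class(myFormula[[3]])    # "call": the RHS is a nested call to `+`

# Swap in a shorter right-hand side; quote() keeps it unevaluated.
newFormula <- myFormula
newFormula[[3]] <- quote(Sepal.Length + Sepal.Width)
```

A one-sided formula such as ~ x has only two elements, so the index of the right-hand side depends on whether a response is present.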

Finally, there are some helpful tricks with formulae (see help("formula") for more):

myFormula <- Species ~ .  

For example, when used with the iris data, the version above is equivalent to the original formula, since the dot means "all variables not yet used." R looks at the data.frame you pass to your eventual model call, sees which variables exist in it but aren't explicitly mentioned in your formula, and replaces the dot with those remaining variables.
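You can check the expansion yourself without fitting a model. A minimal sketch, assuming the built-in iris data:

```r
# terms() with a data argument expands the dot against that data frame.
expanded <- terms(Species ~ ., data = iris)

attr(expanded, "term.labels")  # the four non-response columns of iris
formula(expanded)              # the formula with the dot written out
```

Without a data argument there is nothing to expand the dot against, which is why the expansion only happens once the formula meets a data.frame.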

Ari B. Friedman answered Oct 23 '22 14:10