Why does var act like cov in R?

Tags:

r

Sorry to ask this ... it's surely a FAQ, and it's kind of a silly question, but it's been bugging me. Suppose I want to get the variance of every numeric column in a dataframe, such as

df <- data.frame(x=1:5,y=seq(1,50,10))

Naturally, I try

var(df)

Instead of giving me what I'd hoped for, which would be something like

  x    y
2.5  250

I get this

     x   y
x  2.5  25
y 25.0 250

which has the variances on the diagonal and the covariances everywhere else. That makes sense when I look up help(var) and read that "var is just another interface to cov". Variance is the covariance between a variable and itself, of course. The output is slightly confusing, but I can read along the diagonal, or get only the variances using diag(var(df)), sapply(df, var), or lapply(df, var) (which returns a list rather than a vector), or by calling var separately on df$x and df$y.

But why? Variance is a routine, basic descriptive statistic, second only to mean. Shouldn't it be completely and totally trivial to apply it to columns of a dataframe? Why give me the covariances when I only asked for variances? Just curious. Thanks for any comments on this.

asked Mar 27 '13 03:03 by Mars



1 Answer

The idiomatic approach is

sapply(df, var)
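The question asks for the variance of every numeric column, so a data frame that also holds non-numeric columns needs a filter first. The following is a minimal sketch of one way to do that; the data frame df2 and its column g are hypothetical and not part of the original question.

```r
# Hypothetical data frame with one non-numeric column (assumption for illustration)
df2 <- data.frame(x = 1:5, y = seq(1, 50, 10), g = letters[1:5])

# Flag the numeric columns, then compute one variance per flagged column
num_cols <- vapply(df2, is.numeric, logical(1))
col_vars <- vapply(df2[num_cols], var, numeric(1))

# Named numeric vector with one entry per numeric column
col_vars
```

vapply is used here instead of sapply only because it checks that each result is a single number; sapply(df2[num_cols], var) should give the same values.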

var accepts data frames as well as vectors: when given a data frame, it coerces it to a matrix and returns the covariance matrix of the columns.
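A quick check of that behaviour, using the data frame from the question. This is only a sketch; the names m_df, m_mat and m_cov are introduced here for illustration.

```r
df <- data.frame(x = 1:5, y = seq(1, 50, 10))

m_df  <- var(df)             # data frame input
m_mat <- var(as.matrix(df))  # explicit matrix input
m_cov <- cov(df)             # cov() on the same data

# All three should be the same 2 x 2 covariance matrix
all.equal(m_df, m_mat)
all.equal(m_df, m_cov)

# The per-column variances sit on the diagonal
diag(m_df)
```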

Variance is a routine, basic descriptive statistic, but so are covariances and correlations. They are all interlinked and interesting, especially if you are aiming to use a linear model.

You could always create your own function to perform as you want

Var <- function(x, ...) {
  # Data frame: one variance per column; anything else: defer to var()
  if (is.data.frame(x)) {
    sapply(x, var, ...)
  } else {
    var(x, ...)
  }
}
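A short usage sketch for such a wrapper, with Var repeated in compact form so the snippet runs on its own. The missing value is introduced here only to show that extra arguments such as na.rm are passed through to var; it is not part of the original question.

```r
Var <- function(x, ...) {
  if (is.data.frame(x)) sapply(x, var, ...) else var(x, ...)
}

df <- data.frame(x = 1:5, y = seq(1, 50, 10))

v_df  <- Var(df)    # named vector: one variance per column
v_vec <- Var(df$x)  # plain vector falls through to var()

# Assumed scenario: one missing value in x
df$x[2] <- NA
v_na   <- Var(df)                # x becomes NA, y is unaffected
v_narm <- Var(df, na.rm = TRUE)  # na.rm is forwarded to var() for each column
```

Because each column is handled separately, na.rm = TRUE drops missing values per column rather than dropping whole rows.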
answered Sep 19 '22 08:09 by mnel