Sorry to ask this ... it's surely a FAQ, and it's kind of a silly question, but it's been bugging me. Suppose I want to get the variance of every numeric column in a dataframe, such as
df <- data.frame(x=1:5,y=seq(1,50,10))
Naturally, I try
var(df)
Instead of giving me what I'd hoped for, which would be something like
  x   y
2.5 250
I get this
     x   y
x  2.5  25
y 25.0 250
which has the variances on the diagonal and the covariances everywhere else. That makes sense when I look up help(var) and read that "var is just another interface to cov". Variance is the covariance between a variable and itself, of course. The output is slightly confusing, but I can read along the diagonal, or generate only the variances using diag(var(df)), sapply(df, var), or lapply(df, var), or by calling var repeatedly on df$x and df$y.
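For reference, here is a minimal sketch comparing the workarounds listed above on the example data frame. It only uses base R, and the variable names (v_diag, v_sapply, v_lapply) are just illustrative.

```r
df <- data.frame(x = 1:5, y = seq(1, 50, 10))

v_diag   <- diag(var(df))    # diagonal of the covariance matrix
v_sapply <- sapply(df, var)  # named numeric vector, one entry per column
v_lapply <- lapply(df, var)  # same values, but returned as a list

print(v_sapply)
```

All three hold the same two variances; they differ only in the shape of the result (vector versus list).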
But why? Variance is a routine, basic descriptive statistic, second only to mean. Shouldn't it be completely and totally trivial to apply it to columns of a dataframe? Why give me the covariances when I only asked for variances? Just curious. Thanks for any comments on this.
In R, covariance can be computed with the cov() function. Covariance is a statistical measure of the direction of the linear relationship between two data vectors.
A variance-covariance matrix is a square matrix that contains the variances and covariances associated with several variables. The diagonal elements of the matrix contain the variances of the variables and the off-diagonal elements contain the covariances between all possible pairs of variables.
To create a covariance matrix from a data frame in R, use the cov() function. It takes the data frame as an argument and returns the variance-covariance matrix.
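As a small sketch of the description above, using the data frame from the question: cov() on a data frame gives the same matrix as var(), with the variances on the diagonal and the covariance of x and y off the diagonal. The variable names are only for illustration.

```r
df <- data.frame(x = 1:5, y = seq(1, 50, 10))

m <- cov(df)                                  # variance-covariance matrix
same_as_var <- isTRUE(all.equal(m, var(df)))  # var() and cov() agree here
variances   <- diag(m)                        # per-column variances
cov_xy      <- m["x", "y"]                    # covariance between x and y

print(m)
```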
Theoretically, this is perfectly feasible, the bi-variate normal case being the easiest example.
The idiomatic approach is sapply(df, var). var deals with data.frames by coercing them to a matrix, which is why you get the full covariance matrix back.
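A quick way to see the coercion point, as a sketch on the question's data: calling var on the data frame and on as.matrix(df) should give the same result. The names from_df and from_matrix are just illustrative.

```r
df <- data.frame(x = 1:5, y = seq(1, 50, 10))

m <- as.matrix(df)     # numeric matrix with columns x and y

from_df     <- var(df) # data frame input
from_matrix <- var(m)  # explicit matrix input

print(isTRUE(all.equal(from_df, from_matrix)))
```

Since a matrix of columns is naturally treated as a set of variables, the covariance matrix is the result you get in both cases.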
Variance is a routine, basic descriptive statistic, but so are covariances and correlations. They are all interlinked and interesting, especially if you are aiming to use a linear model.
You could always create your own function to behave the way you want:
Var <- function(x, ...) {
  # Data frames: one variance per column. Anything else: defer to var().
  if (is.data.frame(x)) {
    sapply(x, var, ...)
  } else {
    var(x, ...)
  }
}
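Since the question asks about every numeric column, here is one possible variant that skips non-numeric columns before computing variances. This is only a sketch: col_vars is a made-up helper name, not something from base R, and df2 is an invented example with an extra character column.

```r
# col_vars is a hypothetical helper, not part of base R
col_vars <- function(d, ...) {
  stopifnot(is.data.frame(d))
  # Keep only the columns for which a variance is meaningful
  numeric_cols <- vapply(d, is.numeric, logical(1))
  # Extra arguments such as na.rm = TRUE are passed through to var()
  vapply(d[numeric_cols], var, numeric(1), ...)
}

df2 <- data.frame(x = 1:5, y = seq(1, 50, 10), g = letters[1:5])
res <- col_vars(df2)
print(res)
```

vapply is used instead of sapply here so the result is always a numeric vector, even if the data frame has no numeric columns at all.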