I know this is a basic question but for some strange reason I am unable to find an answer.
How should I apply basic statistical functions like mean, median, etc. over an entire array, matrix, or data frame to get a single value rather than a vector over rows or columns?
Description (MATLAB). M = mean(A) returns the mean of the elements of A along the first array dimension whose size does not equal 1. If A is a vector, then mean(A) returns the mean of the elements. If A is a matrix, then mean(A) returns a row vector containing the mean of each column.
To find the mean of all values in an R data frame, convert it to a matrix first and then call the mean function on the result.
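A minimal sketch of that conversion, assuming a small data frame whose columns are all numeric (the names df, overall_mean and overall_mean_unlist are made up for illustration). With mixed column types, as.matrix() and unlist() would coerce everything to a common type, so this only makes sense for numeric data.

```r
# Hypothetical example data: every column is numeric.
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

# as.matrix() yields a numeric matrix; mean() on a matrix
# averages over every element, not column by column.
overall_mean <- mean(as.matrix(df))

# unlist() flattens the columns into one vector and gives the same value.
overall_mean_unlist <- mean(unlist(df))

print(overall_mean)
print(overall_mean_unlist)
```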
Calculate the mean of each column of a matrix or array in R with colMeans(). The colMeans() function computes the mean of each column of a matrix or array. Its dims argument is an integer specifying which dimensions are regarded as 'columns' to average over: the mean is taken over dimensions 1:dims.
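As a rough illustration of colMeans() and its dims argument, here is a small sketch on made-up data (the objects m, arr, col_means, cm1 and cm2 are hypothetical names, not from the original text):

```r
# 2 x 3 matrix filled column-wise: columns are (1,2), (3,4), (5,6).
m <- matrix(1:6, nrow = 2)
col_means <- colMeans(m)

# 2 x 3 x 4 array filled with 1..24.
arr <- array(1:24, dim = c(2, 3, 4))

# dims = 1 (the default): average over the first dimension only,
# leaving a 3 x 4 matrix of means.
cm1 <- colMeans(arr)

# dims = 2: average over dimensions 1:2, leaving one mean per slice
# along the third dimension.
cm2 <- colMeans(arr, dims = 2)

print(col_means)
print(dim(cm1))
print(cm2)
```

Note that colMeans() always returns one value per column (or per remaining dimension), so it is the opposite of what the question asks for; mean(m) is the call that collapses a matrix to a single number.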
In R, the median of a vector is calculated using the median() function. The function accepts a vector as input. If the vector has an odd number of values, the function returns the middle value of the sorted data. If it has an even number of values, the function returns the average of the two middle values.
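The odd and even cases can be checked with a short sketch on made-up vectors (odd_median, even_median, m and matrix_median are illustrative names only). The last call shows median() applied directly to a matrix, where it treats all elements as one set of values.

```r
# Odd number of values: sorted data is 1, 3, 5, so the middle value is returned.
odd_median <- median(c(5, 1, 3))

# Even number of values: sorted data is 1, 2, 3, 4, so the two middle
# values (2 and 3) are averaged.
even_median <- median(c(4, 1, 3, 2))

# On a matrix, median() works over all elements at once.
m <- matrix(c(1, 2, 3, 4, 5, 100), nrow = 2)
matrix_median <- median(m)

print(odd_median)
print(even_median)
print(matrix_median)
```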
Since this comes up a fair bit, I'm going to treat this a little more comprehensively, to include the 'etc.' piece in addition to mean and median.
For a matrix or array, as the others have stated, mean and median will return a single value. However, var will compute the covariances between the columns of a two-dimensional matrix. Interestingly, for a multi-dimensional array, var goes back to returning a single value. sd on a 2-d matrix will work, but is deprecated, returning the standard deviation of the columns. Even better, mad returns a single value on a 2-d matrix and a multi-dimensional array. If you want a single value returned, the safest route is to coerce using as.vector() first. Having fun yet?
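To make the matrix case concrete, here is a small sketch on a made-up 3 x 3 matrix (all object names are hypothetical). It deliberately avoids calling sd() on the matrix itself, since that behaviour has changed across R versions, and coerces with as.vector() wherever a single value is wanted.

```r
# Hypothetical 3 x 3 numeric matrix.
m <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 10), nrow = 3)

m_mean   <- mean(m)     # single value over all elements
m_median <- median(m)   # single value over all elements
m_mad    <- mad(m)      # single value over all elements

# var() on a 2-d matrix gives the covariance matrix of the columns.
v_cols <- var(m)

# Coercing to a plain vector first forces a single value.
v_all  <- var(as.vector(m))
sd_all <- sd(as.vector(m))

print(dim(v_cols))
print(c(m_mean, m_median, m_mad, v_all, sd_all))
```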
For a data.frame, mean is deprecated, but will again act on the columns separately. median requires that you coerce to a vector first, or unlist. As before, var will return the covariances, and sd is again deprecated but will return the standard deviation of the columns. mad requires that you coerce to a vector or unlist. In general, if you want something to act on all values of a data.frame, just unlist it first.
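A minimal sketch of the unlist approach, assuming an all-numeric data frame (df, vals and the result names are invented for illustration). Per-column results are obtained explicitly with sapply() rather than by relying on mean() or sd() dispatching on the data frame.

```r
# Hypothetical all-numeric data frame.
df <- data.frame(x = c(1, 2, 3, 4), y = c(10, 20, 30, 40))

# Flatten every column into one numeric vector.
vals <- unlist(df)

all_mean   <- mean(vals)
all_median <- median(vals)
all_var    <- var(vals)
all_sd     <- sd(vals)
all_mad    <- mad(vals)

# If per-column summaries are what you want, ask for them explicitly.
col_means <- sapply(df, mean)

# var() on the data frame itself gives the column covariance matrix.
cov_cols <- var(df)

print(c(all_mean, all_median, all_var, all_sd, all_mad))
print(col_means)
print(cov_cols)
```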
Edit: Late breaking news(): In R 3.0.0 mean.data.frame is defunctified:
o mean() for data frames and sd() for data frames and matrices are defunct.