
How can I easily get the mean, median, quartiles, etc. given counts of each value in R?

Tags: r, statistics

Suppose I have a data frame with a column for values and another column for the number of times that value was observed:

x <- data.frame(value=c(1,2,3), count=c(4,2,1))
x
#   value count
# 1     1     4
# 2     2     2
# 3     3     1

I know that I can get the weighted mean of the data using weighted.mean and the weighted median using the weighted.median function provided by several packages (e.g. limma), but how can I get other weighted statistics on my data, such as 1st and 3rd quartiles, and maybe standard deviation? "Expanding" the data using rep is not an option because sum(x$count) is about 3 billion (the size of the human genome).

Ryan C. Thompson asked Jan 20 '23 06:01


2 Answers

Have you tried these packages?

  1. Hmisc -- it has several weighted statistics, including weighted quantiles (wtd.quantile) as well as wtd.mean and wtd.var.

  2. laeken -- it has weighted quantiles (weightedQuantile).
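
As a rough illustration of the Hmisc suggestion, here is a minimal sketch applied to the data frame from the question. It assumes the Hmisc package is installed and that the counts can be passed as frequency weights (normwt left at its default of FALSE); the variable names q, m and s are only for illustration.

```r
# Sketch only: requires the Hmisc package (install.packages("Hmisc")).
x <- data.frame(value = c(1, 2, 3), count = c(4, 2, 1))

if (requireNamespace("Hmisc", quietly = TRUE)) {
  # With normwt = FALSE (the default) the weights are treated as counts,
  # so no expansion with rep() is needed.
  q <- Hmisc::wtd.quantile(x$value, weights = x$count,
                           probs = c(0.25, 0.5, 0.75))
  m <- Hmisc::wtd.mean(x$value, weights = x$count)
  s <- sqrt(Hmisc::wtd.var(x$value, weights = x$count))  # weighted SD
  print(q)
  print(m)
  print(s)
}
```

Only one row per distinct value is kept in memory, so the size of sum(x$count) should not matter here.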

Prasad Chalasani answered Jan 22 '23 18:01


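Or, if the total count is small enough for the expanded vector to fit in memory (which is not the case for the roughly 3 billion observations in the question), back-transform the data with rep and run the analysis the usual way: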

dtf <- data.frame(value = 1:3, count = c(4, 2, 1))
x <- with(dtf, rep(value, count))
summary(x)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#   1.000   1.000   1.000   1.571   2.000   3.000 
fivenum(x)
# [1] 1 1 1 2 3
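
When the expanded vector would be too large, the same statistics can be computed from the counts directly. The following is a minimal base-R sketch, not something taken from the answer itself. count_quantile is a hypothetical helper name, and the code assumes non-negative counts with at least one observation and the default (type 7) definition used by quantile().

```r
# Summary statistics from (value, count) pairs without calling rep().
dtf <- data.frame(value = c(1, 2, 3), count = c(4, 2, 1))

# Hypothetical helper: quantiles of the implied expanded sample, using the
# same interpolation rule as quantile()'s default (type 7).
count_quantile <- function(value, count, probs) {
  ord   <- order(value)
  value <- value[ord]
  cum   <- cumsum(as.numeric(count[ord]))  # doubles avoid integer overflow
  n     <- cum[length(cum)]
  # Value found at 1-based position k of the sorted expanded sample
  value_at <- function(k) value[findInterval(k - 1, cum) + 1]
  h  <- (n - 1) * probs + 1                # fractional position (type 7)
  lo <- floor(h)
  hi <- ceiling(h)
  value_at(lo) + (h - lo) * (value_at(hi) - value_at(lo))
}

n  <- sum(as.numeric(dtf$count))
mu <- sum(dtf$value * dtf$count) / n       # mean of the expanded sample
# Sample standard deviation with an n - 1 denominator, as sd() would use
s  <- sqrt(sum(dtf$count * (dtf$value - mu)^2) / (n - 1))
qs <- count_quantile(dtf$value, dtf$count, c(0.25, 0.5, 0.75))
```

On this small example qs, mu and s should agree with quantile(x), mean(x) and sd(x) on the expanded vector, while memory use depends only on the number of distinct values.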

aL3xa answered Jan 22 '23 18:01