
How do I get summary statistics in R after negative selection of a data frame

Tags: dataframe, r, dplyr

For each level of a factor variable, I would like to select every row except those belonging to that level (a negative selection) and then summarize the data that remains. As a simple example, I have a data frame, DF, with two columns.

> DF
Category      Value  
A               5  
B               2  
C               3  
A               1  
C               1

If dplyr could select negatively (can it?), the result would look something like this.

> DF %>% group_by(!Category) %>% summarise(avg = mean(Value))
!Category    avg
A            2.00               #average of all rows where category isn't A
B            2.50
C            2.67
Mark asked Mar 21 '16 20:03




2 Answers

Here's a way you could do it in base R:

edit: thanks for suggesting an extensible change @Ryan

> sapply(levels(DF$Category), FUN = function(x) mean(subset(DF, Category != x)$Value))

       A        B        C 
2.000000 2.500000 2.666667 
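
The same numbers can also be obtained without re-subsetting the data frame for every level, because the mean of the rows outside a level equals (overall sum minus that level's sum) divided by (overall row count minus that level's row count). What follows is a minimal sketch of that shortcut, not part of the original answer. It rebuilds DF from the question, and the names grp_sum, grp_n and excl_mean are made up for illustration. Category is created as a factor explicitly, since data.frame() stopped converting strings to factors by default in R 4.0.0.

```r
# Example data from the question; Category is made a factor explicitly
DF <- data.frame(
  Category = factor(c("A", "B", "C", "A", "C")),
  Value    = c(5, 2, 3, 1, 1)
)

# Per-level sums and row counts (illustrative names)
grp_sum <- tapply(DF$Value, DF$Category, sum)
grp_n   <- tapply(DF$Value, DF$Category, length)

# Mean of everything outside each level:
# (overall sum - own sum) / (overall count - own count)
excl_mean <- (sum(DF$Value) - grp_sum) / (nrow(DF) - grp_n)
print(excl_mean)
```

This shortcut only applies to statistics that can be derived from sums and counts, such as the mean. For a median or a quantile, the subsetting approach above is still needed. A level that contains every row yields NaN (0/0).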
bouncyball answered Oct 19 '22 13:10


Using data.table we can try:

library(data.table)
setDT(DF)[, DF[!Category %in% .BY[[1]], mean(Value)], by = Category]
#   Category       V1
#1:        A 2.000000
#2:        B 2.500000
#3:        C 2.666667
mtoto answered Oct 19 '22 14:10
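
As for the dplyr part of the question: group_by() splits rows into non-overlapping groups, so on its own it cannot express "all rows except this level", where the subsets overlap. One possible workaround, sketched here under the assumption that DF is built as in the question, is to loop over the levels, filter() each one out, and stack the one-row summaries with bind_rows(). The names lvls and result are illustrative only.

```r
library(dplyr)

# Example data from the question; Category is made a factor explicitly
DF <- data.frame(
  Category = factor(c("A", "B", "C", "A", "C")),
  Value    = c(5, 2, 3, 1, 1)
)

lvls <- levels(DF$Category)

# For each level x: drop its rows, then average what is left
result <- bind_rows(lapply(lvls, function(x) {
  DF %>%
    filter(Category != x) %>%
    summarise(Category = x, avg = mean(Value))
}))

print(result)
```

Because each level triggers its own filter(), this scales with the number of levels times the number of rows, which is fine for small data but may be slow for factors with many levels.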