I would like to do a negative selection: for each level of a factor variable, take all rows except the ones with that level, and summarize the data that remains. As a simple example, I have a data frame, DF, with two columns.
> DF
Category Value
A 5
B 2
C 3
A 1
C 1
It would look something like this if dplyr could negatively select (can it?).
> DF %>% group_by(!Category) %>% summarise(avg = mean(Value))
!Category avg
A 2.00 #average of all rows where category isn't A
B 2.50
C 2.67
R provides a wide range of functions for obtaining summary statistics. One way to get descriptive statistics is to call sapply() with the statistic you want as the function argument. Functions commonly used this way include mean, sd, var, min, max, median, range, and quantile.
To summarize a whole data frame, pass it to summary(). Additional arguments passed to summary() can change the output. summary() is applied to each column in turn, and the format of each result depends on the column's type: for a numeric column it returns the minimum, first quartile, median, mean, third quartile, and maximum.
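As a rough illustration of the two calls described above, here is a minimal sketch on the DF from the question. The construction of DF and the helper names num_cols, col_means and col_sds are my own, not from the original post.

```r
# Rebuild the example data frame from the question (assumed construction).
DF <- data.frame(Category = factor(c("A", "B", "C", "A", "C")),
                 Value = c(5, 2, 3, 1, 1))

# Keep only numeric columns, since mean() and sd() do not apply to factors.
num_cols <- DF[sapply(DF, is.numeric)]

# One statistic per numeric column.
col_means <- sapply(num_cols, mean)   # Value = 2.4
col_sds   <- sapply(num_cols, sd)

print(col_means)
print(col_sds)

# Per-column summary: level counts for the factor, quartiles and mean for Value.
print(summary(DF))
```

Note that these summarize whole columns; they do not by themselves give the "everything except this level" averages asked for in the question.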
Here's a way you could do it in base R:
edit: thanks for suggesting an extensible change @Ryan
> sapply(levels(DF$Category), FUN = function(x) mean(subset(DF, Category != x)$Value))
A B C
2.000000 2.500000 2.666667
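One caveat with the line above: levels() only works if Category is a factor. Since R 4.0.0, data.frame() no longer converts strings to factors by default, so Category may well be a character column, in which case levels() returns NULL. A sketch that should work either way, assuming the same DF (the names cats and complement_mean are mine):

```r
# Example data with Category as a plain character column (assumed construction).
DF <- data.frame(Category = c("A", "B", "C", "A", "C"),
                 Value = c(5, 2, 3, 1, 1))

# unique() works for both character and factor columns.
cats <- sort(unique(as.character(DF$Category)))

# For each category, average Value over all rows NOT in that category.
# vapply() checks that every result is a single number.
complement_mean <- vapply(cats,
                          function(x) mean(DF$Value[DF$Category != x]),
                          numeric(1))

print(complement_mean)
```

The result is a named numeric vector with one entry per category, matching the sapply output above.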
Using data.table, we can try:
library(data.table)
setDT(DF)[, DF[!Category %in% .BY[[1]], mean(Value)], by = Category]
# Category V1
#1: A 2.000000
#2: B 2.500000
#3: C 2.666667
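As for the dplyr part of the question: as far as I know group_by() has no built-in "everything except this group" mode, but you can get the same numbers with a little arithmetic, since the mean of the complement is (overall sum - group sum) / (overall count - group count). A minimal sketch, assuming the same DF and using only group_by(), summarise() and n() (the name res is mine):

```r
library(dplyr)

# Example data from the question (assumed construction).
DF <- data.frame(Category = c("A", "B", "C", "A", "C"),
                 Value = c(5, 2, 3, 1, 1))

res <- DF %>%
  group_by(Category) %>%
  summarise(
    # DF$Value and nrow(DF) refer to the full data frame;
    # Value and n() refer to the current group only.
    avg = (sum(DF$Value) - sum(Value)) / (nrow(DF) - n())
  )

print(res)
```

This avoids re-subsetting the data once per group, which may matter on larger data. If there is only one category the denominator is zero, so the result would be NaN.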