How do I get summary statistics in R after negative selection of a data frame?

Tags: dataframe, r, dplyr

For each level of a factor variable, I would like to select all rows except those belonging to that level ("negative selection") and summarize the data that remains. As a simple example, I have a data frame, DF, with two columns.

>DF
Category      Value  
A               5  
B               2  
C               3  
A               1  
C               1

If dplyr supported negative selection (does it?), the result would look something like this:

> DF %>% group_by(!Category) %>% summarise(avg = mean(Value))
!Category    avg
A            2.00               #average of all rows where category isn't A
B            2.50
C            2.67


asked Mar 21 '16 20:03

Mark

2 Answers

Here's a way you could do it in base R:

edit: thanks for suggesting an extensible change @Ryan

> sapply(levels(DF$Category), FUN = function(x) mean(subset(DF, Category != x)$Value))

       A        B        C 
2.000000 2.500000 2.666667
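A reproducible sketch of the call above, built on the sample data from the question. Two things here are assumptions rather than part of the answer: `Category` is created explicitly as a factor (since R 4.0, `data.frame()` no longer converts strings to factors by default, and `levels()` returns `NULL` for a character column), and `vapply()` stands in for `sapply()` so the result is guaranteed to be a numeric vector. The name `leave_out_mean` is purely illustrative.

```r
# Sample data from the question. Category is made a factor explicitly,
# because levels() returns NULL for a plain character column.
DF <- data.frame(
  Category = factor(c("A", "B", "C", "A", "C")),
  Value    = c(5, 2, 3, 1, 1)
)

# For each level, average Value over the rows that do NOT belong to it.
# vapply() enforces one numeric value per level.
leave_out_mean <- vapply(
  levels(DF$Category),
  function(x) mean(DF$Value[DF$Category != x]),
  numeric(1)
)

print(leave_out_mean)
```

If `Category` is a character column in your data, `unique(DF$Category)` could be used in place of `levels(DF$Category)`.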


answered Oct 19 '22 13:10

bouncyball

Using data.table we can try:

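library(data.table)
# .BY[[1]] holds the Category of the current group; the inner DF[...] averages Value over all other rows
setDT(DF)[, DF[!Category %in% .BY[[1]], mean(Value)], by = Category]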
#   Category       V1
#1:        A 2.000000
#2:        B 2.500000
#3:        C 2.666667

answered Oct 19 '22 14:10

mtoto
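On the "can it?" part of the question: as far as I know, `group_by()` has no negation form, but the same numbers can be obtained in dplyr with a little arithmetic. The mean of "everything except this group" is the overall sum minus the group's sum, divided by the overall row count minus the group's row count. The following is a minimal sketch under that assumption, not taken from either answer above; `total_sum`, `total_n`, and `res` are illustrative names.

```r
library(dplyr)

# Sample data from the question
DF <- data.frame(
  Category = factor(c("A", "B", "C", "A", "C")),
  Value    = c(5, 2, 3, 1, 1)
)

# Totals over the whole data frame, computed once up front
total_sum <- sum(DF$Value)
total_n   <- nrow(DF)

# Within each group, subtract that group's contribution from the totals
res <- DF %>%
  group_by(Category) %>%
  summarise(avg = (total_sum - sum(Value)) / (total_n - n()))

print(res)
```

For the sample data this gives 2 for A, 2.5 for B, and about 2.67 for C, matching the base R and data.table results. Note that this shortcut only works for statistics that can be decomposed this way (sums, counts, means); for something like a median, the subset-per-level approaches above are needed. If a single category covers every row, the denominator is zero and the result is `NaN`.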
