Filtering within the summarise function of dplyr

Tags: r, dplyr

I am struggling a little with dplyr because I want to do two things at once and wonder whether it is possible.

I want to calculate the mean of values and, at the same time, the mean of only those values that have a specific value in another column.

library(dplyr)
set.seed(1234)
df <- data.frame(id=rep(1:10, each=14),
                 tp=letters[1:14],
                 value_type=sample(LETTERS[1:3], 140, replace=TRUE),
                 values=runif(140))

df %>%
  group_by(id, tp) %>%
  summarise(
    all_mean=mean(values),
    A_mean=mean(values), # Only the values with value_type A
    value_count=sum(value_type == 'A')
  )

So the A_mean column should calculate the mean of values where value_type == 'A'.

I would normally run two separate commands and merge the results afterwards, but I guess there is a handier way and I just don't see it.

Thanks in advance.


asked Jun 29 '16 08:06

drmariod
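For comparison, the two-step approach the question describes (summarise all rows, summarise only the "A" rows, then merge) might look like the sketch below. It assumes dplyr >= 1.0.0 for the `.groups` argument; the object names `all_stats`, `a_stats` and `res` are illustrative only.

```r
library(dplyr)

# Same example data as in the question.
set.seed(1234)
df <- data.frame(id = rep(1:10, each = 14),
                 tp = letters[1:14],
                 value_type = sample(LETTERS[1:3], 140, replace = TRUE),
                 values = runif(140))

# Step 1: statistics over all rows of each (id, tp) group.
all_stats <- df %>%
  group_by(id, tp) %>%
  summarise(all_mean = mean(values),
            value_count = sum(value_type == "A"),
            .groups = "drop")

# Step 2: the mean over only the rows whose value_type is "A".
a_stats <- df %>%
  filter(value_type == "A") %>%
  group_by(id, tp) %>%
  summarise(A_mean = mean(values), .groups = "drop")

# Step 3: merge; groups without any "A" rows get NA for A_mean.
res <- left_join(all_stats, a_stats, by = c("id", "tp"))
head(res)
```

One visible difference from computing A_mean inside a single summarise(): after the left join, groups without any "A" rows get NA rather than NaN.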

2 Answers

We can try

df %>%
  group_by(id, tp) %>%
  summarise(all_mean = mean(values),
            A_mean = mean(values[value_type == "A"]),
            value_count = sum(value_type == "A"))


answered Oct 18 '22 21:10

akrun
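To see what the subsetting inside summarise() does, here is a minimal sketch on a hypothetical toy data frame (the names `toy` and `grp` are made up for illustration). Within each group, `values[value_type == "A"]` is evaluated on that group's rows only.

```r
library(dplyr)

# Hypothetical toy data: group "g1" has two "A" rows, group "g2" has none.
toy <- data.frame(grp        = c("g1", "g1", "g1", "g2", "g2"),
                  value_type = c("A",  "A",  "B",  "B",  "C"),
                  values     = c(1,    3,    8,    4,    6))

res <- toy %>%
  group_by(grp) %>%
  summarise(
    all_mean    = mean(values),                    # all rows of the group
    A_mean      = mean(values[value_type == "A"]), # only the "A" rows
    value_count = sum(value_type == "A")           # number of "A" rows
  )
res
```

Here g1 gives all_mean = 4, A_mean = 2 and value_count = 2. Group g2 has no "A" rows, so mean() is applied to an empty vector and returns NaN. If NA is preferred, one option is `A_mean = if (any(value_type == "A")) mean(values[value_type == "A"]) else NA_real_`.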

You can do this with two summary steps:

df %>%
  group_by(id, tp, value_type) %>%
  summarise(A_mean = mean(values)) %>%
  summarise(all_mean = mean(A_mean),
            A_mean = sum(A_mean * (value_type == "A")),
            value_count = sum(value_type == "A"))

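The first summary calculates the mean per value_type and drops value_type from the grouping; the second keeps only the mean for value_type == "A" by multiplying all other means by zero. Note that all_mean is then the unweighted mean of the per-type means, value_count is 1 or 0 (whether a type "A" row exists) rather than the number of "A" rows, and A_mean is 0 rather than NaN for groups without "A" rows.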

answered Oct 18 '22 21:10

AlexR
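In the two-step answer, all_mean is a plain mean of per-type means, which only equals mean(values) when every type contributes the same number of rows. A minimal sketch of a weighted variant, assuming dplyr >= 1.0.0 (for `.groups`), carries the row count from the first summary into the second. The toy data and the column names `type_mean` and `n_rows` are illustrative, not part of the original answer.

```r
library(dplyr)

# Hypothetical toy data: in "g1" the types have unequal row counts (A: 2, B: 1).
toy <- data.frame(grp        = c("g1", "g1", "g1", "g2", "g2"),
                  value_type = c("A",  "A",  "B",  "B",  "C"),
                  values     = c(1,    3,    8,    4,    6))

res <- toy %>%
  group_by(grp, value_type) %>%
  # First step: per-type mean plus the number of rows behind it.
  summarise(type_mean = mean(values), n_rows = n(), .groups = "drop_last") %>%
  # Second step: still grouped by grp; weight each type mean by its row count.
  summarise(
    all_mean    = sum(type_mean * n_rows) / sum(n_rows), # equals mean(values)
    unweighted  = mean(type_mean),                       # mean of type means
    A_mean      = mean(type_mean[value_type == "A"]),    # NaN if no "A" rows
    value_count = sum(n_rows[value_type == "A"])         # rows of type "A"
  )
res
```

For g1 the weighted all_mean is 4 (the mean of 1, 3 and 8), while the unweighted mean of the type means is 5. Using mean() on the single "A" type mean, instead of a sum, also makes A_mean NaN for groups without "A" rows, consistent with the one-step answer.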
