I am struggling a little with dplyr because I want to do two things at once and wonder whether it is possible.
I want to calculate the mean of values and, at the same time, the mean of only those values that have a specific value in another column.
library(dplyr)
set.seed(1234)
df <- data.frame(id = rep(1:10, each = 14),
                 tp = letters[1:14],
                 value_type = sample(LETTERS[1:3], 140, replace = TRUE),
                 values = runif(140))
df %>%
  group_by(id, tp) %>%
  summarise(
    all_mean = mean(values),
    A_mean = mean(values), # Only the values with value_type A
    value_count = sum(value_type == 'A')
  )
So the A_mean column should contain the mean of values where value_type == 'A'.
I would normally do two separate commands and merge the results later, but I guess there is a more handy way and I just don't get it.
Thanks in advance.
The filter() function is used to subset a data frame, retaining all rows that satisfy your conditions. To be retained, the row must produce a value of TRUE for all conditions.
To keep rows that match any of several values, pass your data frame to filter(), write the name of the column you want to filter on, follow it with the %in% operator, and then give a vector containing all the string values you want in the result.
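As a minimal sketch of the filter() and %in% pattern described above, the following uses a small made-up data frame (the name toy and its contents are assumptions for illustration, not part of the question's data):

```r
library(dplyr)

# Toy data frame (hypothetical, for illustration only)
toy <- data.frame(
  value_type = c("A", "B", "C", "A", "B"),
  values     = c(1, 2, 3, 4, 5)
)

# Keep only the rows whose value_type is one of the listed strings
kept <- toy %>%
  filter(value_type %in% c("A", "C"))
```

Here kept should contain only the rows of type "A" or "C"; the "B" rows are dropped.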
summarise() creates a new data frame. It will have one (or more) rows for each combination of grouping variables; if there are no grouping variables, the output will have a single row summarising all observations in the input.
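The difference between an ungrouped and a grouped summarise() can be sketched as follows. The data frame toy and the column grp are made up for illustration:

```r
library(dplyr)

# Toy data frame (hypothetical, for illustration only)
toy <- data.frame(
  grp    = c("x", "x", "y"),
  values = c(1, 3, 10)
)

# No grouping variables: a single row summarising every observation
overall <- toy %>%
  summarise(mean_value = mean(values))

# Grouped by grp: one row per distinct value of grp
per_group <- toy %>%
  group_by(grp) %>%
  summarise(mean_value = mean(values))
```

With this input, overall has one row and per_group has two, one for "x" and one for "y".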
In SQL, GROUP BY lets you apply aggregate functions to groups of rows returned from a query. FILTER is a modifier on an aggregate function that limits the values used in the aggregation. Every column in the SELECT list that is not aggregated should be listed in the GROUP BY clause of the query.
We can try
df %>%
  group_by(id, tp) %>%
  summarise(all_mean = mean(values),
            A_mean = mean(values[value_type == "A"]),
            value_count = sum(value_type == "A"))
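One thing to be aware of with the subsetting approach: when a group has no rows of type "A", mean() is called on an empty vector and returns NaN. The sketch below shows this on a small made-up data frame (toy and the column A_mean_na are assumptions for illustration), along with one possible way to get NA instead:

```r
library(dplyr)

# Toy data (hypothetical): group 2 contains no rows of type "A"
toy <- data.frame(
  id         = c(1, 1, 1, 2, 2),
  value_type = c("A", "A", "B", "B", "C"),
  values     = c(1, 3, 10, 4, 6)
)

res <- toy %>%
  group_by(id) %>%
  summarise(
    all_mean    = mean(values),
    # mean() of an empty vector is NaN, so groups without "A" get NaN here
    A_mean      = mean(values[value_type == "A"]),
    # Optional variant: return NA explicitly when the group has no "A" rows
    A_mean_na   = if (any(value_type == "A")) mean(values[value_type == "A"]) else NA_real_,
    value_count = sum(value_type == "A")
  )
```

Whether NaN or NA is preferable depends on how the result is used later; both are treated as missing by is.na().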
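You can also do this with two summary steps:
df %>%
  group_by(id, tp, value_type) %>%
  summarise(type_mean = mean(values), n_values = n()) %>%
  summarise(all_mean = sum(type_mean * n_values) / sum(n_values),
            A_mean = sum(type_mean * (value_type == "A")),
            value_count = sum(n_values * (value_type == "A")))
The first summary calculates the mean and the row count per value_type. The second weights those means by their counts to recover the overall mean, "sums" only the mean where value_type == "A", and counts the "A" rows. Note that with this approach A_mean is 0 rather than NaN for a group that has no "A" rows.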
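A caveat for any two-step approach: averaging per-type means only equals the overall mean when every type has the same number of rows. The sketch below illustrates the difference on a made-up data frame (toy, type_mean, n_values, naive_mean and weighted_mean are names chosen for illustration; the .groups argument assumes dplyr 1.0.0 or later):

```r
library(dplyr)

# Toy data (hypothetical): three "A" rows and one "B" row in a single group
toy <- data.frame(
  id         = 1,
  value_type = c("A", "A", "A", "B"),
  values     = c(1, 2, 3, 10)
)

# Step 1: mean and row count per value_type, keeping the grouping by id
per_type <- toy %>%
  group_by(id, value_type) %>%
  summarise(type_mean = mean(values), n_values = n(), .groups = "drop_last")

# Step 2: compare the unweighted and the count-weighted mean of the means
res <- per_type %>%
  summarise(
    naive_mean    = mean(type_mean),                           # (2 + 10) / 2
    weighted_mean = sum(type_mean * n_values) / sum(n_values)  # (2 * 3 + 10 * 1) / 4
  )
```

Only weighted_mean matches mean(toy$values); naive_mean gives the single "B" row as much influence as the three "A" rows combined.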