Quite often I need to calculate a new column for an R dataframe (in long form), whose value should depend on an aggregate function (e.g. sum) of a group. For instance, I might want to know what fraction of sales a product accounts for on any given day:
daily fraction = revenue for product i on day d / sum or revenue for all products on day d
My current strategy is to summarise and join:
library(dplyr)
join_summary <- function(data, ...) left_join(data, summarise(data, ...))
data = data.frame(
day = c(1,1,2,2,3,3),
product = rep(c("A", "B"), 3),
revenue = c(2, 4, 8, 7, 9, 2)
)
data2 <- data %>%
group_by(day) %>%
join_summary(daily_revenue = sum(revenue)) %>%
mutate(revenue_fraction = revenue / daily_revenue)
This works, but I am not sure if it is an anti-pattern. It seems a bit inefficient to repeat the same data (the daily revenue) on multiple rows, littering my data frame with aggregates. My questions are:
dplyr
or the wider Hadleyverse?join_summary
, or can it be done with the existing dplyr
verbs? (I prefer to stay within the "pipe flow", though.)You can use the mutate() function from the dplyr package to add one or more columns to a data frame in R.
Syntax: mutate(new-col-name = rowSums(.)) The rowSums() method is used to calculate the sum of each row and then append the value at the end of each row under the new column name specified.
Besides using summarise
to aggregate, you can use mutate
to assign summary stats to the full column:
data %>%
group_by(day) %>%
mutate(
daily_revenue = sum(revenue),
revenue_fraction = revenue / daily_revenue
)
which gives
Source: local data frame [6 x 5]
Groups: day [3]
day product revenue daily_revenue revenue_fraction
<dbl> <fctr> <dbl> <dbl> <dbl>
1 1 A 2 6 0.3333333
2 1 B 4 6 0.6666667
3 2 A 8 15 0.5333333
4 2 B 7 15 0.4666667
5 3 A 9 11 0.8181818
6 3 B 2 11 0.1818182
This works because the value sum(revenue)
is recycled to fill in all rows in the group.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With