Given the following data.frame:
d <- rep(c("a", "b"), each=5)
l <- rep(1:5, 2)
v <- 1:10
df <- data.frame(d=d, l=l, v=v*v)
df
d l v
1 a 1 1
2 a 2 4
3 a 3 9
4 a 4 16
5 a 5 25
6 b 1 36
7 b 2 49
8 b 3 64
9 b 4 81
10 b 5 100
Now I want to add another column, e, computed within each group of l. It should contain v_b - v_a, i.e. the value of v where d == "b" minus the value of v where d == "a":
d l v e
1 a 1 1 35 (36-1)
2 a 2 4 45 (49-4)
3 a 3 9 55 (64-9)
4 a 4 16 65 (81-16)
5 a 5 25 75 (100-25)
6 b 1 36 35 (36-1)
7 b 2 49 45 (49-4)
8 b 3 64 55 (64-9)
9 b 4 81 65 (81-16)
10 b 5 100 75 (100-25)
The parentheses show how each value is calculated.
I'm looking for a way to do this with dplyr, so I started with something like this:
df %>%
  group_by(l) %>%
  mutate(e = myCustomFunction)
But how should I define myCustomFunction? I thought grouping of the data.frame produces another (sub-)data.frame which is a parameter to this function. But it isn't...
I guess this is the dplyr equivalent to @jlhoward's data.table solution:
df %>%
  group_by(l) %>%
  mutate(e = v[d == "b"] - v[d == "a"])
If you want to use a custom function, here's a possible way:
# x is the subset of rows belonging to one group of l
myfunc <- function(x) {
  with(x, v[d == "b"] - v[d == "a"])
}
df %>%
  group_by(l) %>%
  do(data.frame(., e = myfunc(.))) %>%
  arrange(d, l) # <- just to get it back in the original order
As hadley commented below, it would be better in this case to define the function as
f <- function(v, d) v[d == "b"] - v[d == "a"]
and then use the custom function f inside a mutate:
df %>%
  group_by(l) %>%
  mutate(e = f(v, d))
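If some groups might be missing an "a" or a "b" row, the function can guard against that. The following is only a sketch: f_safe is a hypothetical name, and returning NA for incomplete groups is an assumption about the desired behaviour, not something the question asks for.

```r
library(dplyr)

# Hypothetical variant of f: returns NA unless the group contains
# exactly one "a" row and exactly one "b" row.
f_safe <- function(v, d) {
  a <- v[d == "a"]
  b <- v[d == "b"]
  if (length(a) != 1 || length(b) != 1) return(NA_real_)
  b - a
}

# Small made-up example: group l == 2 has no "b" row.
df_small <- data.frame(d = c("a", "b", "a"),
                       l = c(1, 1, 2),
                       v = c(1, 36, 4))

res <- df_small %>%
  group_by(l) %>%
  mutate(e = f_safe(v, d)) %>%
  ungroup()
```

Here the complete group gets the difference on both of its rows, and the incomplete group gets NA.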
Thanks @hadley for the comment.
Using dplyr:
df %>%
  group_by(l) %>%
  mutate(e = diff(v))
# d l v e
# 1 a 1 1 35
# 2 a 2 4 45
# 3 a 3 9 55
# 4 a 4 16 65
# 5 a 5 25 75
# 6 b 1 36 35
# 7 b 2 49 45
# 8 b 3 64 55
# 9 b 4 81 65
# 10 b 5 100 75
Here's an approach using data.table.
library(data.table)
DT <- as.data.table(df)
DT[,e := diff(v), by=l]
These approaches using diff(...) assume that each group of l has exactly two rows, with the "a" row before the "b" row, as in your example. If that isn't guaranteed, this is a more reliable way to do the same thing:
DT[, e := .SD[d == "b", v] - .SD[d == "a", v], by=l]
Or, even more directly:
DT[, e := v[d == "b"] - v[d == "a"], by=l]
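To illustrate why the diff(...) approach depends on row order, here is a small sketch using dplyr. It reverses the rows of the example data so that "b" comes before "a" within each group; the names shuffled, e_diff and e_index are made up for this illustration.

```r
library(dplyr)

df <- data.frame(d = rep(c("a", "b"), each = 5),
                 l = rep(1:5, 2),
                 v = (1:10)^2)

# Reverse the row order so that "b" now precedes "a" within each group of l.
shuffled <- df[nrow(df):1, ]

res <- shuffled %>%
  group_by(l) %>%
  mutate(e_diff  = diff(v),                       # depends on row order
         e_index = v[d == "b"] - v[d == "a"]) %>% # depends only on d
  ungroup()
```

With the rows reversed, e_diff has the opposite sign of e_index in every row, while e_index still gives v_b - v_a.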
But if you want to access the entire subset of data for each group and pass it to your custom function, you can use .SD. Also make sure you read about .SDcols in ?data.table.
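As a rough sketch of the .SD route: the per-group subset is passed to a function, which then does the lookup itself. myfunc and its argument name sub are hypothetical, and restricting .SD to d and v via .SDcols is one possible choice rather than a requirement.

```r
library(data.table)

DT <- data.table(d = rep(c("a", "b"), each = 5),
                 l = rep(1:5, 2),
                 v = (1:10)^2)

# Hypothetical helper: receives the rows of one group as a data.table
# and returns v for "b" minus v for "a".
myfunc <- function(sub) {
  sub[d == "b", v] - sub[d == "a", v]
}

# .SD holds the current group's rows; .SDcols limits it to the columns needed.
DT[, e := myfunc(.SD), by = l, .SDcols = c("d", "v")]
```

The single value returned per group is recycled across both rows of that group, which matches the column e requested in the question.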