Given the following data.frame:
d <- rep(c("a", "b"), each=5)
l <- rep(1:5, 2)
v <- 1:10
df <- data.frame(d=d, l=l, v=v*v)
df
d l v
1 a 1 1
2 a 2 4
3 a 3 9
4 a 4 16
5 a 5 25
6 b 1 36
7 b 2 49
8 b 3 64
9 b 4 81
10 b 5 100
Now I want to add another column after grouping by l. The extra column e should contain v_b - v_a, that is, the value of v where d is "b" minus the value of v where d is "a" within each group:
d l v e
1 a 1 1 35 (36-1)
2 a 2 4 45 (49-4)
3 a 3 9 55 (64-9)
4 a 4 16 65 (81-16)
5 a 5 25 75 (100-25)
6 b 1 36 35 (36-1)
7 b 2 49 45 (49-4)
8 b 3 64 55 (64-9)
9 b 4 81 65 (81-16)
10 b 5 100 75 (100-25)
The parentheses show how each value is calculated.
I'm looking for a way to do this with dplyr, so I started with something like this:
df %>%
  group_by(l) %>%
  mutate(e = myCustomFunction)
But how should I define myCustomFunction? I thought that grouping the data.frame would produce a (sub-)data.frame for each group, which would then be passed to this function as its argument. But it isn't...
I guess this is the dplyr equivalent of @jlhoward's data.table solution:
df %>%
group_by(l) %>%
mutate(e = v[d == "b"] - v[d == "a"])
If you want to use a custom function, here's a possible way:
myfunc <- function(x) {
with(x, v[d == "b"] - v[d == "a"])
}
df %>%
group_by(l) %>%
do(data.frame(. , e = myfunc(.))) %>%
arrange(d, l) # <- just to get it back in the original order
As hadley commented below, it would be better in this case to define the function as
f <- function(v, d) v[d == "b"] - v[d == "a"]
and then use the custom function f inside a mutate:
df %>%
group_by(l) %>%
mutate(e = f(v, d))
Thanks @hadley for the comment.
Using dplyr:
library(dplyr)

df %>%
  group_by(l) %>%
  mutate(e = diff(v))
# d l v e
# 1 a 1 1 35
# 2 a 2 4 45
# 3 a 3 9 55
# 4 a 4 16 65
# 5 a 5 25 75
# 6 b 1 36 35
# 7 b 2 49 45
# 8 b 3 64 55
# 9 b 4 81 65
# 10 b 5 100 75
Here's an approach using data.table:
library(data.table)
DT <- as.data.table(df)
DT[,e := diff(v), by=l]
These approaches using diff(...) assume your data frame is sorted as in your example, with the "a" row before the "b" row within each group of l. If not, this is a more reliable way to do the same thing:
DT[, e := .SD[d == "b", v] - .SD[d == "a", v], by=l]
Or, even more directly:
DT[, e := v[d == "b"] - v[d == "a"], by=l]
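To see why the row order matters, here is a minimal sketch using dplyr. It rebuilds the example data, reverses the rows so that "b" comes before "a" in every group, and computes both variants side by side. The names rev_df, e_diff and e_index are made up for this illustration.

```r
library(dplyr)

# Same data as in the question
df <- data.frame(d = rep(c("a", "b"), each = 5),
                 l = rep(1:5, 2),
                 v = (1:10)^2)

# Reverse the rows so that "b" now precedes "a" within each group of l
rev_df <- df[nrow(df):1, ]

res <- rev_df %>%
  group_by(l) %>%
  mutate(e_diff  = diff(v),                        # depends on row order
         e_index = v[d == "b"] - v[d == "a"]) %>%  # does not depend on row order
  ungroup()

res
```

With the rows reversed, e_diff flips sign in every group, while e_index still gives v_b - v_a.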
But if you want to access the entire subset of data and pass it to your custom function, then you can use .SD. Also make sure you read about .SDcols in ?data.table.
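As a rough sketch of the .SD route, assuming a helper that takes the per-group subset as a data.table (myfunc here is just an illustrative name, mirroring the dplyr answer), it could look like this:

```r
library(data.table)

# Same data as in the question
DT <- data.table(d = rep(c("a", "b"), each = 5),
                 l = rep(1:5, 2),
                 v = (1:10)^2)

# Hypothetical helper: receives the subset of rows for one group
myfunc <- function(x) x$v[x$d == "b"] - x$v[x$d == "a"]

# .SD holds the columns of the current group, excluding the grouping column l;
# the single value returned by myfunc is recycled over the group's rows
DT[, e := myfunc(.SD), by = l]

DT
```

For a computation this small, passing the columns directly (as in the line with v[d == "b"] - v[d == "a"]) is probably simpler and faster than going through .SD.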