
Custom function after grouping a data.frame

Tags:

dataframe

r

dplyr

Given the following data.frame

d <- rep(c("a", "b"), each=5)
l <- rep(1:5, 2) 
v <- 1:10

df       <- data.frame(d=d, l=l, v=v*v)
df
   d l   v
1  a 1   1
2  a 2   4
3  a 3   9
4  a 4  16
5  a 5  25
6  b 1  36
7  b 2  49
8  b 3  64
9  b 4  81
10 b 5 100

Now I want to add another column after grouping by l. For each value of l, the extra column e should contain v_b - v_a, i.e. the v of the d == "b" row minus the v of the d == "a" row:

   d l   v    e
1  a 1   1    35 (36-1)
2  a 2   4    45 (49-4)
3  a 3   9    55 (64-9)
4  a 4  16    65 (81-16)
5  a 5  25    75 (100-25)
6  b 1  36    35 (36-1)
7  b 2  49    45 (49-4)
8  b 3  64    55 (64-9)
9  b 4  81    65 (81-16)
10 b 5 100    75 (100-25)

The parentheses show how each value is calculated.

I'm looking for a way using dplyr. So I started with something like this

df %.% 
 group_by(l) %.%
 mutate(e=myCustomFunction)

But how should I define myCustomFunction? I thought grouping of the data.frame produces another (sub-)data.frame which is a parameter to this function. But it isn't...

JerryWho asked Jun 09 '14 19:06


3 Answers

I guess this is the dplyr equivalent to @jlhoward's data.table solution:

df %>%
  group_by(l) %>%
  mutate(e = v[d == "b"] - v[d == "a"])

Edit after comment by OP:

If you want to use a custom function, here's a possible way:

myfunc <- function(x) {
  with(x, v[d == "b"] - v[d == "a"])
}

df %>%
  group_by(l) %>%
  do(data.frame(., e = myfunc(.))) %>%  # myfunc receives each group's sub-data.frame
  arrange(d, l)                         # <- just to get it back in the original order
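In current dplyr releases do() is marked as superseded, and group_modify() covers the same "pass the whole sub-data.frame to a function" use case. The following is only a sketch, assuming a dplyr version that provides group_modify(); the name res is just for illustration.

```r
library(dplyr)

df <- data.frame(d = rep(c("a", "b"), each = 5),
                 l = rep(1:5, 2),
                 v = (1:10)^2)

# Same custom function as above: x is the sub-data.frame of one group
myfunc <- function(x) {
  with(x, v[d == "b"] - v[d == "a"])
}

res <- df %>%
  group_by(l) %>%
  group_modify(~ mutate(.x, e = myfunc(.x))) %>%  # .x is the group's rows without the grouping column l
  ungroup() %>%
  arrange(d, l)                                   # back to the original row order
```

Note that group_modify() puts the grouping column l first in the result.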

Edit after comment by @hadley:

As hadley commented below, it would be better in this case to define the function as

f <- function(v, d) v[d == "b"] - v[d == "a"]

and then use the custom function f inside a mutate:

df %>%
  group_by(l) %>%
  mutate(e = f(v, d))  

Thanks @hadley for the comment.
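To see why f takes plain vectors rather than a sub-data.frame: inside mutate the expression is evaluated once per group, with v and d standing for that group's column values. Here is a rough base R imitation of that behaviour, meant only as an illustration and not as how dplyr is implemented; idx and rows are hypothetical helper names.

```r
df <- data.frame(d = rep(c("a", "b"), each = 5),
                 l = rep(1:5, 2),
                 v = (1:10)^2)

f <- function(v, d) v[d == "b"] - v[d == "a"]

# Row indices belonging to each value of l
idx <- split(seq_len(nrow(df)), df$l)

df$e <- NA_real_
for (rows in idx) {
  # f sees only this group's vectors; its single result is recycled over the group's rows
  df$e[rows] <- f(df$v[rows], df$d[rows])
}
```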

talat answered Nov 09 '22 19:11


Using dplyr:

library(dplyr)

df %>%   # %.% has been removed from dplyr; use %>%
  group_by(l) %>%
  mutate(e = diff(v))

#    d l   v  e
# 1  a 1   1 35
# 2  a 2   4 45
# 3  a 3   9 55
# 4  a 4  16 65
# 5  a 5  25 75
# 6  b 1  36 35
# 7  b 2  49 45
# 8  b 3  64 55
# 9  b 4  81 65
# 10 b 5 100 75

agstudy answered Nov 09 '22 20:11


Here's an approach using data.table.

library(data.table)
DT <- as.data.table(df)
DT[,e := diff(v), by=l]

These approaches using diff(...) assume that within each group of l the "a" row comes before the "b" row, as in your example. If that is not guaranteed, this is a more reliable way to do the same thing.

DT[, e := .SD[d == "b", v] - .SD[d == "a", v], by=l]

Or, even more directly:

DT[, e := v[d == "b"] - v[d == "a"], by=l]
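A small base R check of the ordering caveat, using a reversed copy of the example data so that "b" precedes "a" within each l. The names reversed, by_diff and by_label are made up for this sketch.

```r
df <- data.frame(d = rep(c("a", "b"), each = 5),
                 l = rep(1:5, 2),
                 v = (1:10)^2)

# Reverse the rows: within each l, the "b" row now comes first
reversed <- df[rev(seq_len(nrow(df))), ]
groups   <- split(reversed, reversed$l)

# diff() depends on row order, so the sign flips
by_diff  <- sapply(groups, function(g) diff(g$v))

# Selecting by label does not depend on row order
by_label <- sapply(groups, function(g) g$v[g$d == "b"] - g$v[g$d == "a"])
```

Here by_diff comes out negative for every l, while by_label still gives v_b - v_a.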

But if you want to access the entire subset of data and pass it to your custom function, then you can use .SD. Also read about the .SDcols argument in ?data.table, which restricts the columns that .SD contains.
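For example, a custom function that takes the whole per-group subset could be called like this. It is a sketch only: myfunc is a hypothetical name, and the data is rebuilt here so the snippet runs on its own.

```r
library(data.table)

DT <- data.table(d = rep(c("a", "b"), each = 5),
                 l = rep(1:5, 2),
                 v = (1:10)^2)

# x is the subset of rows for one value of l (a data.table)
myfunc <- function(x) {
  with(x, v[d == "b"] - v[d == "a"])
}

# .SDcols limits .SD to the columns the function actually needs
DT[, e := myfunc(.SD), by = l, .SDcols = c("d", "v")]
```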

jlhoward answered Nov 09 '22 20:11