
How do I use dplyr to generate a new column based on rowwise data?

Tags: r, dplyr

I want to add a new column to a data frame which is based on a row-wise calculation. Suppose I have a data frame such as this one:

x <- as.data.frame(matrix(1:10, 5, 2))

  V1 V2
1  1  6
2  2  7
3  3  8
4  4  9
5  5 10

If I want to do some rowwise operation to generate a new column, I can use rowwise() and do() to accomplish that. For example:

y <- rowwise(x) %>% do(foo = .$V1 * .$V2)

I can even append this to the existing data frame as such:

y <- rowwise(x) %>% bind_cols(do(., foo = .$V1 * .$V2))

This all works, but the result isn't quite what I want: y$foo is a list column, not a numeric vector.

  V1 V2 foo
1  1  6   6
2  2  7  14
3  3  8  24
4  4  9  36
5  5 10  50

Looks right, but it isn't.

class(y$foo)
[1] "list"

So, two questions:

  1. Is there a way to make the results numeric instead of lists?
  2. Is there a better way I should be approaching this?

Update:
This is closer to what I am trying to do. Given this function:

pts <- 11:20
z <- function(x1, x2) {
  min(x1*x2*pts)
}

This doesn't produce what I expect:

y <- x %>% mutate(foo = z(V1, V2))
  V1 V2 foo
1  1  6  66
2  2  7  66
3  3  8  66
4  4  9  66
5  5 10  66

while this does:

y <- rowwise(x) %>% bind_cols(do(., data.frame(foo = z(.$V1, .$V2))))
  V1 V2 foo
1  1  6  66
2  2  7 154
3  3  8 264
4  4  9 396
5  5 10 550

Why? Is there a better way?

Asked by Steve Rowe, Dec 20 '22 04:12


2 Answers

I generally don't believe in row-wise operations in a vectorized language such as R. In your case, you could solve the problem with a single matrix multiplication.

You could define z as follows:

z <- function(x1, x2) {
  do.call(pmin, as.data.frame(tcrossprod(x1 * x2, pts)))
}

Then a simple mutate will do:

x %>% mutate(foo = z(V1, V2))
#   V1 V2 foo
# 1  1  6  66
# 2  2  7 154
# 3  3  8 264
# 4  4  9 396
# 5  5 10 550
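
To see why the tcrossprod version works, it can help to look at the intermediate matrix. The following is a minimal base-R sketch, assuming the same x and pts as in the question. The names products, m and row_min are only for illustration, and apply() stands in here for pmin so that each step is visible.

```r
x <- as.data.frame(matrix(1:10, 5, 2))
pts <- 11:20

# One product per row of x
products <- x$V1 * x$V2

# 5 x 10 matrix: m[i, j] is products[i] * pts[j]
m <- tcrossprod(products, pts)

# Minimum over all pts values, taken separately for each row
row_min <- apply(m, 1, min)
row_min
```

Each row of m holds every value that the original z would have seen for that row of x, so taking the minimum per row gives the same foo column without looping over rows in dplyr.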

You could also improve performance by using the matrixStats::rowMins function (which is fully vectorized):

library(matrixStats)

z <- function(x1, x2) {
  rowMins(tcrossprod(x1 * x2, pts))
}

x %>% mutate(foo = z(V1, V2))
#   V1 V2 foo
# 1  1  6  66
# 2  2  7 154
# 3  3  8 264
# 4  4  9 396
# 5  5 10 550
Answered by David Arenburg, Dec 28 '22 22:12


You should just return a data.frame in your do statement:

y <- rowwise(x) %>% bind_cols(do(., data.frame(foo = .$V1 * .$V2)))
y
##   V1 V2 foo
## 1  1  6   6
## 2  2  7  14
## 3  3  8  24
## 4  4  9  36
## 5  5 10  50
y$foo
## [1]  6 14 24 36 50

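In your updated question, the chain with the mutate statement is missing rowwise, while the chain with the do statement has it. Without rowwise, mutate passes the whole V1 and V2 columns to z, so min() returns a single value that is recycled to every row. Just add rowwise and you will get the same result as with do.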

x %>% rowwise %>% mutate(foo = z(V1, V2))
## Source: local data frame [5 x 3]
## Groups: <by row>
## 
##   V1 V2 foo
## 1  1  6  66
## 2  2  7 154
## 3  3  8 264
## 4  4  9 396
## 5  5 10 550
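
The difference between the two calls can also be reproduced in base R, without dplyr. This is a small sketch assuming the x, pts and z definitions from the question. The names whole_columns and per_row are made up for illustration, and mapply is shown only as one possible alternative to rowwise, not as part of the approach above.

```r
x <- as.data.frame(matrix(1:10, 5, 2))
pts <- 11:20
z <- function(x1, x2) {
  min(x1 * x2 * pts)
}

# What mutate() without rowwise effectively computes:
# z sees both full columns at once and min() collapses them to one value
whole_columns <- z(x$V1, x$V2)

# Calling z once per pair of elements gives one value per row
per_row <- mapply(z, x$V1, x$V2)

whole_columns
per_row
```

Here whole_columns is the single value 66 and per_row is 66, 154, 264, 396, 550. Writing mutate(foo = mapply(z, V1, V2)) should give the same column as the rowwise chain, since mapply returns one value per row.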
Answered by shadow, Dec 28 '22 22:12