I want to add a new column to a data frame which is based on a row-wise calculation. Suppose I have a data frame such as this one:
x <- as.data.frame(matrix(1:10, 5, 2))
V1 V2
1 1 6
2 2 7
3 3 8
4 4 9
5 5 10
If I want to do some rowwise operation to generate a new column, I can use rowwise() and do() to accomplish that. For example:
y <- rowwise(x) %>% do(foo = .$V1 * .$V2)
I can even append this to the existing data frame as such:
y <- rowwise(x) %>% bind_cols(do(., foo = .$V1 * .$V2))
This all works, but the result isn't quite what I want. The column y$foo is a list, not a numeric vector.
V1 V2 foo
1 1 6 6
2 2 7 14
3 3 8 24
4 4 9 36
5 5 10 50
Looks right, but it isn't.
class(y$foo)
[1] "list"
So, two questions:
Update:
This is closer to what I am trying to do. Given this function:
pts <- 11:20
z <- function(x1, x2) {
min(x1*x2*pts)
}
This doesn't produce what I expect:
y <- x %>% mutate(foo = z(V1, V2))
V1 V2 foo
1 1 6 66
2 2 7 66
3 3 8 66
4 4 9 66
5 5 10 66
while this does:
y <- rowwise(x) %>% bind_cols(do(., data.frame(foo = z(.$V1, .$V2))))
V1 V2 foo
1 1 6 66
2 2 7 154
3 3 8 264
4 4 9 396
5 5 10 550
Why? Is there a better way?
I generally don't believe in row-wise operations in a vectorized language such as R. In your case you could solve the problem with a simple matrix multiplication.
You could define z as follows:
z <- function(x1, x2) {
do.call(pmin, as.data.frame(tcrossprod(x1 * x2, pts)))
}
Then a simple mutate will do:
x %>% mutate(foo = z(V1, V2))
# V1 V2 foo
# 1 1 6 66
# 2 2 7 154
# 3 3 8 264
# 4 4 9 396
# 5 5 10 550
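To make the tcrossprod step above more concrete, here is a minimal base-R sketch of what it builds, using the same x and pts as in the question. The names m and row_min are only illustrative and do not appear in the original answer.

```r
x <- as.data.frame(matrix(1:10, 5, 2))
pts <- 11:20

# Outer product: row i of m holds x$V1[i] * x$V2[i] * pts
m <- tcrossprod(x$V1 * x$V2, pts)
dim(m)  # one row per row of x, one column per element of pts

# Row 2 matches the per-row expression inside the original z
all(m[2, ] == x$V1[2] * x$V2[2] * pts)

# pmin across the columns yields the minimum of each row
row_min <- do.call(pmin, as.data.frame(m))
row_min
```

Because every row's products are laid out side by side, the per-row minimum can be taken in one vectorized call instead of looping over rows.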
You could also improve performance by using the matrixStats::rowMins function (which is fully vectorized):
library(matrixStats)
z <- function(x1, x2) {
rowMins(tcrossprod(x1 * x2, pts))
}
x %>% mutate(foo = z(V1, V2))
# V1 V2 foo
# 1 1 6 66
# 2 2 7 154
# 3 3 8 264
# 4 4 9 396
# 5 5 10 550
You should just return a data.frame in your do statement:
y <- rowwise(x) %>% bind_cols(do(., data.frame(foo = .$V1 * .$V2)))
y
## V1 V2 foo
## 1 1 6 6
## 2 2 7 14
## 3 3 8 24
## 4 4 9 36
## 5 5 10 50
y$foo
## [1] 6 14 24 36 50
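For this particular calculation the product is already vectorized, so a plain mutate without rowwise or do should also give a numeric column. A minimal sketch, assuming dplyr is installed and loaded:

```r
library(dplyr)

x <- as.data.frame(matrix(1:10, 5, 2))

# V1 * V2 is computed element-wise over the whole columns,
# so no row-wise grouping is needed here
y <- x %>% mutate(foo = V1 * V2)

is.list(y$foo)     # the new column is an atomic vector, not a list
is.numeric(y$foo)
y$foo
```

This only works because `*` operates element-wise; a function that collapses its input to one value, such as min, still needs a row-wise or vectorized formulation.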
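In your updated question, the chain with the mutate statement is missing rowwise, while the chain with the do statement has it. Without rowwise, z receives the entire V1 and V2 columns at once, so min returns a single value (66) that is recycled for every row. Just add rowwise and you will get the same result: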
x %>% rowwise %>% mutate(foo = z(V1, V2))
## Source: local data frame [5 x 3]
## Groups: <by row>
##
## V1 V2 foo
## 1 1 6 66
## 2 2 7 154
## 3 3 8 264
## 4 4 9 396
## 5 5 10 550
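The difference between the two calls can be reproduced in base R, without dplyr. The sketch below uses the x, pts, and z from the question; the names whole and per_row are only illustrative, and mapply stands in for the row-by-row evaluation that rowwise arranges.

```r
x <- as.data.frame(matrix(1:10, 5, 2))
pts <- 11:20
z <- function(x1, x2) {
  min(x1 * x2 * pts)
}

# What an ungrouped mutate does: z sees the whole columns.
# x1 * x2 (length 5) is recycled against pts (length 10),
# and min collapses everything to one number.
whole <- z(x$V1, x$V2)
length(whole)

# What a row-wise evaluation does: z is called once per row
# with scalar x1 and x2.
per_row <- mapply(z, x$V1, x$V2)
per_row
```

Here whole is a single value that mutate would repeat down the column, while per_row has one minimum per row.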