
dplyr rowwise by some columns

What is the dplyr way to apply a function rowwise to some columns? For example, I want to grab all the V columns and turn them into percents based on the row sums. I show how to do it in base below. What about in a dplyr chain? It'd be nice to see it in data.table form as well (though preference would go to a dplyr solution here).

x <- data.frame(A=LETTERS[1:5], as.data.frame(matrix(sample(0:5, 25, T), ncol=5)))

data.frame(x[1], x[-1]/rowSums(x[-1]))


##   A        V1        V2        V3         V4         V5
## 1 A 0.1428571 0.2142857 0.2142857 0.35714286 0.07142857
## 2 B 0.2000000 0.2000000 0.1500000 0.20000000 0.25000000
## 3 C 0.3571429 0.2857143 0.0000000 0.07142857 0.28571429
## 4 D 0.1904762 0.2380952 0.1904762 0.23809524 0.14285714
## 5 E 0.2000000 0.2500000 0.1500000 0.25000000 0.15000000

library(dplyr)

props <- function(x) round(x/sum(x), 2)

# does not work
x %>%
    rowwise() %>%
    mutate(props(matches("^.{2}$")))
Tyler Rinker asked Apr 09 '16 20:04



1 Answer

In data.table, you can do

library(data.table)
setDT(x)

x[, grep("^V", names(x)) := .SD/Reduce(`+`, .SD), .SDcols = V1:V5]

   A         V1        V2        V3         V4         V5
1: A 0.28571429 0.0000000 0.2857143 0.07142857 0.35714286
2: B 0.23076923 0.2307692 0.3076923 0.15384615 0.07692308
3: C 0.44444444 0.0000000 0.4444444 0.00000000 0.11111111
4: D 0.07142857 0.3571429 0.1428571 0.07142857 0.35714286
5: E 0.00000000 0.2222222 0.3333333 0.44444444 0.00000000

To compute the denominator with NA values ignored, rowSums(.SD, na.rm = TRUE) is an option, though it will coerce .SD to a matrix as an intermediate step.
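
A minimal sketch of that rowSums variant, in case it helps. The table `dt` and the vector `cols` are made up for illustration (they are not the data from the question), and one value is set to NA on purpose so the effect of `na.rm = TRUE` is visible:

```r
library(data.table)

# Hypothetical example data with one missing value
dt <- data.table(A  = c("a", "b"),
                 V1 = c(1, 2),
                 V2 = c(NA, 2),
                 V3 = c(3, 4))

# Names of the columns to convert (everything starting with "V")
cols <- grep("^V", names(dt), value = TRUE)

# Divide each V column by the row total; NA cells are left out of the total.
# rowSums() converts .SD to a matrix internally before summing.
dt[, (cols) := .SD / rowSums(.SD, na.rm = TRUE), .SDcols = cols]

print(dt)
```

A cell that was NA stays NA in the result; only the denominator skips it, so the remaining columns in that row still add up to 1.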

Frank answered Sep 19 '22 19:09
