What is the most efficient way to sum all columns whose name starts with a pattern?

Tags: r, data.table

My goal is to compute, for each row of a data.table, the sum of the values in all columns whose names start with the prefix skill_. I would prefer a solution using data.table, but I am not picky.

My solution so far:

> require(data.table)
> DT <- data.table(x=1:4, skill_a=c(0,1,0,0), skill_b=c(0,1,1,0), skill_c=c(0,1,1,1))
> DT[, row_idx := 1:nrow(DT)]
> DT[, count_skills := 
          sapply(1:nrow(DT), 
                 function(id) sum(DT[row_idx == id, 
                                     grepl("^skill_", names(DT)), with=FALSE]))]

> DT
   x skill_a skill_b skill_c row_idx count_skills
1: 1       0       0       0       1            0
2: 2       1       1       1       2            3
3: 3       0       1       1       3            2
4: 4       0       0       1       4            1

But this becomes very slow when DT is very large. Is there a more efficient way to do this?

Rodrigo asked Apr 22 '14 14:04


2 Answers

Solution using data.table and .SDcols.

library(data.table)

DT <- data.table(x=1:4, skill_a=c(0,1,0,0), skill_b=c(0,1,1,0),
                 skill_c=c(0,1,1,1))

DT[, row_idx := 1:nrow(DT)]

# Add the skill_ columns element-wise; "^" anchors the pattern to the start of the name
DT[, count_skills := Reduce(`+`, .SD), .SDcols = patterns("^skill_")]
DT
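
One caveat with Reduce(`+`, .SD) is that a single NA in any skill_ column makes that row's sum NA. The sketch below compares it with rowSums(.SD, na.rm = TRUE), which treats missing values as 0. The NA in skill_a and the column names sum_reduce and sum_rowsums are hypothetical, added only for illustration; they are not part of the original question.

```r
library(data.table)

# Same toy data as the question, with one NA added (hypothetical)
DT <- data.table(x = 1:4,
                 skill_a = c(0, 1, NA, 0),
                 skill_b = c(0, 1, 1, 0),
                 skill_c = c(0, 1, 1, 1))

# Pick the prefixed columns once, by name, before adding any new columns
skill_cols <- grep("^skill_", names(DT), value = TRUE)

# Reduce(`+`) propagates NA; rowSums(na.rm = TRUE) ignores it
DT[, sum_reduce  := Reduce(`+`, .SD), .SDcols = skill_cols]
DT[, sum_rowsums := rowSums(.SD, na.rm = TRUE), .SDcols = skill_cols]
DT
```

Which behaviour is preferable depends on whether a missing skill should count as 0 or should mark the whole row as unknown.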
djhurio answered Sep 28 '22 03:09


Here is a dplyr solution:

library(dplyr)

DT %>% mutate(count = DT %>% select(starts_with("skill_")) %>% rowSums())
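
A possible variant of the same idea, assuming dplyr 1.0.0 or later: across() selects the prefixed columns inside mutate(), so the table does not have to be named a second time. The sketch uses a plain data.frame called df and a result called res (both hypothetical names) with the toy values from the question.

```r
library(dplyr)

# Plain data.frame with the same toy values as the question
df <- data.frame(x = 1:4,
                 skill_a = c(0, 1, 0, 0),
                 skill_b = c(0, 1, 1, 0),
                 skill_c = c(0, 1, 1, 1))

# across() returns the selected columns as a data frame; rowSums() sums each row
res <- df %>%
  mutate(count = rowSums(across(starts_with("skill_"))))
res
```

Unlike the data.table := approach, mutate() returns a new object rather than modifying the input by reference, so the result has to be assigned.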
G. Grothendieck answered Sep 28 '22 05:09