My goal is to sum, for each row, the values in all columns of a data.table whose names start with the prefix skill_. I would prefer a solution using data.table, but I am not picky.
My solution up to now:
> require(data.table)
> DT <- data.table(x=1:4, skill_a=c(0,1,0,0), skill_b=c(0,1,1,0), skill_c=c(0,1,1,1))
> DT[, row_idx := 1:nrow(DT)]
> DT[, count_skills :=
       sapply(1:nrow(DT),
              function(id) sum(DT[row_idx == id,
                                  grepl("skill_", names(DT)), with=FALSE]))]
> DT
   x skill_a skill_b skill_c row_idx count_skills
1: 1       0       0       0       1            0
2: 2       1       1       1       2            3
3: 3       0       1       1       3            2
4: 4       0       0       1       4            1
But this becomes very slow when DT is very large. Is there a more efficient way to do this?
A solution using data.table and .SDcols:
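library(data.table)
DT <- data.table(x=1:4, skill_a=c(0,1,0,0), skill_b=c(0,1,1,0),
                 skill_c=c(0,1,1,1))
# .SD holds only the columns selected by .SDcols; "^" anchors the match
# so that only names that start with skill_ are picked up.
# Reduce(`+`, .SD) adds those columns element-wise, giving one sum per row.
DT[, count_skills := Reduce(`+`, .SD), .SDcols = patterns("^skill_")]
DT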
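A possible variant of the same idea, sketched here with base R's rowSums() in place of Reduce(). The name skill_cols is only an illustrative helper, and na.rm = TRUE is an assumption about how missing values should be treated (Reduce with `+` would return NA for any row containing an NA).

```r
library(data.table)

# Same toy data as in the question.
DT <- data.table(x = 1:4,
                 skill_a = c(0, 1, 0, 0),
                 skill_b = c(0, 1, 1, 0),
                 skill_c = c(0, 1, 1, 1))

# Illustrative helper: names of the columns that start with "skill_".
skill_cols <- grep("^skill_", names(DT), value = TRUE)

# rowSums() sums across the selected columns for every row at once,
# so no per-row subsetting of DT is needed.
DT[, count_skills := rowSums(.SD, na.rm = TRUE), .SDcols = skill_cols]
DT
```

Selecting the columns by name up front also avoids the helper row_idx column, since nothing is looked up row by row.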
Here is a dplyr solution:
library(dplyr)
DT %>% mutate(count = DT %>% select(starts_with("skill_")) %>% rowSums())
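With a recent dplyr (assuming version 1.0.0 or later, where across() is available), the nested pipe can probably be avoided. This is only a sketch on a plain data.frame copy of the example data; the name df is illustrative.

```r
library(dplyr)

# Plain data.frame version of the example data (df is an illustrative name).
df <- data.frame(x = 1:4,
                 skill_a = c(0, 1, 0, 0),
                 skill_b = c(0, 1, 1, 0),
                 skill_c = c(0, 1, 1, 1))

# across() selects the skill_ columns inside mutate();
# rowSums() then adds them up for each row.
res <- df %>%
  mutate(count = rowSums(across(starts_with("skill_"))))
res
```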