Use of .N and .SD in one call

Tags: r, data.table

Suppose I have a data.table as follows:

data = data.table(c("a","a","b","b","c"),c(1,2,3,4,5))

I would like to sum the numeric column (V2) by group, but only for groups where the grouping column (V1) has more than one entry. My real problem requires the use of .SD. I understand that I could create an N column via

data[ , N := .N, by = V1]

and then sum via

data[N > 1, lapply(.SD,sum), by = V1, .SDcols = 2]

However, is there a one step call to do this?

Referencing .SD in the call doesn't return the expected answer:

data[, lapply(.SD[which(length(.SD)>1)],sum), by = V1, .SDcols = 2] 

I would like to understand why this doesn't work. Neither does this:

data[, lapply(.SD[which(.N>1)],sum), by = V1, .SDcols = 2]

Thanks!

RonRich asked Mar 20 '23 23:03

1 Answer

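library(data.table)
data <- data.table(c("a","a","b","b","c"),c(1,2,3,4,5))
# Return the per-group sums only when the group has more than one row;
# groups for which j evaluates to NULL are dropped from the result.
data[, if(.N > 1) lapply(.SD, sum) else NULL, by=V1]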
#    V1 V2
# 1:  a  3
# 2:  b  7
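
As for why the two attempts in the question don't work, here is a minimal sketch of what I believe is going on. The names sizes, first_rows and res are only for illustration. In short, length() of a data.table counts columns rather than rows, and .SD[which(...)] selects rows by position instead of keeping or dropping the whole group.

```r
library(data.table)

data <- data.table(c("a", "a", "b", "b", "c"), c(1, 2, 3, 4, 5))

# length() on a data.table counts columns, so with .SDcols = 2 it is 1 in
# every group and length(.SD) > 1 is never TRUE. .N is the per-group row count.
sizes <- data[, .(n_cols = length(.SD), n_rows = .N), by = V1, .SDcols = 2]
print(sizes)

# .SD[which(.N > 1)] indexes rows by position: which(TRUE) is 1, so only the
# first row of a multi-row group is kept, and which(FALSE) is integer(0),
# so no rows are kept for a single-row group.
first_rows <- data[, .SD[which(.N > 1)], by = V1, .SDcols = 2]
print(first_rows)

# Testing .N outside of .SD keeps or drops the whole group in one step.
# An if without else yields NULL, and data.table omits such groups.
res <- data[, if (.N > 1) lapply(.SD, sum), by = V1, .SDcols = 2]
print(res)
```

So the condition has to decide whether j returns anything for the group at all, rather than being used as a row index into .SD.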
BrodieG answered Apr 06 '23 01:04