I'm trying to compute a simple sum and mean by row using data.table, but I am getting unexpected results. I followed section 2 of the data.table FAQ. I found an approach that works, but I am not sure why the method from section 2 of the FAQ does not. This method gives me an incorrect result (it returns the value of the first column):
dt[, genesum:=lapply(.SD,sum), by=gene]
head(dt)
gene TCGA_04_1348 TCGA_04_1362 genesum
1: A1BG 0.94565 0.70585 0.94565
2: A1BG-AS 0.97610 1.15850 0.97610
3: A1CF 0.00000 0.02105 0.00000
4: A2BP1 0.00300 0.04150 0.00300
5: A2LD1 4.57975 5.02820 4.57975
6: A2M 60.37320 36.09715 60.37320
And this gives me the desired result:
dt[, genesum:=apply(dt[,-1, with=FALSE],1, sum)]
head(dt)
gene TCGA_04_1348 TCGA_04_1362 genesum
1: A1BG 0.94565 0.70585 1.65150
2: A1BG-AS 0.97610 1.15850 2.13460
3: A1CF 0.00000 0.02105 0.02105
4: A2BP1 0.00300 0.04150 0.04450
5: A2LD1 4.57975 5.02820 9.60795
6: A2M 60.37320 36.09715 96.47035
I have many more columns and rows; this is just a subset. Does this have anything to do with the way I set the key?
tables()
NAME NROW MB COLS KEY
[1,] dt 20,785 2 gene,TCGA_04_1348_01A,TCGA_04_1362_01A,genesum gene
A few things:
dt[, genesum:=lapply(.SD,sum), by=gene]
and
dt[, genesum:=apply(dt[ ,-1],1, sum)]
are quite different.
dt[, genesum:=lapply(.SD,sum), by=gene]
loops over the columns of the .SD data.table and sums each one, so within each gene group it returns one sum per column rather than one sum per row.
dt[, genesum:=apply(dt[, -1], 1, sum)]
loops over the rows (i.e., apply(x, 1, function) applies function to every row of x).
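To make the difference concrete, here is a minimal sketch on toy data built from the first three rows shown in the question. The vector name sample_cols is only an illustrative helper, not something from the original post, and the sketch assumes a reasonably recent data.table installation.

```r
library(data.table)

# Toy data: the first three rows printed in the question
dt <- data.table(
  gene = c("A1BG", "A1BG-AS", "A1CF"),
  TCGA_04_1348 = c(0.94565, 0.97610, 0.00000),
  TCGA_04_1362 = c(0.70585, 1.15850, 0.02105)
)

# Hypothetical helper: the numeric columns to work on
sample_cols <- c("TCGA_04_1348", "TCGA_04_1362")

# Column-wise: lapply() visits each column of .SD, giving one sum per column
col_sums <- dt[, lapply(.SD, sum), .SDcols = sample_cols]

# Row-wise: apply(x, 1, f) visits each row, giving one sum per row
row_sums <- dt[, apply(.SD, 1, sum), .SDcols = sample_cols]

print(col_sums)
print(row_sums)
```

Here col_sums is a one-row table with one value per sample column, while row_sums has one value per gene, which is the shape needed for a new column.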
I think you can get what you want by calling rowSums, like so:
dt[, genesum := rowSums(dt[, -1])]
Is that what you're after?
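Since the question also asks for a row mean, here is a hedged sketch of a variant that uses .SDcols to name the sample columns explicitly. This is one possible way to write it, not the only one; sample_cols and genemean are illustrative names that do not appear in the original post. Naming the columns also keeps a previously added genesum column out of the sum if the line is run a second time.

```r
library(data.table)

# Toy data: the first three rows printed in the question
dt <- data.table(
  gene = c("A1BG", "A1BG-AS", "A1CF"),
  TCGA_04_1348 = c(0.94565, 0.97610, 0.00000),
  TCGA_04_1362 = c(0.70585, 1.15850, 0.02105)
)

# Hypothetical helper: only these columns take part in the row statistics
sample_cols <- c("TCGA_04_1348", "TCGA_04_1362")

# .SD is restricted to sample_cols, so gene (and any earlier genesum) is excluded
dt[, genesum  := rowSums(.SD),  .SDcols = sample_cols]
dt[, genemean := rowMeans(.SD), .SDcols = sample_cols]

print(dt)
```

With many sample columns, sample_cols could instead be built with something like setdiff(names(dt), "gene") before adding any new columns.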