I have a data frame of about 35,000 rows by 8 columns. It looks like this:
head(nuc)
chr feature start end gene_id pctAT pctGC length
1 1 CDS 67000042 67000051 NM_032291 0.600000 0.400000 10
2 1 CDS 67091530 67091593 NM_032291 0.609375 0.390625 64
3 1 CDS 67098753 67098777 NM_032291 0.600000 0.400000 25
4 1 CDS 67101627 67101698 NM_032291 0.472222 0.527778 72
5 1 CDS 67105460 67105516 NM_032291 0.631579 0.368421 57
6 1 CDS 67108493 67108547 NM_032291 0.436364 0.563636 55
gene_id is a factor with about 3,500 unique levels. For each level of gene_id, I want to get min(start), max(end), mean(pctAT), mean(pctGC), and sum(length).
I tried using lapply and do.call for this, but it takes forever (30+ minutes) to run. The code I'm using is:
nuc_prof = lapply(levels(nuc$gene_id), function(gene){
  t = nuc[nuc$gene_id==gene, ]
  return(list(gene_id=gene, start=min(t$start), end=max(t$end), pctGC =
    mean(t$pctGC), pct = mean(t$pctAT), cdslength = sum(t$length)))
})
nuc_prof = do.call(rbind, nuc_prof)
I'm certain I'm doing something wrong to slow this down. I haven't waited for it to finish as I'm sure it can be faster. Any ideas?
Since I'm in an evangelizing mood ... here's what the fast data.table solution would look like:
library(data.table)
dt <- data.table(nuc, key="gene_id")
dt[, list(A=min(start),
          B=max(end),
          C=mean(pctAT),
          D=mean(pctGC),
          E=sum(length)), by=key(dt)]
# gene_id A B C D E
# 1: NM_032291 67000042 67108547 0.5582567 0.4417433 283
# 2: ZZZ 67000042 67108547 0.5582567 0.4417433 283
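For anyone who wants to try this without the full data set, here is a minimal self-contained sketch of the same grouped summary. It assumes the data.table package is installed. The first six rows are the ones shown in the question; the two "NM_TOY" rows are made up purely so that there is a second group, and the output column names (start, end, pctAT, pctGC, cdslength) are my own choice, borrowed from the question's code rather than from the answer above.

```r
library(data.table)

# Toy data: the six rows from the question plus two invented rows for a
# hypothetical second gene ("NM_TOY"), so the grouping has two levels.
nuc <- data.frame(
  chr     = 1,
  feature = "CDS",
  start   = c(67000042, 67091530, 67098753, 67101627, 67105460, 67108493, 100, 300),
  end     = c(67000051, 67091593, 67098777, 67101698, 67105516, 67108547, 199, 349),
  gene_id = factor(c(rep("NM_032291", 6), rep("NM_TOY", 2))),
  pctAT   = c(0.600000, 0.609375, 0.600000, 0.472222, 0.631579, 0.436364, 0.50, 0.70),
  pctGC   = c(0.400000, 0.390625, 0.400000, 0.527778, 0.368421, 0.563636, 0.50, 0.30),
  length  = c(10, 64, 25, 72, 57, 55, 100, 50)
)

dt <- as.data.table(nuc)

# Compute all five summaries in a single grouped call, one row per gene_id.
nuc_prof <- dt[, .(start     = min(start),
                   end       = max(end),
                   pctAT     = mean(pctAT),
                   pctGC     = mean(pctGC),
                   cdslength = sum(length)),
               by = gene_id]

print(nuc_prof)
```

Setting a key, as in the answer above, is not required for by= to work; the sketch skips it to keep the example short.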