I have to run similar code across the columns of a large matrix, so performance matters. Here is a minimal example:
set.seed(1)
my_vector <- runif(10000)
my_sums <- NULL
for (l in seq_along(my_vector)) {
  # draw one random threshold, then sum every element below it
  current_result <- my_vector[my_vector < runif(1)]
  my_sums[l] <- sum(current_result)
}
head(my_sums)
# [1] 21.45613 2248.31463 2650.46104 62.82708 11.11391 86.21950
Wrapping the loop in system.time() gives:
#    user  system elapsed
#    1.14    0.00    1.14
Any ideas on how to improve performance?
A vectorized base R approach: sum(v[v < x]) over a sorted v is a prefix sum, so all 10,000 queries reduce to one sort(), one cumsum(), and a vectorized lookup with findInterval().
system.time({
  set.seed(1)
  my_vector <- runif(10000)
  x <- runif(10000)                      # all 10,000 thresholds drawn at once
  sorted <- sort(my_vector)
  ind <- findInterval(x, sorted) + 1     # rank of each threshold in the sorted vector (+1 for the leading 0)
  my_sums <- c(0, cumsum(sorted))[ind]   # prefix sum = total of everything below the threshold
})
# user system elapsed
# 0 0 0
head(my_sums)
#[1] 21.45613 2248.31463 2650.46104 62.82708 11.11391 86.21950
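To see why this works, here is a toy example (the vector and threshold below are made up for illustration):
v <- c(0.9, 0.2, 0.5)
s <- sort(v)             # 0.2 0.5 0.9
ps <- c(0, cumsum(s))    # 0.0 0.2 0.7 1.6
# threshold 0.6: two sorted values lie below it, so we want ps[2 + 1]
ps[findInterval(0.6, s) + 1]
# [1] 0.7  (= 0.2 + 0.5)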
And Matt Dowle's excellent data.table approach, using a rolling join on a keyed (sorted) table:
require(data.table)
system.time({
  set.seed(1)
  my_vector = runif(10000)
  DT = data.table(my_vector)
  setkey(DT, my_vector)                 # keying sorts the table by my_vector
  DT[, cumsum := cumsum(my_vector)]     # running total in sorted order
  my_sums = DT[.(runif(10000)), cumsum, roll=TRUE]  # roll back to the last value below each threshold
  my_sums[is.na(my_sums)] = 0           # a threshold below the minimum matches nothing: sum is 0
})
head(my_sums)
# [1] 21.45613 2248.31463 2650.46104 62.82708 11.11391 86.21950
# user system elapsed
# 0.004 0.000 0.004
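The roll=TRUE join performs the same last-value-below lookup as findInterval(); a toy illustration (the table and query value here are invented):
library(data.table)
toy = data.table(v = c(0.2, 0.5, 0.9))
setkey(toy, v)
toy[, cs := cumsum(v)]
# querying 0.6 rolls back to the row for 0.5, whose running total is 0.7
toy[.(0.6), cs, roll=TRUE]
# [1] 0.7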
What about sapply?
my_sums <- sapply(seq_along(my_vector), function(l) {
  # return the sum directly; assigning into my_sums inside the function has no effect outside it
  sum(my_vector[my_vector < runif(1)])
})
Does this give any performance improvement? Not really: sapply() still calls the function once per element, so it tidies the code but does the same quadratic amount of work as the loop.
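To compare the approaches directly, you can time them side by side; a minimal sketch (absolute timings will vary by machine):
set.seed(1)
my_vector <- runif(10000)

# explicit loop, preallocated
system.time({
  loop_sums <- numeric(length(my_vector))
  for (l in seq_along(my_vector)) {
    loop_sums[l] <- sum(my_vector[my_vector < runif(1)])
  }
})

# sapply: same work in a different wrapper
set.seed(1); my_vector <- runif(10000)
system.time(
  sapply_sums <- sapply(seq_along(my_vector),
                        function(l) sum(my_vector[my_vector < runif(1)]))
)

# vectorized sort/cumsum/findInterval version
set.seed(1); my_vector <- runif(10000)
system.time({
  x <- runif(10000)
  sorted <- sort(my_vector)
  vec_sums <- c(0, cumsum(sorted))[findInterval(x, sorted) + 1]
})

all.equal(loop_sums, vec_sums)  # should be TRUE: same RNG stream, same answers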