I am fond of the parallel package in R and how easy and intuitive it is to do parallel versions of apply, sapply, etc.
Is there a similar parallel function for replicate?
You can just use the parallel versions of lapply or sapply. Instead of saying "replicate this expression n times", you apply over 1:n, and instead of giving an expression, you wrap that expression in a function that ignores the argument sent to it.
Possibly something like:
    # create cluster
    library(parallel)
    cl <- makeCluster(detectCores()-1)
    # get library support needed to run the code
    clusterEvalQ(cl, library(MASS))
    # put objects in place that might be needed for the code
    myData <- data.frame(x=1:10, y=rnorm(10))
    clusterExport(cl, c("myData"))
    # set a different seed on each member of the cluster (just in case)
    clusterSetRNGStream(cl)
    # ... then parallel replicate ...
    parSapply(cl, 1:10000, function(i, ...) { x <- rnorm(10); mean(x)/sd(x) } )
    # stop the cluster
    stopCluster(cl)
as the parallel equivalent of:
replicate(10000, {x <- rnorm(10); mean(x)/sd(x) } )
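If you need the parallel run to be repeatable, one option is to pass a seed to clusterSetRNGStream. The following is a minimal sketch, not part of the answer above: the two-worker cluster, the seed value 42, and the helper name one_rep are arbitrary choices for illustration. It assumes the cluster size stays the same between runs and that parSapply keeps its default static scheduling.

```r
library(parallel)

# Two workers keep the sketch light; the count is an arbitrary choice.
cl <- makeCluster(2)

# One replication: ignore the index, draw a sample, return a statistic.
# (one_rep is a hypothetical helper name, not from the original answer.)
one_rep <- function(i) { x <- rnorm(10); mean(x) / sd(x) }

# Seed the per-worker RNG streams, run, then reseed identically and run again.
clusterSetRNGStream(cl, iseed = 42)
run1 <- parSapply(cl, 1:1000, one_rep)
clusterSetRNGStream(cl, iseed = 42)
run2 <- parSapply(cl, 1:1000, one_rep)

stopCluster(cl)

# Compare the two runs; they should agree when nothing else changed.
same_result <- identical(run1, run2)
```

A different number of workers changes how the indices are split across streams, so results are only expected to match for the same cluster size.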
Using clusterEvalQ as a model, I think I would implement a parallel replicate as:
    parReplicate <- function(cl, n, expr, simplify=TRUE, USE.NAMES=TRUE)
      parSapply(cl, integer(n), function(i, ex) eval(ex, envir=.GlobalEnv),
                substitute(expr), simplify=simplify, USE.NAMES=USE.NAMES)
The arguments simplify and USE.NAMES are compatible with sapply rather than replicate, but they make it a better wrapper around parSapply in my opinion.
Here's an example derived from the replicate man page:
    library(parallel)
    cl <- makePSOCKcluster(3)
    hist(parReplicate(cl, 100, mean(rexp(10))))
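As a rough check of how the wrapper behaves, the sketch below compares the shape of its result with plain replicate when the expression returns a vector, and shows the simplify=FALSE case. The parReplicate definition is repeated only so the snippet runs on its own; the two-worker cluster and the variable names m_par, m_seq and l_par are assumptions made for illustration.

```r
library(parallel)

# Same definition as in the answer above, repeated so this snippet is self-contained.
parReplicate <- function(cl, n, expr, simplify=TRUE, USE.NAMES=TRUE)
  parSapply(cl, integer(n), function(i, ex) eval(ex, envir=.GlobalEnv),
            substitute(expr), simplify=simplify, USE.NAMES=USE.NAMES)

cl <- makePSOCKcluster(2)

# A vector-valued expression: both versions should give one column per replication.
m_par <- parReplicate(cl, 5, rnorm(3))
m_seq <- replicate(5, rnorm(3))
dims_match <- identical(dim(m_par), dim(m_seq))

# With simplify=FALSE the wrapper returns a list, one element per replication.
l_par <- parReplicate(cl, 5, rnorm(3), simplify = FALSE)

# Release the worker processes when finished.
stopCluster(cl)
```

Because the expression is evaluated in each worker's global environment, any objects or packages it refers to need to be made available on the workers first, for example with clusterExport or clusterEvalQ as in the earlier answer.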