
loop inside a foreach loop using doparallel

I have a function that contains a loop

myfun = function(z1.d, r, rs){
  x = z1.d[,r]
  or.d = order(as.vector(x), decreasing=TRUE)[rs]
  zz1.d = as.vector(x)
  r.l = zz1.d[or.d]

  y=vector()
  for (i in 1:9)
  {
    if(i<9) y[i]=mean( x[(x[,r] >= r.l[i] & x[,r] < r.l[i+1]),r] ) else{
      y[i] =  mean( z1.d[(x >= r.l[9]),r] )}
  }
  return(y)
}

rs is a numeric vector, z1.d is a zoo object, and y is also a numeric vector.

When I try to run the function inside a parallel loop:

cls = makePSOCKcluster(8)
registerDoParallel(cls)

rlarger.d.1  = foreach(r=1:dim(z1.d)[2], .combine = "cbind") %dopar% {    
  myfun(z1.d, r, rs)}

stopCluster(cls)

I get the following error:

Error in { : task 1 failed - "incorrect number of dimensions"

I don't know why, but I noticed that if I take the loop out of my function, it does not give an error.

Also, if I run the exact same code with %do% instead of %dopar% (so not running in parallel), it works fine (slow, but without errors).

EDIT: as requested here is a sample of the parameters:

dim(z1.d)
[1] 8766  107
> z1.d[1:4,1:6]
                    AU_10092 AU_10622 AU_12038 AU_12046 AU_13017 AU_14015
1966-01-01 23:00:00       NA       NA       NA    1.816        0    4.573
1966-01-02 23:00:00       NA       NA       NA    9.614        0    4.064
1966-01-03 23:00:00        0       NA       NA    0.000        0    0.000
1966-01-04 23:00:00        0       NA       NA    0.000        0    0.000

> rs
[1] 300 250 200 150 100  75  50  30  10

r is defined in the foreach loop

sbg asked May 19 '17 16:05



2 Answers

The error occurs because zoo is not loaded on your workers. Without it, the workers don't know how to deal with zoo objects properly; they handle them as plain matrices, which don't behave the same way when subsetting. The quick fix to your stated problem is to add .packages = "zoo" to your foreach call.
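A minimal sketch of that fix, assuming the zoo, foreach and doParallel packages are installed. The names z.demo and col_mean are hypothetical stand-ins for illustration, not the asker's data or function:

```r
library(zoo)
library(foreach)
library(doParallel)

# Hypothetical stand-in for z1.d: a small zoo matrix with 3 columns
set.seed(1)
z.demo <- zoo(matrix(runif(60), ncol = 3),
              order.by = as.Date("2000-01-01") + 0:19)

# Toy worker function that relies on zoo (subsetting and coredata())
col_mean <- function(z, r) mean(coredata(z[, r]))

cls <- makePSOCKcluster(2)
registerDoParallel(cls)

# .packages = "zoo" loads zoo on every worker before the body is evaluated
res <- foreach(r = 1:ncol(z.demo), .combine = "c", .packages = "zoo") %dopar% {
  col_mean(z.demo, r)
}

stopCluster(cls)
print(res)
```

Calling clusterEvalQ(cls, library(zoo)) right after creating the cluster should have a similar effect, since it loads the package once on each worker.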

In my opinion you don't even need parallel computation here. You can speed up your function dramatically if you use numeric vectors instead of zoo objects:

# sample time series to match your object's size
set.seed(1234)
z.test <- as.zoo(replicate(107,sample(c(NA,runif(1000,0,10)),size = 8766, replace = TRUE)))

myfun_new <-  function(z, r, rs){
  x <-  as.numeric(z[,r])
  r.l <- x[order(x, decreasing=TRUE)[rs]]
  res_dim <- length(rs)
  y=numeric(res_dim)
  for (i in 1:res_dim){
    if(i< res_dim){ 
      y[i] <- mean( x[(x >= r.l[i] & x < r.l[i+1])], na.rm = TRUE ) 
    }else{
      y[i] <-   mean( x[(x >= r.l[res_dim])] , na.rm = TRUE)
    }
  }
  return(y)
}

Simple timings show the improvement:

system.time({
  cls = makePSOCKcluster(4)
  registerDoParallel(cls)
  rlarger.d.1 = foreach(r=1:dim(z.test)[2],.packages = "zoo", .combine = "cbind") %dopar% { 
    myfun(z.test, r, rs)}
  stopCluster(cls)
})
##  user      system     elapsed 
##  0.08        0.10       10.93
system.time({
  res <-sapply(1:dim(z.test)[2], function(r){myfun_new(z.test, r, rs)})
})
##  user      system     elapsed 
##  0.48        0.21        0.68

The results are the same (only the column names differ):

all.equal(res, rlarger.d.1, check.attributes = FALSE)
## [1] TRUE
wici answered Oct 20 '22 14:10


It seems like there is an error in your function code.

In line 2 you create a 1-dimensional object

x = z1.d[,r]

In line 9 you treat it like a 2-dimensional one

x[some_logic, r]

That is why you get the "incorrect number of dimensions" error, although I do not know why it works in the %do% variant.
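The dimension mismatch can be reproduced in base R alone. This is only a sketch: it uses a plain numeric vector as a hypothetical stand-in for x, and says nothing about how zoo's own subsetting method treats the same call.

```r
# Plain numeric vector with no dim attribute (stand-in for x)
x <- c(0.2, 1.5, 3.1, 0.7)

# A single subscript is fine on a vector
one_index <- x[x >= 1]

# Two subscripts fail, because the vector has no second dimension
msg <- tryCatch(x[x >= 1, 1], error = function(e) conditionMessage(e))
print(msg)
```

In an English locale the captured message is the same text reported by the failing task.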

In any case, you need to replace the code inside the for loop with:

if(i<9) y[i]=mean( x[(x >= r.l[i] & x < r.l[i+1])] ) else{
      y[i] =  mean( x[(x >= r.l[9])] )}

Or with:

if(i<9) y[i]=mean( z1.d[(x >= r.l[i] & x < r.l[i+1]),r] ) else{
      y[i] =  mean( z1.d[(x >= r.l[9]),r] )}

As you did not provide a reproducible example, I did not test it.

Istrel answered Oct 20 '22 15:10