Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fast calculations of the Pareto front in R

So I am trying to calculate the Pareto front (http://en.wikipedia.org/wiki/Pareto_efficiency) in R and am able to do it, however, I am not able to do it efficiently. In particular as the number of pairs of points increases, the computations slow down considerably.

So in general, what I want to do is check for all non-dominated (or dominated) pairs. Now the way I have been doing this is to find all such pair of points such that xi > X and yi > Y where (xi, yi) are a single pair and X and Y represent all points x and y. Now, this part works very fast and is easy to implement, however, there is the additional possibility that multiple x values may be the same but they will have different y values so in that case I want to be able to identify the x value that has the lowest y value (and vise versa for points that have identical y values but different x values).

To illustrate this point here is a picture from Wikipedia:

enter image description here

so basically I want to be able to identify all points that lie on the red line.

Here is my code that does work but is very inefficient for large datasets:

#Example Data that actually runs quickly
x = runif(10000)
y = runif(10000)

pareto = 1:length(x)

for(i in 1:length(x)){
    cond1 = y[i]!=min(y[which(x==x[i])])
    cond2 = x[i]!=min(x[which(y==y[i])])
    for(n in 1:length(x)){
        if((x[i]>x[n]  &  y[i]>y[n]) | (x[i]==x[n] & cond1) | (y[i]==y[n] & cond2)){
            pareto[i] = NA
            break
        }
    }
}
#All points not on the red line should be marks as NA in the pareto variable

The slow down definitely comes from calculating the points where (x[i]==x[n] & cond1) | (y[i]==y[n] & cond2) but I cannot find a way around it or a better Boolean expression to capture everything that I want. any suggestions greatly appreciated!

like image 678
user6291 Avatar asked Jan 22 '14 21:01

user6291


1 Answers

Following @BrodieG

system.time( {
    d = data.frame(x,y)
    D = d[order(d$x,d$y,decreasing=FALSE),]
    front = D[which(!duplicated(cummin(D$y))),]
} )

   user  system elapsed 
   0.02    0.00    0.02 

which is 0.86/0.02 = 43 times faster!

like image 99
user6291 Avatar answered Sep 22 '22 15:09

user6291