Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimizing ranges in R

Tags:

optimization

r

I have the following problem and don't know how to code it in R:

A dataframe df with two columns: First column an ordered set of numbers df$ind, the second column a highly stochastic set of numbers df$ret.

I want to find two ranges in df$ret that don't overlap and optimize the condition that the sum of the numbers in the first range must be positive and the sum of the numbers of second range negative. After that I would like to have the 2x2 df$ind-values of the respective ranges.

Two possibilites come to my mind (and I don't know how to program either of these in R):

  1. Monte Carlo where 2x2 positions are being chosen, the sums being calculated and compared to the hitherto best solution.
  2. Trying all possible ranges and taking the best solution (concerning the number of values this seems computationally doable).

Could you give me some hints how to realize this in R or is there even a package for these kinds of optimizations (there seems to be a package for everything in R ;-)

UPDATE:
You will have 4 values: i, k, m, n: df$ret[i:k] and df$ret[m:n] with i < k < m < n.

The optimization is (pseudocode):

max:abs(sum(range(i:k)))+abs(sum(range(m:n)))


under the conditions:

sum(range(i:k)) > 0 and sum(range(m:n)) < 0

like image 650
vonjd Avatar asked Feb 18 '26 05:02

vonjd


1 Answers

Here's a brute force method. For small data sets it should work fine; on my system I tested with size 100 and it was on the order of 0.5 seconds. For increased speed one should check the best max and min for overlap before checking all possible max/min pairs.

getbest <- function(x) {
  # get the sums of all possible ranges
  n <- length(x)
  m <- as.data.frame(t(combn(n, 2)))
  names(m) <- c("lo","hi")
  m$sum <- sapply(1:nrow(m), function(i) {
    sum(x[m$lo[i]:m$hi[i]])
  })
  # then get the ranges of positive and negative sums that don't overlap
  neg <- m[m$sum<0,]
  pos <- m[m$sum>0,]
  use <- expand.grid(neg=1:nrow(neg), pos=1:nrow(pos))
  use <- use[(neg$hi[use$neg] < pos$lo[use$pos]) | 
                  (neg$lo[use$neg] > pos$hi[use$pos]),]
  # finally get the absolute value for all ranges that don't overlap,
  # and choose the largest
  abs <- pos$sum[use$pos] - neg$sum[use$neg]
  use <- use[which.max(abs),]
  as.matrix(rbind(positive=pos[use$pos,], negative=neg[use$neg,]))
}

Use as follows; it returns the actual index of the range, so if the desired index df$ind is different than 1:n, just use this output to get the desired values.

x <- rnorm(100)
getbest(x)
like image 78
Aaron left Stack Overflow Avatar answered Feb 20 '26 23:02

Aaron left Stack Overflow