I have the following problem and don't know how to code it in R:
A dataframe df with two columns: First column an ordered set of numbers df$ind, the second column a highly stochastic set of numbers df$ret.
I want to find two ranges in df$ret that don't overlap and optimize the condition that the sum of the numbers in the first range must be positive and the sum of the numbers of second range negative. After that I would like to have the 2x2 df$ind-values of the respective ranges.
Two possibilites come to my mind (and I don't know how to program either of these in R):
Could you give me some hints how to realize this in R or is there even a package for these kinds of optimizations (there seems to be a package for everything in R ;-)
UPDATE:
You will have 4 values: i, k, m, n: df$ret[i:k] and df$ret[m:n] with i < k < m < n.
The optimization is (pseudocode):
max:abs(sum(range(i:k)))+abs(sum(range(m:n)))
under the conditions:
sum(range(i:k)) > 0 and sum(range(m:n)) < 0
Here's a brute force method. For small data sets it should work fine; on my system I tested with size 100 and it was on the order of 0.5 seconds. For increased speed one should check the best max and min for overlap before checking all possible max/min pairs.
getbest <- function(x) {
# get the sums of all possible ranges
n <- length(x)
m <- as.data.frame(t(combn(n, 2)))
names(m) <- c("lo","hi")
m$sum <- sapply(1:nrow(m), function(i) {
sum(x[m$lo[i]:m$hi[i]])
})
# then get the ranges of positive and negative sums that don't overlap
neg <- m[m$sum<0,]
pos <- m[m$sum>0,]
use <- expand.grid(neg=1:nrow(neg), pos=1:nrow(pos))
use <- use[(neg$hi[use$neg] < pos$lo[use$pos]) |
(neg$lo[use$neg] > pos$hi[use$pos]),]
# finally get the absolute value for all ranges that don't overlap,
# and choose the largest
abs <- pos$sum[use$pos] - neg$sum[use$neg]
use <- use[which.max(abs),]
as.matrix(rbind(positive=pos[use$pos,], negative=neg[use$neg,]))
}
Use as follows; it returns the actual index of the range, so if the desired index df$ind is different than 1:n, just use this output to get the desired values.
x <- rnorm(100)
getbest(x)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With