Suppose that I have some computable predicate, P, that maps integers to bools. I know that P 0
is true and I know some N such that P N
is false. I also know that P n = false
implies that P (n + 1) is false
[*]. I want to find the maximum n such that P n
is true.
Clearly, I can find a solution to this by bisection. Unfortunately, evaluating P is expensive (maybe it takes an hour or so). I also have a shiny server with many cores.
If evaluating P took constant time and I had 2 threads (say), I can see how I would do the search. I'd divide the interval [a, b] into three segments and evaluate P (2a + b)/3
and P (a + 2b)/3
. Once both evaluations had completed, I'd know which of the three segments to recurse into. Using two threads, my search time would be reduced by a third. Excellent!
However, what if evaluating P takes wildly varying times? In my concrete example, it might take between 10 seconds and an hour or so. Suppose again that I have 2 threads and have divided the interval as above. Maybe the first thread (evaluating P (2a+b)/3
) finishes first. How do I decide where to run the next one?
I guess there's some link to information theory or similar, because I'm trying to run the test that will give me the most information, given my current knowledge. But this seems like it should be a problem that others have investigated - can anyone point me at papers or similar?
[*] In the concrete case I care about, the test involves running an SMT solver trying to find a solution X to a constraint problem with one extra constraint of the form X ≥ n, where n is the integer above.
If you are looking for a paper reference, you might get more traction on CS.SE. Here I can only offer some heuristics.
Whenever a thread finishes, you can stop all other threads whose answer you now know (i.e., if you got P(n)=T
, you can stop all thread working on k<n
, and if P(n)=F
, you can stop all threads working on k>n
). Thus you now have 1 or more threads to start.
From the information-theoretic POV, dividing the existing intervals to minimize the maximum length of the new intervals is obviously optimal.
However, since you note in a comment:
the SMT solver takes much longer for satisfiable solutions
It might be faster to start with a large n
and go down slowly (e.g., if you know that P(100)=F
and P(1)=T
, test 95, 90, 80 in 3 threads, instead of 25, 50, 75 as recommended by the information theory).
You can also use the running time as an indicator of the probable result.
E.g., start your 3 threads at n=25,50,75
. Suppose in 1 minute you learn P(75)=F
but the other two are still running. Then you can puth the n=25
thread to sleep (to be awoken or killed in the future as necessary) and start two new threads for 60 and 70.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With