Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Multi-threaded bisection search

Suppose that I have some computable predicate, P, that maps integers to bools. I know that P 0 is true and I know some N such that P N is false. I also know that P n = false implies that P (n + 1) is false [*]. I want to find the maximum n such that P n is true.

Clearly, I can find a solution to this by bisection. Unfortunately, evaluating P is expensive (maybe it takes an hour or so). I also have a shiny server with many cores.

If evaluating P took constant time and I had 2 threads (say), I can see how I would do the search. I'd divide the interval [a, b] into three segments and evaluate P (2a + b)/3 and P (a + 2b)/3. Once both evaluations had completed, I'd know which of the three segments to recurse into. Using two threads, my search time would be reduced by a third. Excellent!

However, what if evaluating P takes wildly varying times? In my concrete example, it might take between 10 seconds and an hour or so. Suppose again that I have 2 threads and have divided the interval as above. Maybe the first thread (evaluating P (2a+b)/3) finishes first. How do I decide where to run the next one?

I guess there's some link to information theory or similar, because I'm trying to run the test that will give me the most information, given my current knowledge. But this seems like it should be a problem that others have investigated - can anyone point me at papers or similar?

[*] In the concrete case I care about, the test involves running an SMT solver trying to find a solution X to a constraint problem with one extra constraint of the form X ≥ n, where n is the integer above.

like image 982
Rupert Swarbrick Avatar asked Dec 21 '18 11:12

Rupert Swarbrick


1 Answers

If you are looking for a paper reference, you might get more traction on CS.SE. Here I can only offer some heuristics.

Whenever a thread finishes, you can stop all other threads whose answer you now know (i.e., if you got P(n)=T, you can stop all thread working on k<n, and if P(n)=F, you can stop all threads working on k>n). Thus you now have 1 or more threads to start.

From the information-theoretic POV, dividing the existing intervals to minimize the maximum length of the new intervals is obviously optimal.

However, since you note in a comment:

the SMT solver takes much longer for satisfiable solutions

It might be faster to start with a large n and go down slowly (e.g., if you know that P(100)=F and P(1)=T, test 95, 90, 80 in 3 threads, instead of 25, 50, 75 as recommended by the information theory).

You can also use the running time as an indicator of the probable result. E.g., start your 3 threads at n=25,50,75. Suppose in 1 minute you learn P(75)=F but the other two are still running. Then you can puth the n=25 thread to sleep (to be awoken or killed in the future as necessary) and start two new threads for 60 and 70.

like image 144
sds Avatar answered Nov 15 '22 04:11

sds