This question was inspired by an answer I was working on yesterday.
Let's say we have N inputs that evaluate to either true or false, what is the most efficient way to determine if X of those inputs are true?
Caveats:
- The inputs are not in an array, so if you convert them to an array please account for any overhead costs.
- By "most efficient" I mean in terms of best average case (although I would love to see best and worst case stats too).
Here are two of the methods I came across yesterday.
1) Think of the variables as Boolean inputs for a circuit and reduce them using a K-map
At first I thought this would be the most efficient approach because it follows circuit logic, but I've definitely had second thoughts. As the number of inputs increases, the number of comparisons grows combinatorially — there are C(N, X) AND-terms for "X of N", which is exponential in N when X is near N/2:
2 inputs:
1 of 2: if(1 OR 2)
2 of 2: if(1 AND 2)
3 inputs:
1 of 3: if(1 OR 2 OR 3)
2 of 3: if((1 AND 2) OR (1 AND 3) OR (2 AND 3))
3 of 3: if(1 AND 2 AND 3)
4 inputs:
1 of 4: if(1 OR 2 OR 3 OR 4)
2 of 4: if((1 AND 2) OR (1 AND 3) OR (1 AND 4) OR (2 AND 3) OR (2 AND 4) OR (3 AND 4))
3 of 4: if((1 AND 2 AND 3) OR (1 AND 2 AND 4) OR (1 AND 3 AND 4) OR (2 AND 3 AND 4))
4 of 4: if(1 AND 2 AND 3 AND 4)
... etc. ...
The best case scenario is fine (O(1)), but the worst case is far worse than...
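To see how quickly the sum-of-products expressions blow up, note that each AND-term in the "X of N" expression corresponds to one way of choosing X inputs, so the term count is the binomial coefficient C(N, X). A quick sketch (the function name here is my own, not from the question):

```python
from math import comb

def x_of_n_terms(n, x):
    """Number of AND-terms in the sum-of-products form of 'X of N':
    each term ANDs together one particular choice of x inputs."""
    return comb(n, x)

# Term counts grow combinatorially and peak at x = n // 2:
for n in (2, 3, 4, 8):
    print(n, [x_of_n_terms(n, x) for x in range(1, n + 1)])
```

For N = 8 the worst threshold (X = 4) already needs 70 four-way AND-terms, which is why this approach stops scaling almost immediately.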
2) A counter and sequential if statements
This always performs in O(n) time, which is OK, but I was hoping for a better best case.
counter = 0
if(input 1)
counter++
if(input 2)
counter++
if(input 3)
counter++
... etc. ...
if(counter >= X)
// true
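The pseudocode above translates directly into a small function. This is just a minimal Python rendering with four example inputs kept as named parameters (per the question's caveat about not converting to an array); the name and arity are my own choices for illustration:

```python
def at_least_x_true(x, a, b, c, d):
    """Counter approach: always touches every input (O(n)), no early exit."""
    counter = 0
    if a:
        counter += 1
    if b:
        counter += 1
    if c:
        counter += 1
    if d:
        counter += 1
    return counter >= x
```

In idiomatic Python the body could collapse to `sum((a, b, c, d)) >= x`, but the unrolled form mirrors the pseudocode and what a compiler would see with named variables.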
What is a more efficient solution than either of these?
On the problem's complexity
Since an exact count is requested (as opposed to, say, asking whether at least X inputs are on), the problem is very clearly O(n):
We can certainly implement sub-optimal algorithms which, for example, would [unnecessarily] visit all other inputs as each input is being processed, making it an O(n^2) implementation, but that is of course silly.
This being asserted, the question is therefore probably about...
tricks which would make the implementation faster
It should be noted that while such tricks may exist, the complexity of the algorithm/problem remains stubbornly at O(n).
Trick 1: Better storage for inputs
Unfortunately, the question indicates that the inputs come from named variables, and that the cost of any conversion of the inputs [for the sake of faster counting] must be taken into account when evaluating the overall performance of the algorithm. Although this ultimately depends on the underlying language, runtime, etc., the need to account for the conversion cost very likely condemns any algorithm based on alternate storage to be slower than solutions which keep the inputs as-is.
Trick 2: short-circuit the evaluation
The idea is to return as soon as (or shortly after) either the count of on-inputs reaches X (return true) or the count of off-inputs exceeds N - X (return false). This trick is relatively straightforward, but the extra cost of computing the values needed for the early-exit tests may offset the gains made by exiting early.
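In its simplest one-sided form, the short-circuit just checks the running count after each increment. A sketch (I'm using `*inputs` purely for brevity; with named variables this unrolls into a chain of ifs, as in the question's own pseudocode):

```python
def at_least_x_true_early(x, *inputs):
    """Trick 2: stop as soon as the running count reaches x.
    Best case touches only x inputs; worst case still touches all n."""
    counter = 0
    for value in inputs:  # stands in for testing each named variable in turn
        if value:
            counter += 1
            if counter >= x:
                return True
    return False
```

Note the extra comparison per true input: that per-step check is exactly the "extra cost" mentioned above, which may or may not pay for itself depending on how often the early exit fires.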
Trick 3: use reverse logic: count the number of inputs that are off rather than those that are on (or count both).
The cost/benefit of this approach depends both on the number of on-inputs to test for (the X of the question) and on any statistical prior we may have about the inputs (is the number of on-inputs at a given time relatively uniformly distributed, or do we tend to have only a few inputs on (or off)?).
The solution proposed by Chris Acheson provides a baseline for the use of both Trick 2 and Trick 3. Assuming we could make a few assumptions about the distribution of the inputs' states, additional performance improvements to this baseline would be driven by such "priors": some quick heuristics done prior to the counting of the inputs per se would determine which state we should count (on, off, or both), which limit we should test for, etc., and branch to the corresponding "version" of the algorithm.
Additional gains are also possible if we know the individual probability of a given input being on or off, as we'd then test the most (or least) likely ones first, to quickly reach our "short-circuit value".
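One way to sketch that idea, assuming hypothetical per-input probabilities (none are given in the question): pair each input with its prior probability of being on, and test in descending order of probability. The pairing and sorting here are for illustration only; in practice, with static priors, the test order would be fixed at coding/compile time rather than sorted at runtime.

```python
def at_least_x_true_ordered(x, inputs_with_priors):
    """inputs_with_priors: list of (probability_true, value) pairs.
    Testing the most likely inputs first tends to reach the
    short-circuit threshold sooner on average."""
    ordered = sorted(inputs_with_priors, key=lambda pair: pair[0], reverse=True)
    counter = 0
    for _, value in ordered:
        if value:
            counter += 1
            if counter >= x:
                return True
    return False
```

The runtime sort costs O(n log n), so this only makes sense as written if the same ordering is reused many times; otherwise bake the order into the code.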
On the best-case/worst-case "complexity" of these optimized algorithms
A combination of Tricks #2 and #3 could be O(X/2) on average (I need to do the math, but that seems right). However, I think it wiser to talk in terms of the number of operations relative to X and/or n, rather than misusing the O-notation...
Assuming that all operations roughly incur the same cost
It is easier, and also more exact, to compute the total number of operations needed for a given algorithm to complete, and to use such counts, for various best/worst/average cases, to help decide between specific algorithms.
To illustrate this, a naive implementation that systematically counts all on-inputs and only compares the counter at the end would be of O(n) complexity and complete in all cases in roughly 1 + 2*n + 1 operations. Such an algorithm may prove overall better than a fancier algorithm which, while being, say, O(X), O((X+n)/2), and O(n) in the best, average, and worst cases respectively, may well use X*3, (X+n)*1.5, and n*3 operations in these same cases.
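Plugging sample numbers into those operation counts (the counts themselves are the illustrative figures from the paragraph above, not measurements) makes the point concrete:

```python
def naive_ops(n):
    # init + (test, increment) per input + final compare
    return 1 + 2 * n + 1

# Illustrative costs for the "fancier" algorithm quoted above.
def fancy_ops_best(x):
    return 3 * x

def fancy_ops_avg(x, n):
    return (x + n) * 1.5

def fancy_ops_worst(n):
    return 3 * n

# For n = 10, x = 2: the fancy algorithm wins its best case
# (6 vs 22 operations) but loses its worst case (30 vs 22),
# which is why asymptotic classes alone don't settle the choice.
```

Whether the trade-off pays off then hinges entirely on how often each case actually occurs, i.e. on the priors discussed earlier.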
This version will be efficient for values of X close to either zero or N:
true_counter = 0
false_counter = 0
max_false = N - X
if(input 1)
true_counter++
if(true_counter >= X)
return true
else
false_counter++
if(false_counter > max_false)
return false
// and so on
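Filling in the "and so on", the pattern above becomes the following sketch. The loop stands in for the unrolled chain of per-variable ifs (again, only for brevity; the question's caveat about arrays still applies):

```python
def x_of_n(x, n, inputs):
    """Tricks 2 + 3 combined: count both on- and off-inputs and return as
    soon as either count settles the answer. Most effective when x is
    close to either 0 or n, since one of the two exits fires quickly."""
    true_counter = 0
    false_counter = 0
    max_false = n - x  # one more off-input than this makes 'x of n' impossible
    for value in inputs:  # stands in for testing each named variable in turn
        if value:
            true_counter += 1
            if true_counter >= x:
                return True
        else:
            false_counter += 1
            if false_counter > max_false:
                return False
    return False
```

Every input examined advances one of the two counters toward its exit, so the function always returns after at most max(X, N - X + 1) inputs rather than all N.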