Given a matrix of n rows and m columns of 1's and 0's, it is required to find the number of pairs of rows that can be selected so that their OR is 111...1 (m ones).
Example:
1 0 1 0 1
0 1 0 0 1
1 1 1 1 0
Answer:
2 ---> the OR of rows (1,3) and of rows (2,3) is all 1's
Given that n and m can each be up to 3000, how efficiently can this problem be solved?
PS: I already tried a naive O(n*n*m) method. I was hoping for a better solution.
1. trivial solution
The trivial algorithm (which you already discovered but did not post) is to take all (n choose 2) combinations of the n rows, OR each pair, and check whether the result is all 1's. This is O(n^2 * m). Coding would look like:
for (i = 0; i < n; ++i)
    for (j = i+1; j < n; ++j) {
        /* OR row[i] with row[j]; if the result is all 1's, record (i,j) */
    }
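A minimal runnable sketch of this brute force (in Python for brevity; the function name and the representation of rows as lists of bits are illustrative choices, not part of the original pseudocode):

```python
def count_pairs_naive(matrix):
    """Count pairs of rows whose element-wise OR is all 1's.  O(n^2 * m)."""
    n = len(matrix)
    count = 0
    for i in range(n):
        for j in range(i + 1, n):
            # all(...) checks that every column of the OR is 1
            if all(a | b for a, b in zip(matrix[i], matrix[j])):
                count += 1
    return count

rows = [[1, 0, 1, 0, 1],
        [0, 1, 0, 0, 1],
        [1, 1, 1, 1, 0]]
print(count_pairs_naive(rows))  # pairs (0,2) and (1,2) -> 2
```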
2. constant speedup
You can improve the running time by a factor of the word size by packing the bits into words. This gives the same asymptotics, but in practice a factor-of-64 speedup on a 64-bit machine. This has already been noted in the comments above.
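In Python the bit packing can be sketched by storing each row as one integer, so the OR of two rows and the all-ones test become a handful of word-sized operations instead of m element operations (in C you would use an array of uint64_t; the names here are illustrative):

```python
def count_pairs_packed(matrix):
    """Same O(n^2) pair loop, but each row is packed into one integer."""
    m = len(matrix[0])
    full = (1 << m) - 1  # mask of m ones: the target OR value
    packed = [int("".join(map(str, row)), 2) for row in matrix]
    count = 0
    for i in range(len(packed)):
        for j in range(i + 1, len(packed)):
            if packed[i] | packed[j] == full:
                count += 1
    return count
```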
3. heuristic speedup
We can use heuristics to further improve the time in practice, but with no asymptotic guarantee. Consider sorting your rows by hamming weight, with the smallest weight in front and the largest at the end (running time O(n*m + n*log n) to compute the weights and sort). Then you only need to compare low-weight rows with high-weight rows: specifically, the sum of the two weights must be >= m, or the pair cannot possibly cover all m columns. The search would look something like this:
for (i = 0; i < n; ++i)
    for (j = n-1; j > i; --j) { /* go backwards to take advantage of hmwt */
        if ((hmwt(row[i]) + hmwt(row[j])) < m)
            break;
        /* OR row[i] with row[j]; if the result is all 1's, record (i,j) */
    }
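The same heuristic on top of the packed representation might look like this (a sketch; note that sorting permutes the rows, so this version counts pairs rather than reporting original indices):

```python
def count_pairs_weight_sorted(matrix):
    """Sort rows by popcount ascending; scan partners from the heavy end
    and stop as soon as the combined popcount drops below m."""
    m = len(matrix[0])
    full = (1 << m) - 1
    packed = sorted((int("".join(map(str, row)), 2) for row in matrix),
                    key=lambda r: bin(r).count("1"))
    n = len(packed)
    count = 0
    for i in range(n):
        wi = bin(packed[i]).count("1")
        for j in range(n - 1, i, -1):
            if wi + bin(packed[j]).count("1") < m:
                break  # weights only shrink as j decreases, so no later j can work
            if packed[i] | packed[j] == full:
                count += 1
    return count
```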
4. towards a better approach
Another approach that may offer better returns is to choose a column of low hamming weight. Then split the rows into two groups: those with a 1 in that column (group A) versus those with a 0 in that column (group B). Any valid pair must include at least one row from group A, so you only need to consider combinations where one row comes from group A and the other comes from group B, or both come from group A (thanks @ruakh for catching my oversight). Something along these lines should help a lot. This is still asymptotically the same worst case, but it should be faster in practice (assuming we're not in the case where all combinations are the answer).
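A sketch of this column-split idea, again on packed rows (the names are illustrative; if the chosen column has hamming weight h, the pair tests cost O(h*(n-h)) for A x B plus O(h^2) for A x A):

```python
def count_pairs_column_split(matrix):
    """Pick the column with the fewest 1's, split rows on that column,
    and only test A x B and A x A pairs."""
    n, m = len(matrix), len(matrix[0])
    full = (1 << m) - 1
    packed = [int("".join(map(str, row)), 2) for row in matrix]
    # Column c with the fewest 1's; its bit position (from the left) is m-1-c.
    col = min(range(m), key=lambda c: sum(row[c] for row in matrix))
    bit = 1 << (m - 1 - col)
    A = [r for r in packed if r & bit]        # rows with a 1 in column col
    B = [r for r in packed if not (r & bit)]  # rows with a 0 in column col
    count = sum(1 for a in A for b in B if a | b == full)
    count += sum(1 for i in range(len(A))
                 for j in range(i + 1, len(A)) if A[i] | A[j] == full)
    return count
```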
5. the limits of what can be done
It is easy to construct examples where the number of pairs that work is O(n^2), so it seems very hard to beat the O(m*n^2) worst case. What we should be seeking is a solution whose running time is related to the number of pairs that work; the heuristics above go in this direction. If there is a column with small hamming weight h, then point 4 above brings the running time down to O(h*n*m + h^2*m). If h is significantly smaller than n, then you get big improvements.
Here's an off-the-wall idea that might have even worse asymptotic (or average) behavior -- but it generalizes in an interesting way, and at least offers a different approach. This problem can be viewed as an exact cover problem. Each of the n rows corresponds to a subset S of {1,2,...,m}, namely the column indices at which the row has the value 1. The task is then to find a collection of rows whose sets form a disjoint partition of {1,2,...,m}. When an exact cover consists of only two rows, those rows are binary complements of the kind you are looking for. However, more complicated exact covers are possible, such as one involving three rows:
0 1 0 0 1
1 0 0 0 0
0 0 1 1 0
The exact cover problem looks for all such exact covers, and is an NP-complete problem. The canonical solution is Algorithm X, created by Donald Knuth.
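To make the two defining conditions of an exact cover concrete for the three rows above: their OR must be all 1's (every column is covered), and no two of them may share a 1 (pairwise AND is zero). A quick check, with the rows written as binary literals:

```python
# The three example rows, packed as 5-bit integers.
parts = [0b01001, 0b10000, 0b00110]
m = 5

# Condition 1: together they cover every column.
assert parts[0] | parts[1] | parts[2] == (1 << m) - 1

# Condition 2: the sets are pairwise disjoint.
assert all(parts[i] & parts[j] == 0
           for i in range(3) for j in range(i + 1, 3))
```

Note that the original problem only requires the OR to be all 1's, and allows the two rows to overlap; the two-row exact covers are the stricter, non-overlapping special case.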