Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

From an interview: Removing rows and columns in an n×n matrix to maximize the sum of remaining values

Given an n×n matrix of real numbers. You are allowed to erase any number (from 0 to n) of rows and any number (from 0 to n) of columns, and after that the sum of the remaining entries is computed. Come up with an algorithm which finds out which rows and columns to erase in order to maximize that sum.

like image 483
extraeee Avatar asked Nov 12 '09 08:11

extraeee


People also ask

How do I delete a row and column of a matrix in Matlab?

The easiest way to remove a row or column from a matrix is to set that row or column equal to a pair of empty square brackets [] .


4 Answers

The problem is NP-hard. (So you should not expect a polynomial-time algorithm for solving this problem. There could still be (non-polynomial time) algorithms that are slightly better than brute-force, though.) The idea behind the proof of NP-hardness is that if we could solve this problem, then we could solve the the clique problem in a general graph. (The maximum-clique problem is to find the largest set of pairwise connected vertices in a graph.)

Specifically, given any graph with n vertices, let's form the matrix A with entries a[i][j] as follows:

  • a[i][j] = 1 for i == j (the diagonal entries)
  • a[i][j] = 0 if the edge (i,j) is present in the graph (and i≠j)
  • a[i][j] = -n-1 if the edge (i,j) is not present in the graph.

Now suppose we solve the problem of removing some rows and columns (or equivalently, keeping some rows and columns) so that the sum of the entries in the matrix is maximized. Then the answer gives the maximum clique in the graph:

  1. Claim: In any optimal solution, there is no row i and column j kept for which the edge (i,j) is not present in the graph. Proof: Since a[i][j] = -n-1 and the sum of all the positive entries is at most n, picking (i,j) would lead to a negative sum. (Note that deleting all rows and columns would give a better sum, of 0.)

  2. Claim: In (some) optimal solution, the set of rows and columns kept is the same. This is because starting with any optimal solution, we can simply remove all rows i for which column i has not been kept, and vice-versa. Note that since the only positive entries are the diagonal ones, we do not decrease the sum (and by the previous claim, we do not increase it either).

All of which means that if the graph has a maximum clique of size k, then our matrix problem has a solution with sum k, and vice-versa. Therefore, if we could solve our initial problem in polynomial time, then the clique problem would also be solved in polynomial time. This proves that the initial problem is NP-hard. (Actually, it is easy to see that the decision version of the initial problem — is there a way of removing some rows and columns so that the sum is at least k — is in NP, so the (decision version of the) initial problem is actually NP-complete.)

like image 105
P Shved Avatar answered Sep 23 '22 11:09

P Shved


Well the brute force method goes something like this:

  • For n rows there are 2n subsets.
  • For n columns there are 2n subsets.
  • For an n x n matrix there are 22n subsets.

0 elements is a valid subset but obviously if you have 0 rows or 0 columns the total is 0 so there are really 22n-2+1 subsets but that's no different.

So you can work out each combination by brute force as an O(an) algorithm. Fast. :)

It would be quicker to work out what the maximum possible value is and you do that by adding up all the positive numbers in the grid. If those numbers happen to form a valid sub-matrix (meaning you can create that set by removing rows and/or columns) then there's your answer.

Implicit in this is that if none of the numbers are negative then the complete matrix is, by definition, the answer.

Also, knowing what the highest possible maximum is possibly allows you to shortcut the brute force evaluation since if you get any combination equal to that maximum then that is your answer and you can stop checking.

Also if all the numbers are non-positive, the answer is the maximum value as you can reduce the matrix to a 1 x 1 matrix with that 1 value in it, by definition.

Here's an idea: construct 2n-1 n x m matrices where 1 <= m <= n. Process them one after the other. For each n x m matrix you can calculate:

  1. The highest possible maximum sum (as per above); and
  2. Whether no numbers are positive allowing you to shortcut the answer.

if (1) is below the currently calculate highest maximum sum then you can discard this n x m matrix. If (2) is true then you just need a simple comparison to the current highest maximum sum.

This is generally referred to as a pruning technique.

What's more you can start by saying that the highest number in the n x n matrix is the starting highest maximum sum since obviously it can be a 1 x 1 matrix.

I'm sure you could tweak this into a (slightly more) efficient recursive tree-based search algorithm with the above tests effectively allowing you to eliminate (hopefully many) unnecessary searches.

like image 35
cletus Avatar answered Sep 21 '22 11:09

cletus


We can improve on Cletus's generalized brute-force solution by modelling this as a directed graph. The initial matrix is the start node of the graph; its leaves are all the matrices missing one row or column, and so forth. It's a graph rather than a tree, because the node for the matrix without both the first column and row will have two parents - the nodes with just the first column or row missing.

We can optimize our solution by turning the graph into a tree: There's never any point exploring a submatrix with a column or row deleted that comes before the one we deleted to get to the current node, as that submatrix will be arrived at anyway.

This is still a brute-force search, of course - but we've eliminated the duplicate cases where we remove the same rows in different orders.

Here's an example implementation in Python:

def maximize_sum(m):
  frontier = [(m, 0, False)]
  best = None
  best_score = 0

  while frontier:
    current, startidx, cols_done = frontier.pop()
    score = matrix_sum(current)
    if score > best_score or not best:
      best = current
      best_score = score
    w, h = matrix_size(current)
    if not cols_done:
      for x in range(startidx, w):
        frontier.append((delete_column(current, x), x, False))
      startidx = 0
    for y in range(startidx, h):
      frontier.append((delete_row(current, y), y, True))
  return best_score, best

And here's the output on 280Z28's example matrix:

>>> m = ((1, 1, 3), (1, -89, 101), (1, 102, -99))
>>> maximize_sum(m)
(106, [(1, 3), (1, 101)])
like image 21
Nick Johnson Avatar answered Sep 20 '22 11:09

Nick Johnson


Since nobody asked for an efficient algorithm, use brute force: generate every possible matrix that can be created by removing rows and/or columns from the original matrix, choose the best one. A slightly more efficent version, which most likely can be proved to still be correct, is to generate only those variants where the removed rows and columns contain at least one negative value.

like image 44
Erich Kitzmueller Avatar answered Sep 23 '22 11:09

Erich Kitzmueller