From an interview: Removing rows and columns in an n×n matrix to maximize the sum of remaining values

Given an n×n matrix of real numbers. You are allowed to erase any number (from 0 to n) of rows and any number (from 0 to n) of columns, and after that the sum of the remaining entries is computed. Come up with an algorithm which finds out which rows and columns to erase in order to maximize that sum.

483

asked Nov 12 '09 08:11

extraeee

4 Answers

The problem is NP-hard. (So you should not expect a polynomial-time algorithm for solving this problem. There could still be (non-polynomial time) algorithms that are slightly better than brute-force, though.) The idea behind the proof of NP-hardness is that if we could solve this problem, then we could solve the clique problem in a general graph. (The maximum-clique problem is to find the largest set of pairwise connected vertices in a graph.)

Specifically, given any graph with n vertices, let's form the matrix A with entries a[i][j] as follows:

a[i][j] = 1 for i == j (the diagonal entries)
a[i][j] = 0 if the edge (i,j) is present in the graph (and i≠j)
a[i][j] = -n-1 if the edge (i,j) is not present in the graph.

Now suppose we solve the problem of removing some rows and columns (or equivalently, keeping some rows and columns) so that the sum of the entries in the matrix is maximized. Then the answer gives the maximum clique in the graph:

Claim: In any optimal solution, there is no row i and column j kept for which the edge (i,j) is not present in the graph. Proof: Since a[i][j] = -n-1 and the sum of all the positive entries is at most n, picking (i,j) would lead to a negative sum. (Note that deleting all rows and columns would give a better sum, of 0.)
Claim: In (some) optimal solution, the set of rows and columns kept is the same. This is because starting with any optimal solution, we can simply remove all rows i for which column i has not been kept, and vice-versa. Such a row i contributes no diagonal entry (column i is gone) and, by the previous claim, no negative entries, so all its remaining entries are 0 and removing it leaves the sum unchanged.

All of which means that if the graph has a maximum clique of size k, then our matrix problem has a solution with sum k, and vice-versa. Therefore, if we could solve our initial problem in polynomial time, then the clique problem would also be solved in polynomial time. This proves that the initial problem is NP-hard. (Actually, it is easy to see that the decision version of the initial problem — is there a way of removing some rows and columns so that the sum is at least k — is in NP, so the (decision version of the) initial problem is actually NP-complete.)
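The reduction can be sanity-checked on a tiny graph. This is only a minimal sketch: it assumes the graph is given as a vertex count plus an undirected edge list, the names `reduction_matrix`, `best_submatrix_sum` and `max_clique_size` are made up for illustration, and both solvers are plain exponential brute force, so it is usable for very small n only.

```python
from itertools import combinations

def reduction_matrix(n, edges):
    """Build the matrix from the reduction: 1 on the diagonal,
    0 for edges, -n-1 for non-edges."""
    adjacent = {frozenset(e) for e in edges}
    return [[1 if i == j else (0 if frozenset((i, j)) in adjacent else -n - 1)
             for j in range(n)] for i in range(n)]

def subsets(n):
    """Yield every subset of range(n) as a tuple."""
    for k in range(n + 1):
        yield from combinations(range(n), k)

def best_submatrix_sum(a):
    """Brute force over all kept-row / kept-column subsets."""
    n = len(a)
    return max(sum(a[i][j] for i in rows for j in cols)
               for rows in subsets(n) for cols in subsets(n))

def max_clique_size(n, edges):
    """Brute force: size of the largest pairwise-adjacent vertex set."""
    adjacent = {frozenset(e) for e in edges}
    return max(len(s) for s in subsets(n)
               if all(frozenset(p) in adjacent for p in combinations(s, 2)))

# Triangle 0-1-2 plus a pendant vertex 3 attached to vertex 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
a = reduction_matrix(4, edges)
print(best_submatrix_sum(a), max_clique_size(4, edges))  # 3 3
```

Both numbers agree here because the best submatrix keeps exactly the rows and columns of the triangle.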

105

answered Sep 23 '22 11:09

P Shved

Well the brute force method goes something like this:

For n rows there are 2ⁿ subsets.
For n columns there are 2ⁿ subsets.
For an n x n matrix there are 2²ⁿ subsets.

0 elements is a valid subset, but obviously if you have 0 rows or 0 columns the total is 0, so there are really only (2ⁿ-1)² = 2²ⁿ - 2ⁿ⁺¹ + 1 non-empty submatrices to check, plus the empty one. Asymptotically that's no different.

So you can work out each combination by brute force as an O(4ⁿ) algorithm (ignoring the cost of summing each submatrix). Fast. :)
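For reference, the plain brute force might look like the sketch below. The function name `brute_force` is made up, and the sketch assumes that keeping nothing is allowed and scores 0, as the question's "0 to n" wording suggests.

```python
from itertools import combinations

def brute_force(matrix):
    """Try every (kept rows, kept columns) pair: 2^n * 2^n = 4^n candidates.

    Returns (best_sum, kept_rows, kept_cols). Keeping nothing is allowed
    and scores 0, so the result is never negative.
    """
    n = len(matrix)
    index_sets = [c for k in range(n + 1) for c in combinations(range(n), k)]
    best = (0, (), ())
    for rows in index_sets:
        for cols in index_sets:
            total = sum(matrix[i][j] for i in rows for j in cols)
            if total > best[0]:
                best = (total, rows, cols)
    return best

m = [[1, 1, 3], [1, -89, 101], [1, 102, -99]]
print(brute_force(m))  # (106, (0, 1), (0, 2))
```

Each candidate costs up to n² additions, so the total work is on the order of 4ⁿ·n².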

It would be quicker to work out what the maximum possible value is and you do that by adding up all the positive numbers in the grid. If those numbers happen to form a valid sub-matrix (meaning you can create that set by removing rows and/or columns) then there's your answer.

Implicit in this is that if none of the numbers are negative then the complete matrix is, by definition, the answer.

Also, knowing what the highest possible maximum is possibly allows you to shortcut the brute force evaluation since if you get any combination equal to that maximum then that is your answer and you can stop checking.

Also, if all the numbers are non-positive and you have to keep at least one entry, the answer is the maximum value, as you can reduce the matrix to a 1 x 1 matrix containing just that value. (If erasing everything is allowed, the answer is simply 0.)
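The upper bound and the "positives form a valid sub-matrix" check could be sketched as follows. The helper names are hypothetical, and the check is just one way to read that condition: keep every row and column that contains a positive number and see whether any negative number sneaks in.

```python
def positive_upper_bound(matrix):
    """Sum of all positive entries; no submatrix can score higher."""
    return sum(v for row in matrix for v in row if v > 0)

def positives_form_submatrix(matrix):
    """Check whether the positive entries sit in a clean block.

    Keeps every row and column that contains a positive entry and reports
    whether that submatrix is free of negative entries. If it is, its sum
    equals positive_upper_bound(matrix), so it is optimal.
    """
    rows = [i for i, row in enumerate(matrix) if any(v > 0 for v in row)]
    cols = [j for j in range(len(matrix[0]))
            if any(row[j] > 0 for row in matrix)]
    clean = all(matrix[i][j] >= 0 for i in rows for j in cols)
    return clean, rows, cols

grid = [[1, 2, -5],
        [3, 4, -6],
        [-7, -8, -9]]
print(positive_upper_bound(grid))       # 10
print(positives_form_submatrix(grid))   # (True, [0, 1], [0, 1])
```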

Here's an idea: construct 2ⁿ-1 n x m matrices where 1 <= m <= n. Process them one after the other. For each n x m matrix you can calculate:

(1) The highest possible maximum sum (as per above); and
(2) Whether no numbers are positive, allowing you to shortcut the answer.

If (1) is below the currently calculated highest maximum sum then you can discard this n x m matrix. If (2) is true then you just need a simple comparison to the current highest maximum sum.

This is generally referred to as a pruning technique.

What's more you can start by saying that the highest number in the n x n matrix is the starting highest maximum sum since obviously it can be a 1 x 1 matrix.

I'm sure you could tweak this into a (slightly more) efficient recursive tree-based search algorithm with the above tests effectively allowing you to eliminate (hopefully many) unnecessary searches.
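A rough sketch of that pruning, under a few assumptions of mine: the n x m matrices are taken to be the non-empty column subsets, at least one row and one column must be kept (so the best 1 x 1 matrix is a valid starting point), and the rows within each surviving column subset are still enumerated by brute force. The name `pruned_search` is made up.

```python
from itertools import combinations

def pruned_search(matrix):
    """Brute force over column subsets, skipping those that cannot win."""
    n = len(matrix)
    best = max(max(row) for row in matrix)   # best 1 x 1 submatrix
    nonempty = [c for k in range(1, n + 1) for c in combinations(range(n), k)]
    for cols in nonempty:
        # Test (1): the positive entries in these columns bound every
        # submatrix that keeps exactly these columns.
        bound = sum(matrix[i][j] for i in range(n) for j in cols
                    if matrix[i][j] > 0)
        # Test (2): with no positive entries (bound == 0) nothing here can
        # beat the largest single entry, which `best` already covers.
        if bound == 0 or bound <= best:
            continue
        for rows in nonempty:
            total = sum(matrix[i][j] for i in rows for j in cols)
            best = max(best, total)
    return best

print(pruned_search([[1, 1, 3], [1, -89, 101], [1, 102, -99]]))  # 106
```

How much this saves depends entirely on the data; in the worst case every column subset survives and it is no faster than plain brute force.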

answered Sep 21 '22 11:09

cletus

We can improve on Cletus's generalized brute-force solution by modelling this as a directed graph. The initial matrix is the start node of the graph; its children are all the matrices missing one row or column, and so forth. It's a graph rather than a tree, because the node for the matrix without both the first column and row will have two parents - the nodes with just the first column or row missing.

We can optimize our solution by turning the graph into a tree: There's never any point exploring a submatrix with a column or row deleted that comes before the one we deleted to get to the current node, as that submatrix will be arrived at anyway.

This is still a brute-force search, of course - but we've eliminated the duplicate cases where we remove the same rows in different orders.

Here's an example implementation in Python:

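# Helper functions; the original answer did not show them, so these are assumed.
def matrix_sum(m):
  return sum(sum(row) for row in m)

def matrix_size(m):
  # (width, height); an empty matrix has width 0
  return (len(m[0]) if m else 0), len(m)

def delete_column(m, x):
  return [row[:x] + row[x + 1:] for row in m]

def delete_row(m, y):
  return [row for i, row in enumerate(m) if i != y]

def maximize_sum(m):
  frontier = [(m, 0, False)]
  best = None
  best_score = 0

  while frontier:
    current, startidx, cols_done = frontier.pop()
    score = matrix_sum(current)
    # Test "best is None" rather than "not best": an empty matrix is falsy
    # but is still a valid best result (score 0).
    if best is None or score > best_score:
      best = current
      best_score = score
    w, h = matrix_size(current)
    if not cols_done:
      # Only delete columns at or after the last deleted one, to avoid duplicates.
      for x in range(startidx, w):
        frontier.append((delete_column(current, x), x, False))
      startidx = 0
    for y in range(startidx, h):
      frontier.append((delete_row(current, y), y, True))
  return best_score, best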

And here's the output on 280Z28's example matrix:

>>> m = ((1, 1, 3), (1, -89, 101), (1, 102, -99))
>>> maximize_sum(m)
(106, [(1, 3), (1, 101)])

answered Sep 20 '22 11:09

Nick Johnson

Since nobody asked for an efficient algorithm, use brute force: generate every possible matrix that can be created by removing rows and/or columns from the original matrix, choose the best one. A slightly more efficient version, which most likely can be proved to still be correct, is to generate only those variants where the removed rows and columns contain at least one negative value.
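A minimal sketch of that second variant, with made-up names (`powerset`, `reduced_brute_force`). The reason it should still be correct: putting a row or column with no negative values back into any solution can only add non-negative numbers, so such rows and columns never need to be removed.

```python
from itertools import combinations

def powerset(items):
    """Yield every subset of items as a tuple."""
    for k in range(len(items) + 1):
        yield from combinations(items, k)

def reduced_brute_force(matrix):
    """Brute force that only removes rows/columns containing a negative value.

    Returns (best_sum, kept_rows, kept_cols).
    """
    n = len(matrix)
    neg_rows = [i for i in range(n) if any(v < 0 for v in matrix[i])]
    neg_cols = [j for j in range(n) if any(matrix[i][j] < 0 for i in range(n))]
    best = None
    for removed_rows in powerset(neg_rows):
        rows = [i for i in range(n) if i not in removed_rows]
        for removed_cols in powerset(neg_cols):
            cols = [j for j in range(n) if j not in removed_cols]
            total = sum(matrix[i][j] for i in rows for j in cols)
            if best is None or total > best[0]:
                best = (total, rows, cols)
    return best

m = [[1, 1, 3], [1, -89, 101], [1, 102, -99]]
print(reduced_brute_force(m))  # (106, [0, 1], [0, 2])
```

On this 3 x 3 example only two rows and two columns contain a negative value, so 16 candidates are checked instead of 64. If every row and column contains a negative value, nothing is saved.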

answered Sep 23 '22 11:09

Erich Kitzmueller

Tags:

algorithm

multidimensional-array

matrix