Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find whether a 2d matrix is subset of another 2d matrix

Recently i was taking part in one Hackathon and i came to know about a problem which tries to find a pattern of a grid form in a 2d matrix.A pattern could be U,H and T and will be represented by 3*3 matrix suppose if i want to present H and U

+--+--+--+            +--+--+--+
|1 |0 |1 |            |1 |0 |1 |
+--+--+--+            +--+--+--+
|1 |1 |1 |  --> H     |1 |0 |1 |    -> U
+--+--+--+            +--+--+--+
|1 |0 |1 |            |1 |1 |1 |
+--+--+--+            +--+--+--+

Now i need to search this into 10*10 matrix containing 0s and 1s.Closest and only solution i can get it brute force algorithm of O(n^4).In languages like MATLAB and R there are very subtle ways to do this but not in C,C++. I tried a lot to search this solution on Google and on SO.But closest i can get is this SO POST which discuss about implementing Rabin-Karp string-search algorithm .But there is no pseudocode or any post explaining this.Could anyone help or provide any link,pdf or some logic to simplify this?

EDIT

as Eugene Sh. commented that If N is the size of the large matrix(NxN) and k - the small one (kxk), the buteforce algorithm should take O((Nk)^2). Since k is fixed, it is reducing to O(N^2).Yes absolutely right. But is there is any generalised way if N and K is big?

like image 965
Ankur Avatar asked Jan 20 '15 18:01

Ankur


1 Answers

Alright, here is then the 2D Rabin-Karp approach.

For the following discussion, assume we want to find a (m, m) sub-matrix inside a (n, n) matrix. (The concept works for rectangular matrices just as well but I ran out of indices.)

The idea is that for each possible sub-matrix, we compute a hash. Only if that hash matches the hash of the matrix we want to find, we will compare element-wise.

To make this efficient, we must avoid re-computing the entire hash of the sub-matrix each time. Because I got little sleep tonight, the only hash function for which I could figure out how to do this easily is the sum of 1s in the respective sub-matrix. I leave it as an exercise to someone smarter than me to figure out a better rolling hash function.

Now, if we have just checked the sub-matrix from (i, j) to (i + m – 1, j + m – 1) and know it has x 1s inside, we can compute the number of 1s in the sub-matrix one to the right – that is, from (i, j + 1) to (i + m – 1, j + m) – by subtracting the number of 1s in the sub-vector from (i, j) to (i + m – 1, j) and adding the number of 1s in the sub-vector from (i, j + m) to (i + m – 1, j + m).

If we hit the right margin of the large matrix, we shift the window down by one and then back to the left margin and then again down by one and then again to the right and so forth.

Note that this requires O(m) operations, not O(m2) for each candidate. If we do this for every pair of indices, we get O(mn2) work. Thus, by cleverly shifting a window of the size of the potential sub-matrix through the large matrix, we can reduce the amount of work by a factor of m. That is, if we don't get too many hash collisions.

Here is a picture:

The rolling hash function for the window shifted to the right.

As we shift the current window one to the right, we subtract the number of 1s in the red column vector on the left side and add the number of 1s in the green column vector on the right side to obtain the number of 1s in the new window.

I have implemented a quick demo of this idea using the great Eigen C++ template library. The example also uses some stuff from Boost but only for argument parsing and output formatting so you can easily get rid of it if you don't have Boost but want to try out the code. The index fiddling is a bit messy but I'll leave it without further explanation here. The above prose should cover it sufficiently.

#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <iostream>
#include <random>
#include <type_traits>
#include <utility>

#include <boost/format.hpp>
#include <boost/lexical_cast.hpp>

#include <Eigen/Dense>

#define PROGRAM "submatrix"
#define SEED_CSTDLIB_RAND 1

using BitMatrix = Eigen::Matrix<bool, Eigen::Dynamic, Eigen::Dynamic>;
using Index1D = BitMatrix::Index;
using Index2D = std::pair<Index1D, Index1D>;

std::ostream&
operator<<(std::ostream& out, const Index2D& idx)
{
  out << "(" << idx.first << ", " << idx.second << ")";
  return out;
}

BitMatrix
get_random_bit_matrix(const Index1D rows, const Index1D cols)
{
  auto matrix = BitMatrix {rows, cols};
  matrix.setRandom();
  return matrix;
}

Index2D
findSubMatrix(const BitMatrix& haystack,
              const BitMatrix& needle,
              Index1D *const collisions_ptr = nullptr) noexcept
{
  static_assert(std::is_signed<Index1D>::value, "unsigned index type");
  const auto end = Index2D {haystack.rows(), haystack.cols()};
  const auto hr = haystack.rows();
  const auto hc = haystack.cols();
  const auto nr = needle.rows();
  const auto nc = needle.cols();
  if (nr > hr || nr > hc)
    return end;
  const auto target = needle.count();
  auto current = haystack.block(0, 0, nr - 1, nc).count();
  auto j = Index1D {0};
  for (auto i = Index1D {0}; i <= hr - nr; ++i)
    {
      if (j == 0)  // at left margin
        current += haystack.block(i + nr - 1, 0, 1, nc).count();
      else if (j == hc - nc)  // at right margin
        current += haystack.block(i + nr - 1, hc - nc, 1, nc).count();
      else
        assert(!"this should never happen");
      while (true)
        {
          if (i % 2 == 0)  // moving right
            {
              if (j > 0)
                current += haystack.block(i, j + nc - 1, nr, 1).count();
            }
          else  // moving left
            {
              if (j < hc - nc)
                current += haystack.block(i, j, nr, 1).count();
            }
          assert(haystack.block(i, j, nr, nc).count() == current);
          if (current == target)
            {
              // TODO: There must be a better way than using cwiseEqual().
              if (haystack.block(i, j, nr, nc).cwiseEqual(needle).all())
                return Index2D {i, j};
              else if (collisions_ptr)
                *collisions_ptr += 1;
            }
          if (i % 2 == 0)  // moving right
            {
              if (j < hc - nc)
                {
                  current -= haystack.block(i, j, nr, 1).count();
                  ++j;
                }
              else break;
            }
          else  // moving left
            {
              if (j > 0)
                {
                  current -= haystack.block(i, j + nc - 1, nr, 1).count();
                  --j;
                }
              else break;
            }
        }
      if (i % 2 == 0)  // at right margin
        current -= haystack.block(i, hc - nc, 1, nc).count();
      else  // at left margin
        current -= haystack.block(i, 0, 1, nc).count();
    }
  return end;
}

int
main(int argc, char * * argv)
{
  if (SEED_CSTDLIB_RAND)
    {
      std::random_device rnddev {};
      srand(rnddev());
    }
  if (argc != 5)
    {
      std::cerr << "usage: " << PROGRAM
                << " ROWS_HAYSTACK COLUMNS_HAYSTACK"
                << " ROWS_NEEDLE COLUMNS_NEEDLE"
                << std::endl;
      return EXIT_FAILURE;
    }
  auto hr = boost::lexical_cast<Index1D>(argv[1]);
  auto hc = boost::lexical_cast<Index1D>(argv[2]);
  auto nr = boost::lexical_cast<Index1D>(argv[3]);
  auto nc = boost::lexical_cast<Index1D>(argv[4]);
  const auto haystack = get_random_bit_matrix(hr, hc);
  const auto needle = get_random_bit_matrix(nr, nc);
  auto collisions = Index1D {};
  const auto idx = findSubMatrix(haystack, needle, &collisions);
  const auto end = Index2D {haystack.rows(), haystack.cols()};
  std::cout << "This is the haystack:\n\n" << haystack << "\n\n";
  std::cout << "This is the needle:\n\n" << needle << "\n\n";
  if (idx != end)
    std::cout << "Found as sub-matrix at " << idx << ".\n";
  else
    std::cout << "Not found as sub-matrix.\n";
  std::cout << boost::format("There were %d (%.2f %%) hash collisions.\n")
    % collisions
    % (100.0 * collisions / ((hr - nr) * (hc - nc)));
  return (idx != end) ? EXIT_SUCCESS : EXIT_FAILURE;
}

While it compiles and runs, please consider the above as pseudo-code. I have made almost no attempt at optimizing it. It was just a proof-of concept for myself.

like image 175
5gon12eder Avatar answered Sep 28 '22 07:09

5gon12eder