Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm to count occurrences of a matrix inside a larger one

i'm facing a problem right now, i need to count the number of time a certain MxM matrix appears inside a NxN one (this one should be larger than the first one). Any hints on how to do this? I'm going to implement it in C and there is no option to change it.

Revision 1

Hi everyone, i would really like to say thank to all the answers and opinions on the matter. I should tell you that after many hours of hard work we have come to a solutions that is not strictly like the Boyer-Moore approach, but rather an algorithm on my own. I'm planning on publishing it once is tested and finished. The solutions is now being adapted to be paralelized for speed optimization using the university cluster with the C Library MPI.

like image 449
guiman Avatar asked Jun 06 '11 23:06

guiman


3 Answers

Hm, sounds like a 2-dimensional version of string matching. I wonder if there's a 2D version of Boyer-Moore?

A Boyer-Moore Approach for Two-Dimensional Matching

Ah, apparently there is. :-)

like image 165
Nemo Avatar answered Nov 07 '22 14:11

Nemo


One approach that immediately came to mind:

Treat the big matrix as a linear string of N*N "characters" and the small matrix as a linear regular expression of length (M+1)*M with "literal characters" in the first M positions of each "row" and the equivalent of .{N-M} in the remaining position of each row. Compile the regular expression to a DFA and run it.

Time to find all matches is O(N*N). I suspect there are fancier algorithms that would work (adaptations of fancier 1-d substring search algorithms) but this one's not too bad.

like image 36
R.. GitHub STOP HELPING ICE Avatar answered Nov 07 '22 13:11

R.. GitHub STOP HELPING ICE


Recently, fast algorithms have been developed for building 2D suffix trees. These can be used to find any square submatrix in a larger matrix in time independent of the size of the larger matrix:

  • http://www.springerlink.com/content/96511486841752h4/
  • http://www.springerlink.com/content/e65lt4r77773l29g/

You might need to be a subscriber to see those papers.

I also found this freely available one, which describes a randomised construction algorithm that is expected linear-time:

  • http://www.cs.nyu.edu/cs/faculty/cole/papers/CH00.ps

I expect that constructing a 2D suffix tree and searching with it would be faster if you need to search for many different submatrices in a single matrix, but it's probably slower than 2D Boyer-Moore if you'll be searching inside a different matrix each time.

like image 21
j_random_hacker Avatar answered Nov 07 '22 13:11

j_random_hacker