Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

A fast implementation in Mathematica for Position2D

Tags:

I'm looking for a fast implementation for the following, I'll call it Position2D for lack of a better term:

Position2D[ matrix, sub_matrix ]

which finds the locations of sub_matrix inside matrix and returns the upper left and lower right row/column of a match.

For example, this:

Position2D[{
 {0, 1, 2, 3},
 {1, 2, 3, 4},
 {2, 3, 4, 5},
 {3, 4, 5, 6}
}, {
 {2, 3},
 {3, 4}
}]

should return this:

{
 {{1, 3}, {2, 4}},
 {{2, 2}, {3, 3}},
 {{3, 1}, {4, 2}} 
}

It should be fast enough to work quickly on 3000x2000 matrices with 100x100 sub-matrices. For simplicity, it is enough to only consider integer matrices.

like image 756
Arnoud Buzing Avatar asked Dec 03 '11 01:12

Arnoud Buzing


2 Answers

Algorithm

The following code is based on an efficient custom position function to find positions of (possibly overlapping) integer sequences in a large integer list. The main idea is that we can first try to eficiently find the positions where the first row of the sub-matrix is in the Flatten-ed large matrix, and then filter those, extracting full sub-matrices and comparing to the sub-matrix of interest. This will be efficient for most cases except very pathological ones (those, for which this procedure would generate a huge number of potential position candidates, while the true number of entries of the sub-matrix would be much smaller. But such cases seem rather unlikely generally, and then, further improvements to this simple scheme can be made).

For large matrices, the proposed solution will be about 15-25 times faster than the solution of @Szabolcs when a compiled version of sequence positions function is used, and 3-5 times faster for the top-level implementation of sequence positions - finding function. The actual speedup depends on matrix sizes, it is more for larger matrices. The code and benchmarks are below.

Code

A generally efficient function for finding positions of a sub-list (sequence)

These helper functions are due to Norbert Pozar and taken from this Mathgroup thread. They are used to efficiently find starting positions of an integer sequence in a larger list (see the mentioned post for details).

Clear[seqPos];
fdz[v_] := Rest@DeleteDuplicates@Prepend[v, 0];
seqPos[list_List, seq_List] :=
  Fold[
     fdz[#1 (1 - Unitize[list[[#1]] - #2])] + 1 &, 
     fdz[Range[Length[list] - Length[seq] + 1] *
       (1 - Unitize[list[[;; -Length[seq]]] - seq[[1]]])] + 1,
     Rest@seq
  ] - Length[seq];

Example of use:

In[71]:= seqPos[{1,2,3,2,3,2,3,4},{2,3,2}]
Out[71]= {2,4}

A faster position-finding function for integers

However fast seqPos might be, it is still the major bottleneck in my solution. Here is a compiled-to-C version of this, which gives another 5x performance boost to my code:

seqposC = 
  Compile[{{list, _Integer, 1}, {seq, _Integer, 1}},
    Module[{i = 1, j = 1, res = Table[0, {Length[list]}], ctr = 0},
       For[i = 1, i <= Length[list], i++,
          If[list[[i]] == seq[[1]],
             While[j < Length[seq] && i + j <= Length[list] && 
                     list[[i + j]] == seq[[j + 1]], 
                j++
             ];
             If[j == Length[seq], res[[++ctr]] = i];
             j = 1;
          ]
       ];
       Take[res, ctr]
    ], CompilationTarget -> "C", RuntimeOptions -> "Speed"]

Example of use:

In[72]:= seqposC[{1, 2, 3, 2, 3, 2, 3, 4}, {2, 3, 2}]
Out[72]= {2, 4}

The benchmarks below have been redone with this function (also the code for main function is slightly modified )

Main function

This is the main function. It finds positions of the first row in a matrix, and then filters them, extracting the sub-matrices at these positions and testing against the full sub-matrix of interest:

Clear[Position2D];
Position2D[m_, what_,seqposF_:Automatic] :=
  Module[{posFlat, pos2D,sp = If[seqposF === Automatic,seqposC,seqposF]},
     With[{dm = Dimensions[m], dwr = Reverse@Dimensions[what]},
         posFlat = sp[Flatten@m, First@what];
         pos2D = 
            Pick[Transpose[#], Total[Clip[Reverse@dm - # - dwr + 2, {0, 1}]],2] &@
                {Mod[posFlat, #, 1], IntegerPart[posFlat/#] + 1} &@Last[dm];
         Transpose[{#, Transpose[Transpose[#] + dwr - 1]}] &@
            Select[pos2D,
                m[[Last@# ;; Last@# + Last@dwr - 1, 
                   First@# ;; First@# + First@dwr - 1]] == what &
            ]
     ]
  ];

For integer lists, the faster compiled subsequence position-finding function seqposC can be used (this is a default). For generic lists, one can supply e.g. seqPos, as a third argument.

How it works

We will use a simple example to dissect the code and explain its inner workings. This defines our test matrix and sub-matrix:

m  = {{0, 1, 2, 3}, {1, 2, 3, 4}, {2, 3, 4, 5}};
what = {{2, 3}, {3, 4}}; 

This computes the dimensions of the above (it is more convenient to work with reversed dimensions for a sub-matrix):

In[78]:= 
dm=Dimensions[m]
dwr=Reverse@Dimensions[what]

Out[78]= {3,4}
Out[79]= {2,2}

This finds a list of starting positions of the first row ({2,3} here) in the Flattened main matrix. These positions are at the same time "flat" candidate positions of the top left corner of the sub-matrix:

In[77]:= posFlat = seqPos[Flatten@m, First@what]
Out[77]= {3, 6, 9}

This will reconstruct the 2D "candidate" positions of the top left corner of a sub-matrix in a full matrix, using the dimensions of the main matrix:

In[83]:= posInterm = Transpose@{Mod[posFlat,#,1],IntegerPart[posFlat/#]+1}&@Last[dm]
Out[83]= {{3,1},{2,2},{1,3}}

We can then try using Select to filter them out, extracting the full sub-matrix and comparing to what, but we'll run into a problem here:

In[84]:= 
Select[posInterm,
   m[[Last@#;;Last@#+Last@dwr-1,First@#;;First@#+First@dwr-1]]==what&]

During evaluation of In[84]:= Part::take: Cannot take positions 3 through 4 
in {{0,1,2,3},{1,2,3,4},{2,3,4,5}}. >>

Out[84]= {{3,1},{2,2}}

Apart from the error message, the result is correct. The error message itself is due to the fact that for the last position ({1,3}) in the list, the bottom right corner of the sub-matrix will be outside the main matrix. We could of course use Quiet to simply ignore the error messages, but that's a bad style. So, we will first filter those cases out, and this is what the line Pick[Transpose[#], Total[Clip[Reverse@dm - # - dwr + 2, {0, 1}]], 2] &@ is for. Specifically, consider

In[90]:= 
Reverse@dm - # - dwr + 2 &@{Mod[posFlat, #, 1],IntegerPart[posFlat/#] + 1} &@Last[dm]
Out[90]= {{1,2,3},{2,1,0}}

The coordinates of the top left corners should stay within a difference of dimensions of matrix and a sub-matrix. The above sub-lists were made of x and y coordiantes of top - left corners. I added 2 to make all valid results strictly positive. We have to pick only coordiantes at those positions in Transpose@{Mod[posFlat, #, 1], IntegerPart[posFlat/#] + 1} &@Last[dm] ( which is posInterm), at which both sub-lists above have strictly positive numbers. I used Total[Clip[...,{0,1}]] to recast it into picking only at those positions at which this second list has 2 (Clip converts all positive integers to 1, and Total sums numbers in 2 sublists. The only way to get 2 is when numbers in both sublists are positive).

So, we have:

In[92]:= 
pos2D=Pick[Transpose[#],Total[Clip[Reverse@dm-#-dwr+2,{0,1}]],2]&@
           {Mod[posFlat,#,1],IntegerPart[posFlat/#]+1}&@Last[dm]
Out[92]= {{3,1},{2,2}} 

After the list of 2D positions has been filtered, so that no structurally invalid positions are present, we can use Select to extract the full sub-matrices and test against the sub-matrix of interest:

In[93]:= 
finalPos = 
  Select[pos2D,m[[Last@#;;Last@#+Last@dwr-1,First@#;;First@#+First@dwr-1]]==what&]
Out[93]= {{3,1},{2,2}}

In this case, both positions are genuine. The final thing to do is to reconstruct the positions of the bottom - right corners of the submatrix and add them to the top-left corner positions. This is done by this line:

In[94]:= Transpose[{#,Transpose[Transpose[#]+dwr-1]}]&@finalPos 
Out[94]= {{{3,1},{4,2}},{{2,2},{3,3}}}

I could have used Map, but for a large list of positions, the above code would be more efficient.

Example and benchmarks

The original example:

In[216]:= Position2D[{{0,1,2,3},{1,2,3,4},{2,3,4,5},{3,4,5,6}},{{2,3},{3,4}}]
Out[216]= {{{3,1},{4,2}},{{2,2},{3,3}},{{1,3},{2,4}}}

Note that my index conventions are reversed w.r.t. @Szabolcs' solution.

Benchmarks for large matrices and sub-matrices

Here is a power test:

nmat = 1000;
(* generate a large random matrix and a sub-matrix *)
largeTestMat = RandomInteger[100, {2000, 3000}];
what = RandomInteger[10, {100, 100}];
(* generate upper left random positions where to insert the submatrix *)    
rposx = RandomInteger[{1,Last@Dimensions[largeTestMat] - Last@Dimensions[what] + 1}, nmat];
rposy = RandomInteger[{1,First@Dimensions[largeTestMat] - First@Dimensions[what] + 1},nmat];
(* insert the submatrix nmat times *)
With[{dwr = Reverse@Dimensions[what]},
    Do[largeTestMat[[Last@p ;; Last@p + Last@dwr - 1, 
                     First@p ;; First@p + First@dwr - 1]] = what, 
       {p,Transpose[{rposx, rposy}]}]]

Now, we test:

In[358]:= (ps1 = position2D[largeTestMat,what])//Short//Timing
Out[358]= {1.39,{{{1,2461},{100,2560}},<<151>>,{{1900,42},{1999,141}}}}

In[359]:= (ps2 = Position2D[largeTestMat,what])//Short//Timing
Out[359]= {0.062,{{{2461,1},{2560,100}},<<151>>,{{42,1900},{141,1999}}}}

(the actual number of sub-matrices is smaller than the number we try to generate, since many of them overlap and "destroy" the previously inserted ones - this is so because the sub-matrix size is a sizable fraction of the matrix size in our benchmark).

To compare, we should reverse the x-y indices in one of the solutions (level 3), and sort both lists, since positions may have been obtained in different order:

In[360]:= Sort@ps1===Sort[Reverse[ps2,{3}]]
Out[360]= True

I do not exclude a possibility that further optimizations are possible.

like image 110
Leonid Shifrin Avatar answered Sep 21 '22 21:09

Leonid Shifrin


This is my implementation:

position2D[m_, k_] :=
 Module[{di, dj, extractSubmatrix, pos},
  {di, dj} = Dimensions[k] - 1;
  extractSubmatrix[{i_, j_}] := m[[i ;; i + di, j ;; j + dj]];
  pos = Position[ListCorrelate[k, m], ListCorrelate[k, k][[1, 1]]];
  pos = Select[pos, extractSubmatrix[#] == k &];
  {#, # + {di, dj}} & /@ pos
 ]

It uses ListCorrelate to get a list of potential positions, then filters those that actually match. It's probably faster on packed real matrices.

like image 34
Szabolcs Avatar answered Sep 18 '22 21:09

Szabolcs