I'm looking for a fast implementation for the following, I'll call it Position2D
for lack of a better term:
Position2D[ matrix, sub_matrix ]
which finds the locations of sub_matrix
inside matrix
and returns the upper left and lower right row/column of a match.
For example, this:
Position2D[{
{0, 1, 2, 3},
{1, 2, 3, 4},
{2, 3, 4, 5},
{3, 4, 5, 6}
}, {
{2, 3},
{3, 4}
}]
should return this:
{
{{1, 3}, {2, 4}},
{{2, 2}, {3, 3}},
{{3, 1}, {4, 2}}
}
It should be fast enough to work quickly on 3000x2000
matrices with 100x100
sub-matrices. For simplicity, it is enough to only consider integer matrices.
The following code is based on an efficient custom position function to find positions of (possibly overlapping) integer sequences in a large integer list. The main idea is that we can first try to eficiently find the positions where the first row of the sub-matrix is in the Flatten
-ed large matrix, and then filter those, extracting full sub-matrices and comparing to the sub-matrix of interest. This will be efficient for most cases except very pathological ones (those, for which this procedure would generate a huge number of potential position candidates, while the true number of entries of the sub-matrix would be much smaller. But such cases seem rather unlikely generally, and then, further improvements to this simple scheme can be made).
For large matrices, the proposed solution will be about 15-25 times faster than the solution of @Szabolcs when a compiled version of sequence positions function is used, and 3-5 times faster for the top-level implementation of sequence positions - finding function. The actual speedup depends on matrix sizes, it is more for larger matrices. The code and benchmarks are below.
These helper functions are due to Norbert Pozar and taken from this Mathgroup thread. They are used to efficiently find starting positions of an integer sequence in a larger list (see the mentioned post for details).
Clear[seqPos];
fdz[v_] := Rest@DeleteDuplicates@Prepend[v, 0];
seqPos[list_List, seq_List] :=
Fold[
fdz[#1 (1 - Unitize[list[[#1]] - #2])] + 1 &,
fdz[Range[Length[list] - Length[seq] + 1] *
(1 - Unitize[list[[;; -Length[seq]]] - seq[[1]]])] + 1,
Rest@seq
] - Length[seq];
Example of use:
In[71]:= seqPos[{1,2,3,2,3,2,3,4},{2,3,2}]
Out[71]= {2,4}
However fast seqPos
might be, it is still the major bottleneck in my solution. Here is a compiled-to-C version of this, which gives another 5x performance boost to my code:
seqposC =
Compile[{{list, _Integer, 1}, {seq, _Integer, 1}},
Module[{i = 1, j = 1, res = Table[0, {Length[list]}], ctr = 0},
For[i = 1, i <= Length[list], i++,
If[list[[i]] == seq[[1]],
While[j < Length[seq] && i + j <= Length[list] &&
list[[i + j]] == seq[[j + 1]],
j++
];
If[j == Length[seq], res[[++ctr]] = i];
j = 1;
]
];
Take[res, ctr]
], CompilationTarget -> "C", RuntimeOptions -> "Speed"]
Example of use:
In[72]:= seqposC[{1, 2, 3, 2, 3, 2, 3, 4}, {2, 3, 2}]
Out[72]= {2, 4}
The benchmarks below have been redone with this function (also the code for main function is slightly modified )
This is the main function. It finds positions of the first row in a matrix, and then filters them, extracting the sub-matrices at these positions and testing against the full sub-matrix of interest:
Clear[Position2D];
Position2D[m_, what_,seqposF_:Automatic] :=
Module[{posFlat, pos2D,sp = If[seqposF === Automatic,seqposC,seqposF]},
With[{dm = Dimensions[m], dwr = Reverse@Dimensions[what]},
posFlat = sp[Flatten@m, First@what];
pos2D =
Pick[Transpose[#], Total[Clip[Reverse@dm - # - dwr + 2, {0, 1}]],2] &@
{Mod[posFlat, #, 1], IntegerPart[posFlat/#] + 1} &@Last[dm];
Transpose[{#, Transpose[Transpose[#] + dwr - 1]}] &@
Select[pos2D,
m[[Last@# ;; Last@# + Last@dwr - 1,
First@# ;; First@# + First@dwr - 1]] == what &
]
]
];
For integer lists, the faster compiled subsequence position-finding function seqposC
can be used (this is a default). For generic lists, one can supply e.g. seqPos
, as a third argument.
We will use a simple example to dissect the code and explain its inner workings. This defines our test matrix and sub-matrix:
m = {{0, 1, 2, 3}, {1, 2, 3, 4}, {2, 3, 4, 5}};
what = {{2, 3}, {3, 4}};
This computes the dimensions of the above (it is more convenient to work with reversed dimensions for a sub-matrix):
In[78]:=
dm=Dimensions[m]
dwr=Reverse@Dimensions[what]
Out[78]= {3,4}
Out[79]= {2,2}
This finds a list of starting positions of the first row ({2,3}
here) in the Flatten
ed main matrix. These positions are at the same time "flat" candidate positions of the top left corner of the sub-matrix:
In[77]:= posFlat = seqPos[Flatten@m, First@what]
Out[77]= {3, 6, 9}
This will reconstruct the 2D "candidate" positions of the top left corner of a sub-matrix in a full matrix, using the dimensions of the main matrix:
In[83]:= posInterm = Transpose@{Mod[posFlat,#,1],IntegerPart[posFlat/#]+1}&@Last[dm]
Out[83]= {{3,1},{2,2},{1,3}}
We can then try using Select
to filter them out, extracting the full sub-matrix and comparing to what
, but we'll run into a problem here:
In[84]:=
Select[posInterm,
m[[Last@#;;Last@#+Last@dwr-1,First@#;;First@#+First@dwr-1]]==what&]
During evaluation of In[84]:= Part::take: Cannot take positions 3 through 4
in {{0,1,2,3},{1,2,3,4},{2,3,4,5}}. >>
Out[84]= {{3,1},{2,2}}
Apart from the error message, the result is correct. The error message itself is due to the fact that for the last position ({1,3}
) in the list, the bottom right corner of the sub-matrix will be outside the main matrix. We could of course use Quiet
to simply ignore the error messages, but that's a bad style. So, we will first filter those cases out, and this is what the line Pick[Transpose[#], Total[Clip[Reverse@dm - # - dwr + 2, {0, 1}]], 2] &@
is for. Specifically, consider
In[90]:=
Reverse@dm - # - dwr + 2 &@{Mod[posFlat, #, 1],IntegerPart[posFlat/#] + 1} &@Last[dm]
Out[90]= {{1,2,3},{2,1,0}}
The coordinates of the top left corners should stay within a difference of dimensions of matrix and a sub-matrix. The above sub-lists were made of x
and y
coordiantes of top - left corners. I added 2 to make all valid results strictly positive. We have to pick only coordiantes at those positions in Transpose@{Mod[posFlat, #, 1], IntegerPart[posFlat/#] + 1} &@Last[dm]
( which is posInterm
), at which both sub-lists above have strictly positive numbers. I used Total[Clip[...,{0,1}]]
to recast it into picking only at those positions at which this second list has 2
(Clip
converts all positive integers to 1
, and Total
sums numbers in 2 sublists. The only way to get 2 is when numbers in both sublists are positive).
So, we have:
In[92]:=
pos2D=Pick[Transpose[#],Total[Clip[Reverse@dm-#-dwr+2,{0,1}]],2]&@
{Mod[posFlat,#,1],IntegerPart[posFlat/#]+1}&@Last[dm]
Out[92]= {{3,1},{2,2}}
After the list of 2D positions has been filtered, so that no structurally invalid positions are present, we can use Select
to extract the full sub-matrices and test against the sub-matrix of interest:
In[93]:=
finalPos =
Select[pos2D,m[[Last@#;;Last@#+Last@dwr-1,First@#;;First@#+First@dwr-1]]==what&]
Out[93]= {{3,1},{2,2}}
In this case, both positions are genuine. The final thing to do is to reconstruct the positions of the bottom - right corners of the submatrix and add them to the top-left corner positions. This is done by this line:
In[94]:= Transpose[{#,Transpose[Transpose[#]+dwr-1]}]&@finalPos
Out[94]= {{{3,1},{4,2}},{{2,2},{3,3}}}
I could have used Map
, but for a large list of positions, the above code would be more efficient.
In[216]:= Position2D[{{0,1,2,3},{1,2,3,4},{2,3,4,5},{3,4,5,6}},{{2,3},{3,4}}]
Out[216]= {{{3,1},{4,2}},{{2,2},{3,3}},{{1,3},{2,4}}}
Note that my index conventions are reversed w.r.t. @Szabolcs' solution.
Here is a power test:
nmat = 1000;
(* generate a large random matrix and a sub-matrix *)
largeTestMat = RandomInteger[100, {2000, 3000}];
what = RandomInteger[10, {100, 100}];
(* generate upper left random positions where to insert the submatrix *)
rposx = RandomInteger[{1,Last@Dimensions[largeTestMat] - Last@Dimensions[what] + 1}, nmat];
rposy = RandomInteger[{1,First@Dimensions[largeTestMat] - First@Dimensions[what] + 1},nmat];
(* insert the submatrix nmat times *)
With[{dwr = Reverse@Dimensions[what]},
Do[largeTestMat[[Last@p ;; Last@p + Last@dwr - 1,
First@p ;; First@p + First@dwr - 1]] = what,
{p,Transpose[{rposx, rposy}]}]]
Now, we test:
In[358]:= (ps1 = position2D[largeTestMat,what])//Short//Timing
Out[358]= {1.39,{{{1,2461},{100,2560}},<<151>>,{{1900,42},{1999,141}}}}
In[359]:= (ps2 = Position2D[largeTestMat,what])//Short//Timing
Out[359]= {0.062,{{{2461,1},{2560,100}},<<151>>,{{42,1900},{141,1999}}}}
(the actual number of sub-matrices is smaller than the number we try to generate, since many of them overlap and "destroy" the previously inserted ones - this is so because the sub-matrix size is a sizable fraction of the matrix size in our benchmark).
To compare, we should reverse the x-y indices in one of the solutions (level 3), and sort both lists, since positions may have been obtained in different order:
In[360]:= Sort@ps1===Sort[Reverse[ps2,{3}]]
Out[360]= True
I do not exclude a possibility that further optimizations are possible.
This is my implementation:
position2D[m_, k_] :=
Module[{di, dj, extractSubmatrix, pos},
{di, dj} = Dimensions[k] - 1;
extractSubmatrix[{i_, j_}] := m[[i ;; i + di, j ;; j + dj]];
pos = Position[ListCorrelate[k, m], ListCorrelate[k, k][[1, 1]]];
pos = Select[pos, extractSubmatrix[#] == k &];
{#, # + {di, dj}} & /@ pos
]
It uses ListCorrelate
to get a list of potential positions, then filters those that actually match. It's probably faster on packed real matrices.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With